issues: 2157624683

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2157624683 I_kwDOAMm_X86Amr1r 8788 CI Failure in Xarray test suite post-Dask tokenization update 13301940 closed 0 6213168   1 2024-02-27T21:23:48Z 2024-03-01T03:29:52Z 2024-03-01T03:29:52Z MEMBER      

What is your issue?

Recent changes in Dask's tokenization process (https://github.com/dask/dask/pull/10876) seem to have introduced unexpected behavior in Xarray's test suite. This has led to CI failures, specifically in tests related to tokenization.

  • https://github.com/pydata/xarray/actions/runs/8069874717/job/22045898877

```python
---------- coverage: platform linux, python 3.12.2-final-0 -----------
Coverage XML written to file coverage.xml

=========================== short test summary info ============================
FAILED xarray/tests/test_dask.py::test_token_identical[obj0-<lambda>1] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314'
  - 6352792990cfe23adb7e8004a9055314
  + bbd9679bdaf284c371cd3db65e29a72d
FAILED xarray/tests/test_dask.py::test_token_identical[obj0-<lambda>2] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314'
  - 6352792990cfe23adb7e8004a9055314
  + bbd9679bdaf284c371cd3db65e29a72d
FAILED xarray/tests/test_dask.py::test_token_identical[obj1-<lambda>1] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02'
  - 9e2ab1c44990adb4fb76ac737226fa02
  + c520b8516da8b6a98c10e9e0d02b79d0
FAILED xarray/tests/test_dask.py::test_token_identical[obj1-<lambda>2] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02'
  - 9e2ab1c44990adb4fb76ac737226fa02
  + c520b8516da8b6a98c10e9e0d02b79d0
= 4 failed, 16293 passed, 628 skipped, 90 xfailed, 71 xpassed, 213 warnings in 472.07s (0:07:52) =
Error: Process completed with exit code 1.
```

Previously, the following snippet passed, confirming that an Xarray object and its shallow or deep copy produce the same token:

```python
In [1]: import xarray as xr, numpy as np

In [2]: def make_da():
   ...:     da = xr.DataArray(
   ...:         np.ones((10, 20)),
   ...:         dims=["x", "y"],
   ...:         coords={"x": np.arange(10), "y": np.arange(100, 120)},
   ...:         name="a",
   ...:     ).chunk({"x": 4, "y": 5})
   ...:     da.x.attrs["long_name"] = "x"
   ...:     da.attrs["test"] = "test"
   ...:     da.coords["c2"] = 0.5
   ...:     da.coords["ndcoord"] = da.x * 2
   ...:     da.coords["cxy"] = (da.x * da.y).chunk({"x": 4, "y": 5})
   ...:
   ...:     return da
   ...:

In [3]: da = make_da()

In [4]: import dask.base

In [5]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))

In [6]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=True))

In [9]: dask.__version__
Out[9]: '2023.3.0'
```

However, after updating to Dask 2024.2.1, the same code fails:

```python
In [55]:
    ...: def make_da():
    ...:     da = xr.DataArray(
    ...:         np.ones((10, 20)),
    ...:         dims=["x", "y"],
    ...:         coords={"x": np.arange(10), "y": np.arange(100, 120)},
    ...:         name="a",
    ...:     ).chunk({"x": 4, "y": 5})
    ...:     da.x.attrs["long_name"] = "x"
    ...:     da.attrs["test"] = "test"
    ...:     da.coords["c2"] = 0.5
    ...:     da.coords["ndcoord"] = da.x * 2
    ...:     da.coords["cxy"] = (da.x * da.y).chunk({"x": 4, "y": 5})
    ...:
    ...:     return da
    ...:

In [56]: da = make_da()
```

```python
In [57]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[57], line 1
----> 1 assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))

AssertionError:

In [58]: dask.base.tokenize(da)
Out[58]: 'bbd9679bdaf284c371cd3db65e29a72d'

In [59]: dask.base.tokenize(da.copy(deep=False))
Out[59]: '6352792990cfe23adb7e8004a9055314'

In [61]: dask.__version__
Out[61]: '2024.2.1'
```

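Additionally, comparing the output of `dask.base.normalize_token()` across the two Dask versions shows that the newer version includes state in the normalized token that the older one did not. In the newer output, class objects are replaced by hashed identifiers such as `('7b61e7593a274e48', [])`, attribute mappings are labelled `'dict'` rather than `'list'`, and `('__seen', N)` markers appear in positions where the older output repeated a class.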

  • old version

    In [29]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name))
    Out[29]: ('tuple', [xarray.core.dataarray.DataArray, ('tuple', [xarray.core.variable.Variable, ('tuple', ['x', 'y']), 'xarray-<this-array>-14cc91345e4b75c769b9032d473f6f6e', ('list', [('tuple', ['test', 'test'])])]), ('list', [('tuple', ['c2', ('tuple', [xarray.core.variable.Variable, ('tuple', []), (0.5, dtype('float64')), ('list', [])])]), ('tuple', ['cxy', ('tuple', [xarray.core.variable.Variable, ('tuple', ['x', 'y']), 'xarray-<this-array>-8e98950eca22c69d304f0a48bc6c2df9', ('list', [])])]), ('tuple', ['ndcoord', ('tuple', [xarray.core.variable.Variable, ('tuple', ['x']), 'xarray-ndcoord-82411ea5e080aa9b9f554554befc2f39', ('list', [])])]), ('tuple', ['x', ('tuple', [xarray.core.variable.IndexVariable, ('tuple', ['x']), ['x', ('603944b9792513fa0c686bb494a66d96c667f879', dtype('int64'), (10,), (8,))], ('list', [('tuple', ['long_name', 'x'])])])]), ('tuple', ['y', ('tuple', [xarray.core.variable.IndexVariable, ('tuple', ['y']), ['y', ('fc411db876ae0f4734dac8b64152d5c6526a537a', dtype('int64'), (20,), (8,))], ('list', [])])])]), 'a'])

  • most recent version

    In [44]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name))
    Out[44]: ('tuple', [('7b61e7593a274e48', []), ('tuple', [('215b115b265c420c', []), ('tuple', ['x', 'y']), 'xarray-<this-array>-980383b18aab94069bdb02e9e0956184', ('dict', [('tuple', ['test', 'test'])])]), ('dict', [('tuple', ['c2', ('tuple', [('__seen', 2), ('tuple', []), ('6825817183edbca7', ['48cb5e118059da42']), ('dict', [])])]), ('tuple', ['cxy', ('tuple', [('__seen', 2), ('tuple', ['x', 'y']), 'xarray-<this-array>-6babb4e95665a53f34a3e337129d54b5', ('dict', [])])]), ('tuple', ['ndcoord', ('tuple', [('__seen', 2), ('tuple', ['x']), 'xarray-ndcoord-8636fac37e5e6f4401eab2aef399f402', ('dict', [])])]), ('tuple', ['x', ('tuple', [('abc1995cae8530ae', []), ('tuple', ['x']), ['x', ('99b2df4006e7d28a', ['04673d65c892b5ba'])], ('dict', [('tuple', ['long_name', 'x'])])])]), ('tuple', ['y', ('tuple', [('__seen', 25), ('tuple', ['y']), ['y', ('88974ea603e15c49', ['a6c0f2053e85c87e'])], ('dict', [])])])]), 'a'])
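Nested outputs like the two above are hard to compare by eye. A minimal sketch of a helper that walks two nested tuple/list structures and reports the first position where they diverge might look like the following. The name `first_difference` is hypothetical (it is not part of Dask or Xarray), and the `old` and `new` values are hand-abbreviated fragments of the `c2` entry above, with the class replaced by the string `"Variable"` for illustration.

```python
from typing import Any, Optional, Tuple

Path = Tuple[int, ...]


def first_difference(
    a: Any, b: Any, path: Path = ()
) -> Optional[Tuple[Path, Any, Any]]:
    """Return (path, left, right) for the first mismatch, or None if equal.

    Hypothetical helper for illustration only; it is not a Dask API.
    """
    sequence_types = (tuple, list)
    if isinstance(a, sequence_types) and isinstance(b, sequence_types):
        # Containers of different kind or length differ as a whole.
        if type(a) is not type(b) or len(a) != len(b):
            return path, a, b
        # Otherwise recurse element by element, tracking the index path.
        for index, (left, right) in enumerate(zip(a, b)):
            found = first_difference(left, right, path + (index,))
            if found is not None:
                return found
        return None
    # Leaves (strings, numbers, ...) are compared directly.
    if a != b:
        return path, a, b
    return None


# Abbreviated stand-ins for the "c2" entry in the two outputs above.
old = ("tuple", ["c2", ("tuple", ["Variable", ("tuple", []), ("list", [])])])
new = ("tuple", ["c2", ("tuple", [("__seen", 2), ("tuple", []), ("dict", [])])])

print(first_difference(old, new))
```

For this failure, the more useful comparison is probably between `dask.base.normalize_token(...)` for `da` and for `da.copy(deep=False)` under the same Dask version, since those are the two tokens the failing test expects to match.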

Cc @dcherian / @crusaderky for visibility

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8788/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  completed 13221727 issue
