id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2157624683,I_kwDOAMm_X86Amr1r,8788,CI Failure in Xarray test suite post-Dask tokenization update,13301940,closed,0,6213168,,1,2024-02-27T21:23:48Z,2024-03-01T03:29:52Z,2024-03-01T03:29:52Z,MEMBER,,,,"### What is your issue? Recent changes in Dask's tokenization process (https://github.com/dask/dask/pull/10876) seem to have introduced unexpected behavior in Xarray's test suite. This has led to CI failures, specifically in tests related to tokenization. - https://github.com/pydata/xarray/actions/runs/8069874717/job/22045898877 ```python ---------- coverage: platform linux, python 3.12.2-final-0 ----------- Coverage XML written to file coverage.xml =========================== short test summary info ============================ FAILED xarray/tests/test_dask.py::test_token_identical[obj0-1] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314' - 6352792990cfe23adb7e8004a9055314 + bbd9679bdaf284c371cd3db65e29a72d FAILED xarray/tests/test_dask.py::test_token_identical[obj0-2] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314' - 6352792990cfe23adb7e8004a9055314 + bbd9679bdaf284c371cd3db65e29a72d FAILED xarray/tests/test_dask.py::test_token_identical[obj1-1] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02' - 9e2ab1c44990adb4fb76ac737226fa02 + c520b8516da8b6a98c10e9e0d02b79d0 FAILED xarray/tests/test_dask.py::test_token_identical[obj1-2] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02' - 9e2ab1c44990adb4fb76ac737226fa02 + c520b8516da8b6a98c10e9e0d02b79d0 = 4 failed, 16293 passed, [628](https://github.com/pydata/xarray/actions/runs/8069874717/job/22045898877#step:9:629) skipped, 90 xfailed, 71 xpassed, 
213 warnings in 472.07s (0:07:52) = Error: Process completed with exit code 1. ``` previously, the following code snippet would pass, verifying the consistency of tokenization in Xarray objects: ```python In [1]: import xarray as xr, numpy as np In [2]: def make_da(): ...: da = xr.DataArray( ...: np.ones((10, 20)), ...: dims=[""x"", ""y""], ...: coords={""x"": np.arange(10), ""y"": np.arange(100, 120)}, ...: name=""a"", ...: ).chunk({""x"": 4, ""y"": 5}) ...: da.x.attrs[""long_name""] = ""x"" ...: da.attrs[""test""] = ""test"" ...: da.coords[""c2""] = 0.5 ...: da.coords[""ndcoord""] = da.x * 2 ...: da.coords[""cxy""] = (da.x * da.y).chunk({""x"": 4, ""y"": 5}) ...: ...: return da ...: In [3]: da = make_da() In [4]: import dask.base In [5]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False)) In [6]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=True)) In [9]: dask.__version__ Out[9]: '2023.3.0' ``` However, post-update in Dask version '2024.2.1', the same code fails: ```python In [55]: ...: def make_da(): ...: da = xr.DataArray( ...: np.ones((10, 20)), ...: dims=[""x"", ""y""], ...: coords={""x"": np.arange(10), ""y"": np.arange(100, 120)}, ...: name=""a"", ...: ).chunk({""x"": 4, ""y"": 5}) ...: da.x.attrs[""long_name""] = ""x"" ...: da.attrs[""test""] = ""test"" ...: da.coords[""c2""] = 0.5 ...: da.coords[""ndcoord""] = da.x * 2 ...: da.coords[""cxy""] = (da.x * da.y).chunk({""x"": 4, ""y"": 5}) ...: ...: return da ...: In [56]: da = make_da() ``` ```python In [57]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False)) --------------------------------------------------------------------------- AssertionError Traceback (most recent call last) Cell In[57], line 1 ----> 1 assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False)) AssertionError: In [58]: dask.base.tokenize(da) Out[58]: 'bbd9679bdaf284c371cd3db65e29a72d' In [59]: dask.base.tokenize(da.copy(deep=False)) Out[59]: 
'6352792990cfe23adb7e8004a9055314' In [61]: dask.__version__ Out[61]: '2024.2.1' ``` additionally, a deeper dive into `dask.base.normalize_token()` across the two Dask versions revealed that the latest version includes additional state or metadata in tokenization that was not present in earlier versions. - old version ```python In [29]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name)) Out[29]: ('tuple', [xarray.core.dataarray.DataArray, ('tuple', [xarray.core.variable.Variable, ('tuple', ['x', 'y']), 'xarray--14cc91345e4b75c769b9032d473f6f6e', ('list', [('tuple', ['test', 'test'])])]), ('list', [('tuple', ['c2', ('tuple', [xarray.core.variable.Variable, ('tuple', []), (0.5, dtype('float64')), ('list', [])])]), ('tuple', ['cxy', ('tuple', [xarray.core.variable.Variable, ('tuple', ['x', 'y']), 'xarray--8e98950eca22c69d304f0a48bc6c2df9', ('list', [])])]), ('tuple', ['ndcoord', ('tuple', [xarray.core.variable.Variable, ('tuple', ['x']), 'xarray-ndcoord-82411ea5e080aa9b9f554554befc2f39', ('list', [])])]), ('tuple', ['x', ('tuple', [xarray.core.variable.IndexVariable, ('tuple', ['x']), ['x', ('603944b9792513fa0c686bb494a66d96c667f879', dtype('int64'), (10,), (8,))], ('list', [('tuple', ['long_name', 'x'])])])]), ('tuple', ['y', ('tuple', [xarray.core.variable.IndexVariable, ('tuple', ['y']), ['y', ('fc411db876ae0f4734dac8b64152d5c6526a537a', dtype('int64'), (20,), (8,))], ('list', [])])])]), 'a']) ``` - most recent version ```python In [44]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name)) Out[44]: ('tuple', [('7b61e7593a274e48', []), ('tuple', [('215b115b265c420c', []), ('tuple', ['x', 'y']), 'xarray--980383b18aab94069bdb02e9e0956184', ('dict', [('tuple', ['test', 'test'])])]), ('dict', [('tuple', ['c2', ('tuple', [('__seen', 2), ('tuple', []), ('6825817183edbca7', ['48cb5e118059da42']), ('dict', [])])]), ('tuple', ['cxy', ('tuple', [('__seen', 2), ('tuple', ['x', 'y']), 'xarray--6babb4e95665a53f34a3e337129d54b5', 
('dict', [])])]), ('tuple', ['ndcoord', ('tuple', [('__seen', 2), ('tuple', ['x']), 'xarray-ndcoord-8636fac37e5e6f4401eab2aef399f402', ('dict', [])])]), ('tuple', ['x', ('tuple', [('abc1995cae8530ae', []), ('tuple', ['x']), ['x', ('99b2df4006e7d28a', ['04673d65c892b5ba'])], ('dict', [('tuple', ['long_name', 'x'])])])]), ('tuple', ['y', ('tuple', [('__seen', 25), ('tuple', ['y']), ['y', ('88974ea603e15c49', ['a6c0f2053e85c87e'])], ('dict', [])])])]), 'a']) ``` Cc @dcherian / @crusaderky for visibility ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8788/reactions"", ""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,completed,13221727,issue 549712566,MDU6SXNzdWU1NDk3MTI1NjY=,3695,mypy --strict fails on scripts/packages depending on xarray; __all__ required,2418513,closed,0,6213168,,3,2020-01-14T17:27:44Z,2020-01-17T20:42:25Z,2020-01-17T20:42:25Z,NONE,,,,"Checked this with both 0.14.1 and master branch. Create `foo.py`: ```python from xarray import DataArray ``` and run: ```sh $ mypy --strict foo.py ``` which results in ``` foo.py:1: error: Module 'xarray' has no attribute 'DataArray' Found 1 error in 1 file (checked 1 source file) ``` I did a bit of digging trying to make it work, it looks like what makes the above script work with mypy is adding ```python __all__ = ('DataArray',) ``` to `xarray/__init__.py`, otherwise mypy treats those imports as ""private"" (and is correct in doing so). Should `__all__` be added to the root `__init__.py`? 
To any `__init__.py` in subpackages as well?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3695/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 495198361,MDU6SXNzdWU0OTUxOTgzNjE=,3317,Can't create weakrefs on DataArrays since xarray 0.13.0,167802,closed,0,6213168,,8,2019-09-18T12:36:46Z,2019-10-14T21:38:09Z,2019-09-18T15:53:51Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray as xr from weakref import ref arr = xr.DataArray([1, 2, 3]) ref(arr) ``` #### Expected Output I expect the weak reference to be created, as in former versions. #### Problem Description The above code raises the following exception: `TypeError: cannot create weak reference to 'DataArray' object` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.1.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.13.0 pandas: 0.25.1 numpy: 1.17.0 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.22 cfgrib: None iris: None bottleneck: None dask: 2.3.0 distributed: 2.4.0 matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: 5.0.1 IPython: 7.8.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3317/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue