id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
674142764,MDExOlB1bGxSZXF1ZXN0NDYzODg1MjM2,4318,Implicit dask import 4164,41870650,closed,0,,,2,2020-08-06T08:52:46Z,2020-08-06T16:12:58Z,2020-08-06T16:12:44Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4318,"This is a fix for Issue 4164. I haven't written any tests since this issue involves partial installation of dask, and while I know ways to fake the results of failed imports, I'm not sure these are appropriate for a test bed. The test script from the issue no longer raises an error.

 - [ ] Closes #4164
 - [ ] Tests added
 - [ ] Passes `isort . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4318/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
641504450,MDU6SXNzdWU2NDE1MDQ0NTA=,4164,Implicit use of dask feature,41870650,closed,0,,,3,2020-06-18T19:41:47Z,2020-08-06T16:12:44Z,2020-08-06T16:12:44Z,CONTRIBUTOR,,,,"
**What happened**:

I tried to use the `to_netcdf` function to store a dataset into a NetCDF file, but the following exception was raised

```
Traceback (most recent call last):
  File ""dask-error.py"", line 27, in
    ds.to_netcdf(""test.nc"")
  File ""/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/core/dataset.py"", line 1544, in to_netcdf
    return to_netcdf(
  File ""/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/api.py"", line 1051, in to_netcdf
    scheduler = _get_scheduler()
  File ""/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/locks.py"", line 79, in _get_scheduler
    actual_get = dask.base.get_scheduler(get, collection)
AttributeError: module 'dask' has no attribute 'base'
```

This code sample works perfectly as expected
when the dask package is not installed in the environment. However, when dask is installed, the `_get_scheduler` function is called and produces the error (this can be found here)

https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L79

After a little digging, the problem is that the `base` module in the dask package depends on the toolz package, which is not a default dependency of dask and so causes a silent import failure when dask initialises its namespace (https://github.com/dask/dask/blob/416d348f7174a302815758cb87dbf6983226ddc5/dask/__init__.py#L10). As a result, the `base` module is not available as an attribute of the dask top-level package, and importing it separately with

```
from dask import base
```

raises a ModuleNotFoundError.

```
Traceback (most recent call last):
  File """", line 1, in
  File ""/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/dask/base.py"", line 13, in
    from tlz import merge, groupby, curry, identity
ModuleNotFoundError: No module named 'tlz'
```

I recommend the following fix. At the following line in the `_get_scheduler` function

https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L75

replace the import with the following

```
from dask.base import get_scheduler
```

and remove `dask.base` from the later call. I should, however, point out that `get_scheduler` does not appear to be part of the Dask public API.

**What you expected to happen**:

The `to_netcdf` method should have exited silently and created a new file in the working directory with the contents of the data set.

**Minimal Complete Verifiable Example**:

This code is basically the ""Toy weather data"" example from the documentation, except for the last line.

```python
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123)

xr.set_options(display_style=""html"")

times = pd.date_range(""2000-01-01"", ""2001-12-31"", name=""time"")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset(
    {
        ""tmin"": ((""time"", ""location""), tmin_values),
        ""tmax"": ((""time"", ""location""), tmax_values),
    },
    {""time"": times, ""location"": [""IA"", ""IN"", ""IL""]},
)

ds.to_netcdf(""test.nc"")  ## error here
```

**Anything else we need to know?**:

As mentioned above, the error only manifests when the dask package with no extras installed is present in the environment. (Many of the extras require the toolz package, at which point the import error goes away.)

**Environment**:

In a clean virtual environment, install the following packages.

```
pip install xarray netCDF4 dask
```

The package versions installed are as follows (generated by `pip freeze`):

```
cftime==1.1.3
dask==2.18.1
netCDF4==1.5.3
numpy==1.18.5
pandas==1.0.5
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
six==1.15.0
xarray==0.15.1
```

(Also running python3.8.2 on Debian Linux, not that I suppose this matters.)
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2+ (heads/3.8:882a7f44da, Apr 26 2020, 19:31:38) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.18.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: None
sphinx: None
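
To illustrate the silent import failure described above without needing a partial dask install, here is a minimal sketch using a throwaway package. The names `fakepkg` and `missing_dep_for_demo` are made up for the illustration and are not part of dask or xarray. The `try`/`except ImportError` in `__init__.py` is an assumption modelled on the dask `__init__.py` line linked above.

```python
import pathlib
import sys
import tempfile

# Build a throwaway package on disk. 'fakepkg' and 'missing_dep_for_demo'
# are hypothetical names used only for this illustration.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / 'fakepkg'
pkg.mkdir()

# The package swallows the ImportError raised by its own submodule.
(pkg / '__init__.py').write_text(
    'try:\n'
    '    from . import base\n'
    'except ImportError:\n'
    '    pass\n'
)

# The submodule depends on something that is not installed.
(pkg / 'base.py').write_text(
    'import missing_dep_for_demo\n'
    '\n'
    'def get_scheduler():\n'
    '    return None\n'
)

sys.path.insert(0, str(root))

import fakepkg  # succeeds, because the failure is swallowed in __init__.py

# Attribute access on the package would raise AttributeError here,
# because the submodule was never bound to the package namespace.
has_base = hasattr(fakepkg, 'base')

# An explicit submodule import reports the missing dependency instead.
try:
    from fakepkg.base import get_scheduler
    explicit_error = None
except ModuleNotFoundError as exc:
    explicit_error = exc.name

print(has_base)
print(explicit_error)
```

The explicit `from ... import` form surfaces the real cause (the missing dependency) rather than an `AttributeError` on the package, which is why I suggested it for `_get_scheduler`.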
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4164/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue