issues: 641504450

id: 641504450
node_id: MDU6SXNzdWU2NDE1MDQ0NTA=
number: 4164
title: Implicit use of dask feature
user: 41870650
state: closed
locked: 0
comments: 3
created_at: 2020-06-18T19:41:47Z
updated_at: 2020-08-06T16:12:44Z
closed_at: 2020-08-06T16:12:44Z
author_association: CONTRIBUTOR

What happened: I tried to use the to_netcdf function to store a dataset in a NetCDF file, but the following exception was raised:

    Traceback (most recent call last):
      File "dask-error.py", line 27, in <module>
        ds.to_netcdf("test.nc")
      File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/core/dataset.py", line 1544, in to_netcdf
        return to_netcdf(
      File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/api.py", line 1051, in to_netcdf
        scheduler = _get_scheduler()
      File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/locks.py", line 79, in _get_scheduler
        actual_get = dask.base.get_scheduler(get, collection)
    AttributeError: module 'dask' has no attribute 'base'

The code sample works as expected when the dask package is not installed in the environment. However, when dask is installed, the _get_scheduler function is called and produces the error at this line: https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L79

After a little digging, the problem is that the base module in the dask package depends on the toolz package, which is not a default dependency of dask, and so causes a silent import failure when dask initialises its namespace (https://github.com/dask/dask/blob/416d348f7174a302815758cb87dbf6983226ddc5/dask/__init__.py#L10). As a result, the base module is not available as an attribute of the top-level dask package, and importing it explicitly with

    from dask import base

raises a ModuleNotFoundError:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/dask/base.py", line 13, in <module>
        from tlz import merge, groupby, curry, identity
    ModuleNotFoundError: No module named 'tlz'

I recommend the following fix. At this line in the _get_scheduler function (https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L75), replace the import with

    from dask.base import get_scheduler

and remove the dask.base prefix from the later call.
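The silent import failure described above can be reproduced without dask. The following is a minimal sketch, not dask's actual code: it writes a throwaway package to a temporary directory whose `__init__.py` guards a submodule import with `try`/`except ImportError`, which is the pattern the linked line is assumed to follow. The names `fakepkg_demo`, `missing_dependency_for_demo`, and `helper` are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical names, used only for this demonstration.
PKG = "fakepkg_demo"
MISSING = "missing_dependency_for_demo"


def build_and_import():
    """Create a package whose __init__ swallows a failing submodule import."""
    root = Path(tempfile.mkdtemp())
    pkg_dir = root / PKG
    pkg_dir.mkdir()
    # The package initialiser hides the ImportError raised by its submodule.
    (pkg_dir / "__init__.py").write_text(
        "try:\n"
        "    from .base import helper\n"
        "except ImportError:\n"
        "    pass\n"
    )
    # The submodule depends on a module that is not installed.
    (pkg_dir / "base.py").write_text(
        f"import {MISSING}\n"
        "def helper():\n"
        "    return 'ok'\n"
    )
    sys.path.insert(0, str(root))
    try:
        return importlib.import_module(PKG)
    finally:
        sys.path.remove(str(root))


pkg = build_and_import()

# The top-level import succeeded, but the submodule was never bound.
has_base = hasattr(pkg, "base")

try:
    pkg.base.helper()
    outcome = "no error"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"

# Importing the submodule explicitly exposes the real cause.
try:
    importlib.import_module(f"{PKG}.base")
    explicit_failure = None
except ModuleNotFoundError as exc:
    explicit_failure = exc.name

print(has_base, outcome, explicit_failure)
```

The attribute access fails with an AttributeError on the package, while the explicit submodule import names the missing dependency, which mirrors the two tracebacks in this report.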

I should, however, point out that get_scheduler does not appear to be part of the Dask public API.
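The recommended change could look roughly like the sketch below. It is an illustration under stated assumptions, not xarray's actual implementation: `resolve_dask_get` is a hypothetical name, and the arguments are simply passed through in the same positions as in the call shown in the traceback. The point is that importing `get_scheduler` directly turns a missing toolz into an ImportError at the import site, which the existing "dask is not available" handling can then absorb.

```python
def resolve_dask_get(get=None, collection=None):
    """Return dask's scheduler callable, or None if dask.base cannot be imported.

    Hypothetical helper for illustration only.
    """
    try:
        # Import the function itself rather than relying on the `dask.base`
        # attribute, so a missing dependency (for example toolz) is raised
        # here as ImportError instead of surfacing later as AttributeError.
        from dask.base import get_scheduler
    except ImportError:
        # Covers both "dask not installed" and "dask.base not importable".
        return None
    return get_scheduler(get, collection)


result = resolve_dask_get()
print(result)
```

ModuleNotFoundError is a subclass of ImportError, so the single `except` clause handles both the absent-dask case and the absent-toolz case.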

What you expected to happen: The to_netcdf method should have exited silently and created a new file in the working directory with the contents of the data set.

Minimal Complete Verifiable Example: This code is basically the "Toy weather data" example from the documentation, except for the last line.

```python
import numpy as np
import pandas as pd

import xarray as xr

np.random.seed(123)

xr.set_options(display_style="html")

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

ds.to_netcdf("test.nc")  # error here
```

Anything else we need to know?: As mentioned above, the error only manifests when the dask package is present in the environment with no extras installed. (Many of the extras require the toolz package, at which point the import error goes away.)
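To tell which of these situations an environment is in, a small check along the following lines may help. This is a sketch, assuming the `tlz` module named in the traceback is provided by the toolz package; `classify_environment` and its `has_module` parameter are hypothetical names introduced here, with the parameter injectable so the logic can be exercised without changing the installed packages.

```python
from importlib.util import find_spec


def _installed(name):
    """Report whether a top-level module can be found, without importing it."""
    return find_spec(name) is not None


def classify_environment(has_module=_installed):
    """Describe whether the environment matches the failing configuration."""
    if not has_module("dask"):
        # Without dask, the scheduler lookup is never attempted.
        return "dask absent"
    if not has_module("tlz"):
        # dask is present but dask.base cannot be imported.
        return "dask without toolz (affected)"
    return "dask with toolz"


print(classify_environment())
```

Passing a custom `has_module` makes it possible to simulate each case, for example `classify_environment(lambda name: name == "dask")` for the affected configuration.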

Environment: In a clean virtual environment, install the following packages:

    pip install xarray netCDF4 dask

The package versions installed are as follows (generated by pip freeze):

    cftime==1.1.3
    dask==2.18.1
    netCDF4==1.5.3
    numpy==1.18.5
    pandas==1.0.5
    python-dateutil==2.8.1
    pytz==2020.1
    PyYAML==5.3.1
    six==1.15.0
    xarray==0.15.1

(Also running Python 3.8.2 on Debian Linux, not that I suppose this matters.)

Output of xr.show_versions():

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.2+ (heads/3.8:882a7f44da, Apr 26 2020, 19:31:38) [GCC 9.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.4.0-37-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_GB.UTF-8
    LOCALE: en_GB.UTF-8
    libhdf5: 1.10.4
    libnetcdf: 4.6.3

    xarray: 0.15.1
    pandas: 1.0.5
    numpy: 1.18.5
    scipy: None
    netCDF4: 1.5.3
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: None
    cftime: 1.1.3
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.18.1
    distributed: None
    matplotlib: None
    cartopy: None
    seaborn: None
    numbagg: None
    setuptools: 41.2.0
    pip: 19.2.3
    conda: None
    pytest: None
    IPython: None
    sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4164/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
