issues


2 rows where repo = 13221727 (xarray) and user = 41870650 (inakleinbottle), sorted by updated_at descending


Row 1 (pull request):

  • id: 674142764 · node_id: MDExOlB1bGxSZXF1ZXN0NDYzODg1MjM2
  • number: 4318 · title: Implicit dask import 4164
  • user: inakleinbottle (41870650) · state: closed · locked: 0 · comments: 2
  • created_at: 2020-08-06T08:52:46Z · updated_at: 2020-08-06T16:12:58Z · closed_at: 2020-08-06T16:12:44Z
  • author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/4318

body:

This is a fix for Issue 4164. I haven't written any tests since this issue involves partial installation of dask, and while I know ways to fake the results of failed imports, I'm not sure these are appropriate for a test bed. The test script from the issue no longer raises an error.

  • [ ] Closes #4164
  • [ ] Tests added
  • [ ] Passes isort . && black . && mypy . && flake8
reactions:

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4318/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
  • repo: xarray (13221727) · type: pull
Row 2 (issue):

  • id: 641504450 · node_id: MDU6SXNzdWU2NDE1MDQ0NTA=
  • number: 4164 · title: Implicit use of dask feature
  • user: inakleinbottle (41870650) · state: closed · locked: 0 · comments: 3
  • created_at: 2020-06-18T19:41:47Z · updated_at: 2020-08-06T16:12:44Z · closed_at: 2020-08-06T16:12:44Z
  • author_association: CONTRIBUTOR

body:

What happened: I tried to use the to_netcdf method to store a dataset in a NetCDF file, but the following exception was raised:

```
Traceback (most recent call last):
  File "dask-error.py", line 27, in <module>
    ds.to_netcdf("test.nc")
  File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/core/dataset.py", line 1544, in to_netcdf
    return to_netcdf(
  File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/api.py", line 1051, in to_netcdf
    scheduler = _get_scheduler()
  File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/xarray/backends/locks.py", line 79, in _get_scheduler
    actual_get = dask.base.get_scheduler(get, collection)
AttributeError: module 'dask' has no attribute 'base'
```

This code sample works exactly as expected when the dask package is not installed in the environment. However, when dask is installed, the _get_scheduler function is called and produces the error at this line: https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L79

After a little digging, the problem is that the base module in the dask package depends on the toolz package, which is not a default dependency of dask, so its import fails silently when dask initialises its namespace (https://github.com/dask/dask/blob/416d348f7174a302815758cb87dbf6983226ddc5/dask/__init__.py#L10). As a result, the base module is not reachable from the dask top-level namespace, and importing it separately with from dask import base raises a ModuleNotFoundError:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sam/dev/xarray-test/.venv/lib/python3.8/site-packages/dask/base.py", line 13, in <module>
    from tlz import merge, groupby, curry, identity
ModuleNotFoundError: No module named 'tlz'
```

I recommend the following fix. At this line in the _get_scheduler function, https://github.com/pydata/xarray/blob/b9e6a36ff7a0ca3593165cf191f4152666fa4a66/xarray/backends/locks.py#L75, replace the import with from dask.base import get_scheduler and drop the dask.base prefix from the later call.
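The silent-failure mechanism described above can be reproduced without dask at all. This sketch builds a throwaway package (toypkg, a made-up name) whose __init__.py swallows a failed submodule import, the same pattern dask's __init__.py uses:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package that mimics dask's structure: __init__.py
# tries to import a submodule whose own dependency is missing, and
# swallows the failure -- so the submodule is never bound as an attribute.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "toypkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("try:\n    from . import base\nexcept ImportError:\n    pass\n")
with open(os.path.join(pkg, "base.py"), "w") as f:
    f.write("import tlz_not_installed  # stand-in for the missing tlz\n")

sys.path.insert(0, tmp)
import toypkg

# The package itself imports fine, but the submodule silently failed
# to load, just like `dask` without a working `dask.base`:
print(hasattr(toypkg, "base"))  # False

# Importing the submodule explicitly surfaces the real error:
try:
    importlib.import_module("toypkg.base")
except ModuleNotFoundError as e:
    print(e.name)  # tlz_not_installed
```

Accessing `toypkg.base` as an attribute gives AttributeError, exactly the shape of the `module 'dask' has no attribute 'base'` failure in the traceback above.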

I should, however, point out that get_scheduler does not appear to be part of the Dask public API.
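The suggested change amounts to importing from the submodule directly, so that a broken dask.base raises an ImportError at a point where it can be handled. A minimal sketch of that guard pattern, assuming a hypothetical helper named try_import (xarray's real _get_scheduler does more than this):

```python
import importlib

def try_import(module_name):
    """Return the named module, or None when it -- or one of its own
    dependencies, like tlz for dask.base -- is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# With this guard, a partially installed dask degrades gracefully
# instead of raising AttributeError deep inside to_netcdf:
base = try_import("dask.base")
get_scheduler = base.get_scheduler if base is not None else None
```

An inline `from dask.base import get_scheduler` inside a try/except ImportError achieves the same effect and is closer to the one-line replacement proposed above.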

What you expected to happen: The to_netcdf method should have exited silently and created a new file in the working directory with the contents of the data set.

Minimal Complete Verifiable Example: This code is basically the "Toy weather data" example from the documentation, except for the last line.

```python
import numpy as np
import pandas as pd

import xarray as xr

np.random.seed(123)

xr.set_options(display_style="html")

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

ds.to_netcdf("test.nc")  # error here
```

Anything else we need to know?: As mentioned above, the error only manifests when the dask package is present in the environment with no extras installed. (Many of the extras require the toolz package, in which case the import error goes away.)

Environment: In a clean virtual environment, install the following packages: pip install xarray netCDF4 dask. The installed package versions are as follows (generated by pip freeze):

```
cftime==1.1.3
dask==2.18.1
netCDF4==1.5.3
numpy==1.18.5
pandas==1.0.5
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
six==1.15.0
xarray==0.15.1
```

(Also running Python 3.8.2 on Debian Linux, not that I suppose this matters.)

Output of <tt>xr.show_versions()</tt>:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2+ (heads/3.8:882a7f44da, Apr 26 2020, 19:31:38) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.18.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: None
sphinx: None
```
reactions:

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4164/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
  • state_reason: completed · repo: xarray (13221727) · type: issue


Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```
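The filter shown in the page header ("2 rows where repo = 13221727 and user = 41870650 sorted by updated_at descending") can be sketched against a trimmed-down copy of this schema in an in-memory SQLite database. The column subset and the query text are illustrative; Datasette's generated SQL may differ:

```python
import sqlite3

# A trimmed subset of the [issues] schema, enough to run the page's filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER,
   [updated_at] TEXT,
   [repo] INTEGER,
   [type] TEXT
);
""")

# The two rows from this page (ids, numbers, titles, and timestamps above).
rows = [
    (674142764, 4318, "Implicit dask import 4164", 41870650,
     "2020-08-06T16:12:58Z", 13221727, "pull"),
    (641504450, 4164, "Implicit use of dask feature", 41870650,
     "2020-08-06T16:12:44Z", 13221727, "issue"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The page's filter: repo and user match, newest update first.
query = """
    SELECT number, title FROM issues
    WHERE repo = 13221727 AND [user] = 41870650
    ORDER BY updated_at DESC
"""
for number, title in conn.execute(query):
    print(number, title)  # PR 4318 first: it has the later updated_at
```

ISO 8601 timestamps sort correctly as plain text, which is why the TEXT columns in the schema work for ORDER BY.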
Powered by Datasette · About: xarray-datasette