issues


2 rows where state = "closed", type = "issue" and user = 691772 sorted by updated_at descending
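An equivalent query against a local copy of this table, purely as a minimal sketch (the filename github.db and the use of Python's sqlite3 module are assumptions, not part of this page):

```python
import sqlite3

# Hypothetical local copy of this Datasette database; "github.db" is an assumption.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT number, title, created_at, updated_at, closed_at
    FROM issues
    WHERE state = 'closed' AND type = 'issue' AND "user" = 691772
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, created_at, updated_at, closed_at in rows:
    print(number, title)
```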

Issue 6816: pandas.errors.InvalidIndexError is raised in some runs when using chunks and map_blocks()
  • id: 1315111684 · user: lumbric (691772) · state: closed · comments: 5 · author_association: CONTRIBUTOR
  • created_at: 2022-07-22T14:56:41Z · updated_at: 2022-09-13T09:39:48Z · closed_at: 2022-08-19T14:06:09Z

What is your issue?

I'm running a lengthy computation that involves hundreds of GB of data, using chunks and map_blocks() so that things fit into RAM and can run in parallel. From time to time, the following error is raised:

pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The line where this takes place looks pretty harmless:

x = a * b.sel(c=d.c)

It's a line inside the function func that is passed to a map_blocks() call. In this case, a and b are xr.DataArray or xr.Dataset objects shadowed from the outer scope, and d is the obj parameter of map_blocks().

That means the corresponding line in the traceback looks like this:

xr.map_blocks(
    lambda d: worker(d).compute().chunk({"time": None}),
    d,
    template=template)

I guess it's some kind of race condition, since it's not 100% reproducible, but I have no idea how to further investigate the issue to create a proper bug report or fix my code.

Do you have any hints on how I could go about building a minimal example in a case like this? What is the error message trying to tell me?
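One thing I could imagine trying, purely as a sketch (the dimension name c, the tiny dataset, and the uniqueness check below are made up, not my real data): check inside the mapped function whether the index that .sel() aligns on is unique in each block, since duplicate labels are exactly what this pandas error complains about.

```python
import numpy as np
import pandas as pd
import xarray as xr

def func(block):
    # Hypothetical check: pandas raises InvalidIndexError when the index
    # used for alignment contains duplicate labels, so verify uniqueness
    # of the coordinate that .sel() would align on inside each block.
    idx = pd.Index(block["c"].values)
    if not idx.is_unique:
        raise ValueError(f"duplicate labels in this block: {idx[idx.duplicated()].tolist()}")
    return block

# Tiny stand-in dataset; the real data is hundreds of GB.
d = xr.Dataset({"x": ("c", np.arange(10.0))}, coords={"c": np.arange(10)}).chunk({"c": 5})
xr.map_blocks(func, d, template=d).compute()
```

Forcing a serial run with dask.config.set(scheduler="synchronous") before calling compute() would also help tell a genuine race condition apart from data-dependent duplicate labels.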

  • reactions: none (all counts 0) · state_reason: completed · repo: xarray (13221727) · type: issue
Issue 2928: Dask outputs warning: "The da.atop function has moved to da.blockwise"
  • id: 438389323 · user: lumbric (691772) · state: closed · comments: 4 · author_association: CONTRIBUTOR
  • created_at: 2019-04-29T15:59:31Z · updated_at: 2019-07-12T15:56:29Z · closed_at: 2019-07-12T15:56:28Z

Problem description

dask 1.1.0 moved atop() to blockwise() and introduced a warning when atop() is used.

Related

  • upstream ticket and PR of the dask change: dask/dask#4348, dask/dask#4035
  • the warning also appears in the dask documentation in an xarray example, probably not on purpose
  • warnings were already discussed in #2727, but not fixed there
  • same issue in a different project: pytroll/satpy#608

Code Sample

```python
import numpy as np
import xarray as xr

d = xr.DataArray(np.ones(1000))
d.to_netcdf('/tmp/ones.nc')
d = xr.open_dataarray('/tmp/ones.nc', chunks=10)
xr.apply_ufunc(lambda x: 42 * x, d, dask='parallelized', output_dtypes=[np.float64])
```

This outputs the warning:

    ...lib/python3.7/site-packages/dask/array/blockwise.py:204: UserWarning: The da.atop function has moved to da.blockwise
      warnings.warn("The da.atop function has moved to da.blockwise")

Expected Output

No warning. As a user of recent versions of dask and xarray, I shouldn't see any warnings if everything is done correctly. The warning should be handled inside xarray somehow.
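In the meantime, the message can be silenced on the user side, although that only hides it; a minimal sketch:

```python
import warnings

# User-side workaround only: suppress this specific deprecation message
# until xarray no longer calls da.atop internally.
warnings.filterwarnings(
    "ignore",
    message="The da.atop function has moved to da.blockwise",
    category=UserWarning,
)
```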

Solution

I'm not sure. Can xarray drop compatibility with dask < 1.1.0 in some future version? Otherwise, I guess xarray needs some legacy code that calls whichever function is available.
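Purely as a sketch of what such legacy code could look like (this is not xarray's actual code, and the name _blockwise is made up):

```python
import dask.array as da

# Hypothetical compatibility shim, not xarray's actual code: prefer the new
# dask.array.blockwise and fall back to the old name on dask < 1.1.0.
try:
    _blockwise = da.blockwise
except AttributeError:  # dask < 1.1.0 only provides da.atop
    _blockwise = da.atop
```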

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-17-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.3
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.2.0
distributed: 1.27.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.9.0
setuptools: 41.0.0
pip: 19.1
conda: None
pytest: 4.4.1
IPython: 7.5.0
sphinx: None
  • reactions: none (all counts 0) · state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);