
issues


6 rows where state = "closed" and user = 24508496 sorted by updated_at descending


Issue #8707 (id 2118308210): Weird interaction between aggregation and multiprocessing on DaskArrays
saschahofmann · closed · 10 comments · created 2024-02-05T11:35:28Z · updated 2024-04-29T16:20:45Z · closed 2024-04-29T16:20:44Z · CONTRIBUTOR

What happened?

When I try to run a modified version of the example from the dropna documentation (see below), it creates a never-terminating process. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes on DaskArrays using the standard-library multiprocessing Pool class. Running the rolling + dropna in a for loop finishes as expected in no time.

What did you expect to happen?

I see no obvious reason why this wouldn't just work, unless there is a weird interaction between the Dask threads and the different processes. Xarray + Dask + multiprocessing works for me with other functions; it seems to be this particular combination that is problematic.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool

datasets = [
    xr.Dataset(
        {
            "temperature": (
                ["time", "location"],
                [
                    [23.4, 24.1],
                    [np.nan if i > 1 else 23.4, 22.1 if i < 2 else np.nan],
                    [21.8 if i < 3 else np.nan, 24.2],
                    [20.5, 25.3],
                ],
            )
        },
        coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
    ).chunk(time=2)
    for i in range(4)
]

def process(dataset):
    return dataset.rolling(dim={'time': 2}).sum().dropna(dim="time", how="all").compute()

# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```
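One way to narrow down a hang like this (a debugging sketch of my own, not something from the issue) is to force Dask's synchronous scheduler inside the worker, so that no Dask worker threads are involved at all:

```python
import dask
import dask.array as da

# Force the single-threaded ("synchronous") scheduler: every task runs on
# the calling thread, with no Dask worker threads involved.  If a hang
# disappears under this scheduler, the deadlock likely involves worker
# threads that do not survive the fork() performed by multiprocessing.Pool
# on Linux.
x = da.arange(8, chunks=4)
with dask.config.set(scheduler="synchronous"):
    total = int(x.sum().compute())
print(total)  # 28
```

If the pool still hangs with the synchronous scheduler active in each worker, the threaded scheduler itself is probably not the culprit.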

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I am still running 2023.8.0; see below for more details about the environment.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.8.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.2.1
conda: None
pytest: 8.0.0
mypy: None
IPython: 8.18.1
sphinx: None
Reactions: none
state_reason: completed · repo: xarray · type: issue
Pull request #8903 (id 2220487961): Update docstring for compute and persist
saschahofmann · closed · 2 comments · created 2024-04-02T13:10:02Z · updated 2024-04-03T07:45:10Z · closed 2024-04-02T23:52:32Z · CONTRIBUTOR · pydata/xarray/pulls/8903
  • Updates the docstring for persist to mention that it does not alter the original object.
  • Adds a return value to the docstrings for compute and persist on both Dataset and DataArray.

  • [x] Closes #8901

Reactions: none
repo: xarray · type: pull
Issue #8901 (id 2220228856): Is .persist in place or like .compute?
saschahofmann · closed · 3 comments · created 2024-04-02T11:09:59Z · updated 2024-04-02T23:52:33Z · closed 2024-04-02T23:52:33Z · CONTRIBUTOR

What is your issue?

I am playing around with Dataset.persist and assumed it would work like .load. I also just looked at the source code, and it looks to me like it should indeed replace the original data. But I can see, both in performance and in the dask dashboard, that steps are recomputed if I don't use the object returned by .persist, which suggests that .persist behaves more like .compute.

In either case, I would make a PR to clarify in the docs whether .persist leaves the original data untouched or not.
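For what it's worth, a minimal sketch (my own illustration, not code from the issue) showing that .persist, like .compute, returns a new object rather than modifying the original:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(4))})

# persist returns a NEW Dataset; the original is left untouched, so any
# work done by persist is lost if you keep using `ds` instead of the result
persisted = ds.persist()
assert persisted is not ds
```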

Reactions: none
state_reason: completed · repo: xarray · type: issue
Pull request #8873 (id 2205239889): Add dt.date to plottable types
saschahofmann · closed · 6 comments · created 2024-03-25T09:07:33Z · updated 2024-03-29T14:35:44Z · closed 2024-03-29T14:35:41Z · CONTRIBUTOR · pydata/xarray/pulls/8873

Simply adds datetime.date to the plottable types in _ensure_plottable in plot/utils.py (L675) to enable the plotting of dates.

Matplotlib handles date types automatically, so I think there is no other change needed.
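As a quick check (my own sketch, independent of xarray), matplotlib indeed accepts datetime.date values directly on an axis:

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # headless backend, just for this check
import matplotlib.pyplot as plt

dates = [datetime.date(2024, 1, 1) + datetime.timedelta(days=i) for i in range(5)]
fig, ax = plt.subplots()
ax.plot(dates, [1, 3, 2, 4, 3])  # matplotlib's date converter handles the x values
print(len(ax.lines))  # 1
```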

Do I need to add a test for this? Any pointers on where I should put it?

  • [x] Closes #8866
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
Reactions: 1 (+1 × 1)
repo: xarray · type: pull
Issue #8866 (id 2202163545): Cannot plot datetime.date dimension
saschahofmann · closed · 9 comments · created 2024-03-22T10:18:04Z · updated 2024-03-29T14:35:42Z · closed 2024-03-29T14:35:42Z · CONTRIBUTOR

What happened?

I noticed that xarray doesn't support plotting when the x-axis is a datetime.date. In my case, I would like to plot hourly data aggregated by date. I know that in this particular case I could just use .resample('1D') to achieve the same result and be able to plot it, but I am wondering whether xarray shouldn't also support plotting dates.
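For reference, the resample workaround mentioned above would look roughly like this (a sketch with made-up hourly data):

```python
import datetime
import numpy as np
import xarray as xr

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(48)]
data = xr.DataArray(np.random.randn(len(time)), coords={"time": ("time", time)})

# resample keeps the result on a datetime64 axis, so .plot() works
daily = data.resample(time="1D").mean()
print(daily.sizes["time"])  # 2
```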

I am pretty sure that matplotlib supports dates on the x-axis, so maybe adding it to the acceptable types in _ensure_plottable in plot/utils.py (L675) would already do the trick?
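The suggested change amounts to widening the accepted types; a hypothetical, heavily simplified stand-in for _ensure_plottable (the names below are illustrative, not xarray's actual code):

```python
import datetime
import numpy as np

# hypothetical stand-in for the type whitelist in plot/utils.py
ACCEPTED_TYPES = (np.datetime64, datetime.datetime, datetime.date)  # date added

def ensure_plottable(values):
    """Raise TypeError if any value is neither numeric nor date-like."""
    for v in values:
        if not (isinstance(v, ACCEPTED_TYPES) or np.issubdtype(type(v), np.number)):
            raise TypeError(f"Cannot plot coordinate of type {type(v).__name__}")

ensure_plottable([datetime.date(2024, 1, 1), datetime.date(2024, 1, 2)])  # passes
```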

I am happy to look into this if this is a wanted feature.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
import datetime

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]

data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))
data.groupby('time.date').mean().plot()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
TypeError: Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead.
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 (main, Aug 24 2023, 12:59:26) [Clang 15.0.0 (clang-1500.1.0.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.1.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None
Reactions: none
state_reason: completed · repo: xarray · type: issue
Issue #7576 (id 1607155972): Rezarring an opened dataset with object dtype fails due to added filter
saschahofmann · closed · 2 comments · created 2023-03-02T16:50:56Z · updated 2023-03-20T15:41:32Z · closed 2023-03-20T15:41:31Z · CONTRIBUTOR

What happened?

I am trying to save an xr.Dataset that I read and processed from another saved zarr file, but it fails with this error:

```
numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3
```

It seems like the first time the dataset is saved, xarray/zarr adds a VLenUTF8 filter to the encoding of one of the dimensions. If I pop the filters key from the opened dataset, I can resave the file.

I can also safely save to netCDF (which makes sense, since this encoding is probably ignored there).

What did you expect to happen?

I should be able to open and resave a file to zarr.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

da = xr.DataArray(
    np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'),
    dims=['asset'],
    name='asset',
)

da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')

# Fails with the error below
opened = xr.open_zarr('~/Downloads/test.zarr')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')

# Saves successfully
opened.asset.encoding.pop('filters')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
```
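A generalized form of the workaround above (my own sketch; `strip_filters` is a hypothetical helper, not an xarray API) drops the inherited filters entry from every variable's encoding before re-saving:

```python
def strip_filters(dataset):
    """Drop the inherited zarr 'filters' encoding from every variable,
    letting to_zarr choose a fresh object codec on the next save.

    Works on any object exposing a `.variables` mapping whose values
    carry a dict-valued `.encoding` attribute (as xarray Datasets do).
    """
    for var in dataset.variables.values():
        var.encoding.pop("filters", None)
    return dataset
```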

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
TypeError                                 Traceback (most recent call last)
<ipython-input-16-b1f2f1d2b5a0> in <module>
      6 opened = xr.open_zarr('~/Downloads/test.zarr')
      7
----> 8 opened.to_zarr('~/Downloads/test2.zarr', mode='w')

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
   2097         from xarray.backends.api import to_zarr
   2098
-> 2099         return to_zarr(  # type: ignore
   2100             self,
   2101             store=store,

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
   1668     writer = ArrayWriter()
   1669     # TODO: figure out how to properly handle unlimited_dims
-> 1670     dump_to_store(dataset, zstore, writer, encoding=encoding)
   1671     writes = writer.sync(compute=compute)
   1672

~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1277         variables, attrs = encoder(variables, attrs)
   1278
-> 1279     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
...
   2112     # check object encoding

numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()

TypeError: expected unicode string, found 3
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.4.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.0
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
fsspec: 0.8.4
cupy: None
pint: 0.16.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 7.0.1
mypy: None
IPython: 7.18.1
sphinx: None
Reactions: none
state_reason: completed · repo: xarray · type: issue
