id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2118308210,I_kwDOAMm_X85-QtFy,8707,Weird interaction between aggregation and multiprocessing on DaskArrays,24508496,closed,0,,,10,2024-02-05T11:35:28Z,2024-04-29T16:20:45Z,2024-04-29T16:20:44Z,CONTRIBUTOR,,,,"### What happened?
When I try to run a modified version of the example from the dropna documentation (see below), it creates processes that never terminate. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes on DaskArrays using the standard library multiprocessing `Pool` class. Running the rolling + dropna in a for loop finishes as expected in no time.
### What did you expect to happen?
I see no obvious reason why this wouldn't just work, unless there is a weird interaction between the Dask threads and the different processes. Using Xarray + Dask + multiprocessing works for me with other functions; it seems to be this particular combination that is problematic.
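In case it helps narrow things down, below is a sketch of the variations I plan to try next; I have not confirmed that either of them avoids the hang, and the names just mirror the MVCE below.
```Python
import dask
from multiprocessing import get_context

def process_sync(dataset):
    # Variation 1 (untested): force Dask's single-threaded scheduler inside the worker
    with dask.config.set(scheduler='synchronous'):
        return dataset.rolling(dim={'time': 2}).sum().dropna(dim='time', how='all').compute()

# Variation 2 (untested): start the workers with 'spawn' so they do not fork the parent
# process after Dask's threads have been created; `datasets` is the list from the MVCE below.
# with get_context('spawn').Pool(4) as p:
#     dropped = p.map(process_sync, datasets)
```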
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool
datasets = [xr.Dataset(
    {
        ""temperature"": (
            [""time"", ""location""],
            [[23.4, 24.1], [np.nan if i > 1 else 23.4, 22.1 if i < 2 else np.nan], [21.8 if i < 3 else np.nan, 24.2], [20.5, 25.3]],
        )
    },
    coords={""time"": [1, 2, 3, 4], ""location"": [""A"", ""B""]},
).chunk(time=2) for i in range(4)]

def process(dataset):
    return dataset.rolling(dim={'time': 2}).sum().dropna(dim=""time"", how=""all"").compute()

# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
I am still running on 2023.08.0; see below for more details about the environment.
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.8.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.2.1
conda: None
pytest: 8.0.0
mypy: None
IPython: 8.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8707/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2220228856,I_kwDOAMm_X86EVgD4,8901,Is .persist in place or like .compute?,24508496,closed,0,,,3,2024-04-02T11:09:59Z,2024-04-02T23:52:33Z,2024-04-02T23:52:33Z,CONTRIBUTOR,,,,"### What is your issue?
I am playing around with `Dataset.persist` and assumed it would work like `.load`. I also just looked at the source code, and it looks to me like it should indeed replace the original data, *but* I can see both from performance and from the dask dashboard that steps are recomputed if I don't use the object returned by `.persist`, which points me towards `.persist` behaving more like `.compute`.
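For concreteness, this is roughly what I am comparing (a minimal sketch with made-up data, not my real workload):
```Python
import numpy as np
import xarray as xr

ds = xr.Dataset({'x': ('t', np.random.randn(1_000_000))}).chunk(t=100_000)

ds.persist()               # what I assumed would work in place, like .load
ds['x'].mean().compute()   # in my real setup the dashboard shows steps being recomputed here

ds = ds.persist()          # keeping the returned object instead
ds['x'].mean().compute()   # no recomputation this time
```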
In either case, I would make a PR to clarify in the docs whether `.persist` leaves the original data untouched or not.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8901/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2202163545,I_kwDOAMm_X86DQllZ,8866,Cannot plot datetime.date dimension,24508496,closed,0,,,9,2024-03-22T10:18:04Z,2024-03-29T14:35:42Z,2024-03-29T14:35:42Z,CONTRIBUTOR,,,,"### What happened?
I noticed that xarray doesn't support plotting when the x-axis is a `datetime.date`. In my case, I would like to plot hourly data aggregated by date. I know that in this particular case I could just use `.resample('1D')` to achieve the same result and be able to plot it, but I am wondering whether xarray shouldn't also support plotting dates.
I am pretty sure that matplotlib supports dates on the x-axis, so maybe adding `datetime.date` to the acceptable types in `_ensure_plottable` (*plot/utils.py*, L675) would already do the trick?
I am happy to look into this if this is a wanted feature.
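For reference, a possible workaround is to cast the object-dtype date coordinate to datetime64 via pandas before plotting; the sketch below reuses the data from the MVCE.
```Python
import datetime
import numpy as np
import pandas as pd
import xarray as xr

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]
data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))

daily = data.groupby('time.date').mean()
# cast the object-dtype datetime.date coordinate to datetime64 so plotting accepts it
daily = daily.assign_coords(date=pd.to_datetime(daily['date'].values))
daily.plot()
```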
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
import datetime
start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]
data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))
data.groupby('time.date').mean().plot()
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
TypeError: Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead.
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 (main, Aug 24 2023, 12:59:26) [Clang 15.0.0 (clang-1500.1.0.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.1.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8866/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1607155972,I_kwDOAMm_X85fy0EE,7576,Rezarring an opened dataset with object dtype fails due to added filter,24508496,closed,0,,,2,2023-03-02T16:50:56Z,2023-03-20T15:41:32Z,2023-03-20T15:41:31Z,CONTRIBUTOR,,,,"### What happened?
I am trying to save an `xr.Dataset` that I read and processed from another saved zarr file, but it fails with this error:
```
numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3
```
It seems like the first time the dataset is saved, xarray/zarr adds a `VLenUTF8` filter to the encoding of one of the dimensions. If I pop the `filters` key from the encoding of the opened dataset, I can resave the file.
I can also safely save to netcdf (which makes sense since this encoding is probably ignored then).
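To illustrate where the filter shows up, inspecting the encoding after the first round trip (same data as the MVCE below) is enough:
```Python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'),
                  dims=['asset'], name='asset')
da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')

opened = xr.open_zarr('~/Downloads/test.zarr')
# on my setup the encoding now carries the VLenUTF8 filter that the second to_zarr chokes on
print(opened['asset'].encoding)
```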
### What did you expect to happen?
I should be able to open and resave a file to zarr.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
da = xr.DataArray(np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'), dims=['asset'], name='asset')
da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')
# Fails with the error below
opened = xr.open_zarr('~/Downloads/test.zarr')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
# Saves successfully
opened.asset.encoding.pop('filters')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
TypeError Traceback (most recent call last)
in <module>
6 opened = xr.open_zarr('~/Downloads/test.zarr')
7
----> 8 opened.to_zarr('~/Downloads/test2.zarr', mode='w')
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
2097 from xarray.backends.api import to_zarr
2098
-> 2099 return to_zarr( # type: ignore
2100 self,
2101 store=store,
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
1668 writer = ArrayWriter()
1669 # TODO: figure out how to properly handle unlimited_dims
-> 1670 dump_to_store(dataset, zstore, writer, encoding=encoding)
1671 writes = writer.sync(compute=compute)
1672
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1277 variables, attrs = encoder(variables, attrs)
1278
-> 1279 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
...
2112 # check object encoding
numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.4.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.0
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
fsspec: 0.8.4
cupy: None
pint: 0.16.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 7.0.1
mypy: None
IPython: 7.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7576/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue