id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2118308210,I_kwDOAMm_X85-QtFy,8707,Weird interaction between aggregation and multiprocessing on DaskArrays,24508496,closed,0,,,10,2024-02-05T11:35:28Z,2024-04-29T16:20:45Z,2024-04-29T16:20:44Z,CONTRIBUTOR,,,,"### What happened?
When I try to run a modified version of the example from the dropna documentation (see below), it creates processes that never terminate. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes on DaskArrays using the standard library multiprocessing `Pool` class. Running the rolling + dropna in a for loop finishes as expected in no time.
### What did you expect to happen?
I see no obvious reason why this wouldn't just work, unless there is a weird interaction between the Dask threads and the different processes. Using Xarray + Dask + multiprocessing works for me with other functions; it seems to be this particular combination that is problematic.
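In case it helps narrow things down, below is a sketch of the variations I plan to try next; I have not confirmed that either of them avoids the hang, and the names just mirror the MVCE below.
```Python
import dask
from multiprocessing import get_context

def process_sync(dataset):
    # Variation 1 (untested): force Dask's single-threaded scheduler inside the worker
    with dask.config.set(scheduler='synchronous'):
        return dataset.rolling(dim={'time': 2}).sum().dropna(dim='time', how='all').compute()

# Variation 2 (untested): start the workers with 'spawn' so they do not fork the parent
# process after Dask's threads have been created; `datasets` is the list from the MVCE below.
# with get_context('spawn').Pool(4) as p:
#     dropped = p.map(process_sync, datasets)
```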
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool
datasets = [xr.Dataset(
    {
        ""temperature"": (
            [""time"", ""location""],
            [[23.4, 24.1], [np.nan if i > 1 else 23.4, 22.1 if i < 2 else np.nan], [21.8 if i < 3 else np.nan, 24.2], [20.5, 25.3]],
        )
    },
    coords={""time"": [1, 2, 3, 4], ""location"": [""A"", ""B""]},
).chunk(time=2) for i in range(4)]

def process(dataset):
    return dataset.rolling(dim={'time': 2}).sum().dropna(dim=""time"", how=""all"").compute()

# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
I am still running on 2023.08.0; see below for more details about the environment.
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.8.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: 2024.1.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.23
sparse: None
flox: 0.9.0
numpy_groupies: 0.10.2
setuptools: 69.0.3
pip: 23.2.1
conda: None
pytest: 8.0.0
mypy: None
IPython: 8.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8707/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2220228856,I_kwDOAMm_X86EVgD4,8901,Is .persist in place or like .compute?,24508496,closed,0,,,3,2024-04-02T11:09:59Z,2024-04-02T23:52:33Z,2024-04-02T23:52:33Z,CONTRIBUTOR,,,,"### What is your issue?
I am playing around with `Dataset.persist` and assumed it would work like `.load`. I also just looked at the source code, and it looks to me like it should indeed replace the original data, *but* I can see both from performance and from the dask dashboard that steps are recomputed if I don't use the object returned by `.persist`, which points me towards `.persist` behaving more like `.compute`.
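For concreteness, this is roughly what I am comparing (a minimal sketch with made-up data, not my real workload):
```Python
import numpy as np
import xarray as xr

ds = xr.Dataset({'x': ('t', np.random.randn(1_000_000))}).chunk(t=100_000)

ds.persist()               # what I assumed would work in place, like .load
ds['x'].mean().compute()   # in my real setup the dashboard shows steps being recomputed here

ds = ds.persist()          # keeping the returned object instead
ds['x'].mean().compute()   # no recomputation this time
```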
In either case, I would make a PR to clarify in the docs whether `.persist` leaves the original data untouched or not.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8901/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2202163545,I_kwDOAMm_X86DQllZ,8866,Cannot plot datetime.date dimension,24508496,closed,0,,,9,2024-03-22T10:18:04Z,2024-03-29T14:35:42Z,2024-03-29T14:35:42Z,CONTRIBUTOR,,,,"### What happened?
I noticed that xarray doesn't support plotting when the x-axis is a `datetime.date`. In my case, I would like to plot hourly data aggregated by date. I know that in this particular case I could just use `.resample('1D')` to achieve the same result and be able to plot it, but I am wondering whether xarray shouldn't also support plotting dates.
I am pretty sure that matplotlib supports dates on the x-axis, so maybe adding `datetime.date` to the acceptable types in `_ensure_plottable` (*plot/utils.py*, L675) would already do the trick?
I am happy to look into this if this is a wanted feature.
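For reference, a possible workaround is to cast the object-dtype date coordinate to datetime64 via pandas before plotting; the sketch below reuses the data from the MVCE.
```Python
import datetime
import numpy as np
import pandas as pd
import xarray as xr

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]
data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))

daily = data.groupby('time.date').mean()
# cast the object-dtype datetime.date coordinate to datetime64 so plotting accepts it
daily = daily.assign_coords(date=pd.to_datetime(daily['date'].values))
daily.plot()
```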
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
import datetime
start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]
data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))
data.groupby('time.date').mean().plot()
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
TypeError: Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead.
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 (main, Aug 24 2023, 12:59:26) [Clang 15.0.0 (clang-1500.1.0.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.1.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8866/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1607155972,I_kwDOAMm_X85fy0EE,7576,Rezarring an opened dataset with object dtype fails due to added filter,24508496,closed,0,,,2,2023-03-02T16:50:56Z,2023-03-20T15:41:32Z,2023-03-20T15:41:31Z,CONTRIBUTOR,,,,"### What happened?
I am trying to save an `xr.Dataset` that I read and processed from another saved zarr file, but it fails with this error:
```
numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3
```
It seems like the first time the dataset is saved, xarray/zarr adds a `VLenUTF8` filter to the encoding of one of the dimensions. If I pop the `filters` key from the encoding of the opened dataset, I can resave the file.
I can also safely save to netcdf (which makes sense since this encoding is probably ignored then).
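To illustrate where the filter shows up, inspecting the encoding after the first round trip (same data as the MVCE below) is enough:
```Python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'),
                  dims=['asset'], name='asset')
da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')

opened = xr.open_zarr('~/Downloads/test.zarr')
# on my setup the encoding now carries the VLenUTF8 filter that the second to_zarr chokes on
print(opened['asset'].encoding)
```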
### What did you expect to happen?
I should be able to open and resave a file to zarr.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
da = xr.DataArray(np.array(['126469-423', '130042-0-10046', '120259-10343'], dtype='object'), dims=['asset'], name='asset')
da.to_dataset().to_zarr('~/Downloads/test.zarr', mode='w')
# Fails with the error below
opened = xr.open_zarr('~/Downloads/test.zarr')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
# Saves successfully
opened.asset.encoding.pop('filters')
opened.to_zarr('~/Downloads/test2.zarr', mode='w')
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
TypeError Traceback (most recent call last)
in <module>
6 opened = xr.open_zarr('~/Downloads/test.zarr')
7
----> 8 opened.to_zarr('~/Downloads/test2.zarr', mode='w')
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
2097 from xarray.backends.api import to_zarr
2098
-> 2099 return to_zarr( # type: ignore
2100 self,
2101 store=store,
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version)
1668 writer = ArrayWriter()
1669 # TODO: figure out how to properly handle unlimited_dims
-> 1670 dump_to_store(dataset, zstore, writer, encoding=encoding)
1671 writes = writer.sync(compute=compute)
1672
~/micromamba/envs/xr/lib/python3.8/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1277 variables, attrs = encoder(variables, attrs)
1278
-> 1279 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
...
2112 # check object encoding
numcodecs/vlen.pyx in numcodecs.vlen.VLenUTF8.encode()
TypeError: expected unicode string, found 3
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.4.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.0
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
fsspec: 0.8.4
cupy: None
pint: 0.16.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 7.0.1
mypy: None
IPython: 7.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7576/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue