id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1177669703,PR_kwDOAMm_X84029qU,6402,No chunk warning if empty,4666753,closed,0,,,6,2022-03-23T06:43:54Z,2022-04-09T20:27:46Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6402," - [x] Closes #6401 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6402/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1177665302,I_kwDOAMm_X85GMb8W,6401,Unnecessary warning when specifying `chunks` opening dataset with empty dimension,4666753,closed,0,,,0,2022-03-23T06:38:25Z,2022-04-09T20:27:40Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,,,"### What happened? I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the `chunks` argument (for a non-empty dimension). If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the `chunks` parameter is provided for other dimensions when loading the Zarr file (based on the expected chunksizes were the array nonempty), xarray gives a warning about potentially degraded performance from splitting the single chunk. ### What did you expect to happen? I expect no warning to be raised when there is no data: - performance degradation on an empty array should be negligible. - we don't always know if one of the dimensions is empty until loading. But we would use the `chunks` parameter for dimensions with consistent chunksizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty. 
### Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np # each `a` is expected to be chunked separately ds = xr.Dataset({""x"": ((""a"", ""b""), np.empty((4, 0)))}).chunk({""a"": 1}) # but when we save it, it gets saved as a single chunk ds.to_zarr(""tmp.zarr"") # so if we open it up with expected chunksizes (not knowing that b is empty): ds2 = xr.open_zarr(""tmp.zarr"", chunks={""a"": 1}) # we get a warning :( ``` ### Relevant log output ```Python {...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead. _check_chunks_compatibility(var, output_chunks, preferred_chunks) ``` ### Anything else we need to know? This can be fixed by only calling `_check_chunks_compatibility()` whenever `var` is nonempty (PR forthcoming). 
### Environment INSTALLED VERSIONS ------------------ commit: None python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: None xarray: 2022.3.0 pandas: 1.4.1 numpy: 1.22.2 scipy: 1.8.0 netCDF4: None pydap: None h5netcdf: None h5py: 3.6.0 Nio: None zarr: 2.11.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.4 dask: 2022.01.0 distributed: 2022.01.0 matplotlib: 3.5.1 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.01.0 cupy: None pint: None sparse: None setuptools: 59.8.0 pip: 22.0.4 conda: None pytest: 7.0.1 IPython: 8.1.1 sphinx: 4.4.0","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6401/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1076377174,I_kwDOAMm_X85AKDZW,6062,Import hangs when matplotlib installed but no display available,4666753,closed,0,,,3,2021-12-10T03:12:55Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,,,"**What happened**: On a device with no display available, importing xarray without setting the matplotlib backend hangs on import of matplotlib.pyplot since #5794 was merged. **What you expected to happen**: I expect to be able to run `import xarray` without needing to mess with environment variables or import matplotlib and change the default backend. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6062/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1076384122,PR_kwDOAMm_X84vp8Ru,6064,"Revert ""Single matplotlib import""",4666753,closed,0,,,3,2021-12-10T03:24:54Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6064,"Revert pydata/xarray#5794, which causes failure to import when used without display (issue #6062). - [ ] Closes #6102 - [ ] Closes #6062","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6064/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1033142897,I_kwDOAMm_X849lIJx,5883,Failing parallel writes to_zarr with regions parameter?,4666753,closed,0,,,1,2021-10-22T03:33:02Z,2021-10-22T18:37:06Z,2021-10-22T18:37:06Z,CONTRIBUTOR,,,," **What happened**: Following guidance on how to use regions keyword in `xr.Dataset.to_zarr()`, I wrote a multithreaded program that makes independent writes to each index along an axis. But, when I use more than one thread, some of these writes fail. **What you expected to happen**: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not). 
**Minimal Complete Verifiable Example**: ```python path = ""tmp.zarr"" NTHREADS = 4 # when 1, things work as expected import multiprocessing.dummy as mp # threads, instead of processes import numpy as np import dask.array as da import xarray as xr # dummy values for metadata xr.Dataset( {""x"": ((""a"", ""b""), -da.ones((10, 7), chunks=(None, 1)))}, {""apple"": (""a"", -da.ones(10, dtype=int, chunks=(1,)))}, ).to_zarr(path, mode=""w"", compute=False) # actual values to save ds = xr.Dataset( {""x"": ((""a"", ""b""), np.random.uniform(size=(10, 7)))}, {""apple"": (""a"", np.arange(10))}, ) # save them using NTHREADS with mp.Pool(NTHREADS) as p: p.map( lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(path, mode=""r+"", region=dict(a=slice(idx, 1 + idx))), range(10) ) ds_roundtrip = xr.open_zarr(path).load() # open what we just saved over multiple threads # perfect match for x on some slices of a, but when NTHREADS > 1, x has very different value or NaN on other slices of a xr.testing.assert_allclose(ds, ds_roundtrip) # fails when NTHREADS > 1. ``` **Anything else we need to know?**: + this behavior is the same if coordinate ""apple"" (over a) is changed to be coordinate ""a"" (index over dimension) + if dummy dataset had ""apple"" defined using dask, I observed `ds_roundtrip` having all correct values of ""apple"" (but not ""x""). *But*, if it was defined as a numpy array, I observed `ds_roundtrip` having incorrect values of ""apple"" (in addition to ""x""). **Environment**:
Output of xr.show_versions() ``` INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.19.0 pandas: 1.3.3 numpy: 1.21.2 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.10.1 cftime: 1.5.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.1 distributed: 2021.08.1 matplotlib: 3.4.1 cartopy: None seaborn: 0.11.2 numbagg: None pint: None setuptools: 58.2.0 pip: 21.3 conda: None pytest: None IPython: 7.28.0 sphinx: None ```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5883/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
980605063,MDExOlB1bGxSZXF1ZXN0NzIwODExNjQ4,5742,Fix saving chunked datasets with zero length dimensions,4666753,closed,0,,,2,2021-08-26T20:12:08Z,2021-10-10T00:12:34Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5742,"This fixes #5741 by loading to memory all variables with zero length before saving with `Dataset.to_zarr()` - [x] Closes #5741 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5742/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
980549418,MDU6SXNzdWU5ODA1NDk0MTg=,5741,Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError),4666753,closed,0,,,0,2021-08-26T18:57:00Z,2021-10-10T00:02:42Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,,," **What happened**: I have an `xr.Dataset` with a dask-array-valued variable including zero-length dimension (other variables are non-empty). I tried saving it to zarr, but it fails with a zero division error. **What you expected to happen**: I expect it to save without any errors. **Minimal Complete Verifiable Example**: the following commands fail. ```python import numpy as np import xarray as xr ds = xr.Dataset( {""x"": ((""a"", ""b"", ""c""), np.empty((75, 0, 30))), ""y"": ((""a"", ""c""), np.random.normal(size=(75, 30)))}, {""a"": np.arange(75), ""b"": [], ""c"": np.arange(30)}, ).chunk({}) ds.to_zarr(""fails.zarr"") # RAISES ZeroDivisionError ``` **Anything else we need to know?**: If we load all the empty arrays to numpy, it is able to save correctly. 
That is: ```python ds[""x""].load() # run on all variables that have a zero dimension ds.to_zarr(""works.zarr"") # successfully runs ``` I'll make a PR using this solution, but not sure if this is a deeper bug that should be fixed in zarr or in a nicer way. **Environment**:
Output of xr.show_versions() ``` INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.19.0 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.7.1 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.9.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.08.1 distributed: 2021.08.1 matplotlib: 3.4.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.22.0 sphinx: None ```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5741/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
559958918,MDExOlB1bGxSZXF1ZXN0MzcxMDM0MDY2,3752,Fix swap_dims() index names (issue #3748),4666753,closed,0,,,5,2020-02-04T20:25:18Z,2020-02-24T23:33:05Z,2020-02-24T22:34:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3752," - [x] Closes #3748 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3752/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
559841620,MDU6SXNzdWU1NTk4NDE2MjA=,3748,`swap_dims()` incorrectly changes underlying index name,4666753,closed,0,,,1,2020-02-04T16:41:25Z,2020-02-24T22:34:58Z,2020-02-24T22:34:58Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray as xr # create data array with named dimension and named coordinate x = xr.DataArray([1], {""idx"": [2], ""y"": (""idx"", [3])}, [""idx""], name=""x"") # what's our current index? (idx, this is fine) x.indexes # prints ""idx: Int64Index([2], dtype='int64', name='idx')"" # swap dim so that y is our dimension, what's index now? x.swap_dims({""idx"": ""y""}).indexes # prints ""y: Int64Index([3], dtype='int64', name='idx')"" ``` The dimension name is appropriately swapped but the pandas index name is incorrect. #### Expected Output ```python # swap dim so that y is our dimension, what's index now? x.swap_dims({""idx"": ""y""}).indexes # prints ""y: Int64Index([3], dtype='int64', name='y')"" ``` #### Problem Description This is a problem because running `x.swap_dims({""idx"": ""y""}).to_dataframe()` gives a dataframe with columns `[""x"", ""idx""]` and index `""idx""`. 
This gives ambiguous names and drops the original name, while the DataArray string representation gives no indication that this might be happening. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.3 xarray: 0.15.0 pandas: 0.25.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.10.1 distributed: 2.10.0 matplotlib: None cartopy: None seaborn: None numbagg: None setuptools: 45.1.0.post20200119 pip: 20.0.2 conda: None pytest: 5.3.5 IPython: 7.12.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3748/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue