id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1177669703,PR_kwDOAMm_X84029qU,6402,No chunk warning if empty,4666753,closed,0,,,6,2022-03-23T06:43:54Z,2022-04-09T20:27:46Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6402,"
- [x] Closes #6401
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6402/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1177665302,I_kwDOAMm_X85GMb8W,6401,Unnecessary warning when specifying `chunks` opening dataset with empty dimension,4666753,closed,0,,,0,2022-03-23T06:38:25Z,2022-04-09T20:27:40Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,,,"### What happened?
I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the `chunks` argument (for a non-empty dimension).
If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of its Dask chunking on the other dimensions (#5742). If the `chunks` parameter is then provided for those other dimensions when loading the Zarr file (using the chunksizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting that single chunk.
### What did you expect to happen?
I expect no warning to be raised when there is no data:
- performance degradation on an empty array should be negligible.
- we don't always know whether one of the dimensions is empty until loading. But we would still use the `chunks` parameter for dimensions with consistent chunksizes (to specify a multiple of what's on disk) -- this is thrown off when another dimension turns out to be empty.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
# each `a` is expected to be chunked separately
ds = xr.Dataset({""x"": ((""a"", ""b""), np.empty((4, 0)))}).chunk({""a"": 1})
# but when we save it, it gets saved as a single chunk
ds.to_zarr(""tmp.zarr"")
# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr(""tmp.zarr"", chunks={""a"": 1})
# we get a warning :(
```
### Relevant log output
```Python
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410:
UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```
### Anything else we need to know?
This can be fixed by calling `_check_chunks_compatibility()` only when `var` is nonempty (PR forthcoming).
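The proposed guard can be sketched as a standalone function (an illustrative stand-in for xarray's private `_check_chunks_compatibility`, not its actual code):

```python
def chunk_warning_needed(var_size, requested, on_disk):
    # Illustrative stand-in for the guard described above, not
    # xarray's actual code: never warn for an empty variable,
    # since splitting the chunks of a zero-size array cannot
    # degrade performance.
    if var_size == 0:
        return False
    # warn when the requested chunk size would place a chunk
    # boundary inside an on-disk chunk
    return requested % on_disk != 0
```

With the example above, `chunk_warning_needed(0, 1, 4)` is `False`, so no warning would be raised for the empty array, while a nonempty array chunked 1-by-1 against on-disk chunks of 4 would still warn.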
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6401/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1076377174,I_kwDOAMm_X85AKDZW,6062,Import hangs when matplotlib installed but no display available,4666753,closed,0,,,3,2021-12-10T03:12:55Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,,,"**What happened**: On a device with no display available, importing xarray without setting the matplotlib backend hangs on import of matplotlib.pyplot since #5794 was merged.
**What you expected to happen**: I expect to be able to run `import xarray` without needing to mess with environment variables or import matplotlib and change the default backend.
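One general way to get that behavior (an illustrative pattern, not necessarily how xarray addressed it; the helper and names here are hypothetical) is to defer the heavy import until first use, so the top-level import never touches matplotlib:

```python
import importlib

def lazy_import(name):
    # Hypothetical lazy-import helper (not xarray's actual code):
    # defer a potentially expensive import until the first call,
    # so e.g. `import xarray` would never probe for a display.
    module = None
    def get():
        nonlocal module
        if module is None:
            module = importlib.import_module(name)
        return module
    return get

# the module is resolved only when actually requested;
# 'json' stands in for matplotlib.pyplot here
get_json = lazy_import('json')
```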
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6062/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1076384122,PR_kwDOAMm_X84vp8Ru,6064,"Revert ""Single matplotlib import""",4666753,closed,0,,,3,2021-12-10T03:24:54Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6064,"Revert pydata/xarray#5794, which causes failure to import when used without display (issue #6062).
- [ ] Closes #6102
- [ ] Closes #6062","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6064/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1033142897,I_kwDOAMm_X849lIJx,5883,Failing parallel writes to_zarr with regions parameter?,4666753,closed,0,,,1,2021-10-22T03:33:02Z,2021-10-22T18:37:06Z,2021-10-22T18:37:06Z,CONTRIBUTOR,,,,"
**What happened**: Following guidance on how to use the `region` keyword of `xr.Dataset.to_zarr()`, I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.
**What you expected to happen**: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).
**Minimal Complete Verifiable Example**:
```python
path = ""tmp.zarr""
NTHREADS = 4 # when 1, things work as expected
import multiprocessing.dummy as mp # threads, instead of processes
import numpy as np
import dask.array as da
import xarray as xr
# dummy values for metadata
xr.Dataset(
{""x"": ((""a"", ""b""), -da.ones((10, 7), chunks=(None, 1)))},
{""apple"": (""a"", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode=""w"", compute=False)
# actual values to save
ds = xr.Dataset(
{""x"": ((""a"", ""b""), np.random.uniform(size=(10, 7)))},
{""apple"": (""a"", np.arange(10))},
)
# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
p.map(
lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(path, mode=""r+"", region=dict(a=slice(idx, 1 + idx))),
range(10)
)
ds_roundtrip = xr.open_zarr(path).load() # open what we just saved over multiple threads
# perfect match for x on some slices of a, but when NTHREADS > 1, x has very different value or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip) # fails when NTHREADS > 1.
```
**Anything else we need to know?**:
+ this behavior is the same if the coordinate ""apple"" (over `a`) is changed to be the coordinate ""a"" (an index over the dimension)
+ if the dummy dataset had ""apple"" defined using dask, I observed `ds_roundtrip` having all correct values of ""apple"" (but not ""x""). *But* if it was defined as a numpy array, I observed `ds_roundtrip` having incorrect values of ""apple"" (in addition to ""x"").
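One property worth checking here (a hypothesis about the failures, and a hypothetical helper, not xarray API): parallel region writes can only be race-free when each region covers whole on-disk Zarr chunks, so that no two writers touch the same chunk:

```python
def region_is_chunk_aligned(start, stop, chunksize, dim_len):
    # Hypothetical helper, not part of xarray: True when the
    # half-open region [start, stop) covers whole on-disk chunks
    # along a dimension of length dim_len, so writing it cannot
    # touch a chunk shared with a neighbouring region.
    ends_on_boundary = stop % chunksize == 0 or stop == dim_len
    return start % chunksize == 0 and ends_on_boundary
```

Note that in the metadata template above, ""x"" is created with `chunks=(None, 1)`, i.e. a single chunk of size 10 along `a`, so the length-1 regions along `a` would not be aligned by this criterion, while ""apple"" (chunked `(1,)`) would be.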
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5883/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
980605063,MDExOlB1bGxSZXF1ZXN0NzIwODExNjQ4,5742,Fix saving chunked datasets with zero length dimensions,4666753,closed,0,,,2,2021-08-26T20:12:08Z,2021-10-10T00:12:34Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5742,"This fixes #5741 by loading into memory all zero-length variables before saving with `Dataset.to_zarr()`.
- [x] Closes #5741
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5742/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
980549418,MDU6SXNzdWU5ODA1NDk0MTg=,5741,Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisonError),4666753,closed,0,,,0,2021-08-26T18:57:00Z,2021-10-10T00:02:42Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,,,"
**What happened**: I have an `xr.Dataset` with a dask-backed variable that includes a zero-length dimension (the other variables are non-empty). I tried saving it to zarr, but it fails with a `ZeroDivisionError`.
**What you expected to happen**: I expect it to save without any errors.
**Minimal Complete Verifiable Example**: the following commands fail.
```python
import numpy as np
import xarray as xr
ds = xr.Dataset(
{""x"": ((""a"", ""b"", ""c""), np.empty((75, 0, 30))), ""y"": ((""a"", ""c""), np.random.normal(size=(75, 30)))},
{""a"": np.arange(75), ""b"": [], ""c"": np.arange(30)},
).chunk({})
ds.to_zarr(""fails.zarr"") # RAISES ZeroDivisionError
```
**Anything else we need to know?**: If we load all the empty arrays to numpy, it is able to save correctly. That is:
```python
ds[""x""].load() # run on all variables that have a zero dimension
ds.to_zarr(""works.zarr"") # successfully runs
```
I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr, or one that could be fixed in a nicer way.
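A guess at the failure mode (an assumption about where the error comes from, not zarr's exact code): the number of chunks along a dimension is derived by dividing by the chunk length, and the empty dask dimension carries a chunk length of 0:

```python
import math

def n_chunks(dim_len, chunk_len):
    # Assumed shape of the failing computation (not zarr's exact
    # code): how many chunks a dimension needs. Dividing by a
    # chunk length of 0, as an empty dask dimension would supply,
    # raises ZeroDivisionError.
    return math.ceil(dim_len / chunk_len)

try:
    n_chunks(0, 0)  # empty dimension stored with chunk length 0
    raised = False
except ZeroDivisionError:
    raised = True
```

Loading the empty variables to numpy first sidesteps this by letting xarray pick the chunk layout for in-memory data.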
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5741/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
559958918,MDExOlB1bGxSZXF1ZXN0MzcxMDM0MDY2,3752,Fix swap_dims() index names (issue #3748),4666753,closed,0,,,5,2020-02-04T20:25:18Z,2020-02-24T23:33:05Z,2020-02-24T22:34:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3752,"
- [x] Closes #3748
- [x] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3752/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
559841620,MDU6SXNzdWU1NTk4NDE2MjA=,3748,`swap_dims()` incorrectly changes underlying index name,4666753,closed,0,,,1,2020-02-04T16:41:25Z,2020-02-24T22:34:58Z,2020-02-24T22:34:58Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
import xarray as xr
# create data array with named dimension and named coordinate
x = xr.DataArray([1], {""idx"": [2], ""y"": (""idx"", [3])}, [""idx""], name=""x"")
# what's our current index? (idx, this is fine)
x.indexes
# prints ""idx: Int64Index([2], dtype='int64', name='idx')""
# swap dim so that y is our dimension, what's index now?
x.swap_dims({""idx"": ""y""}).indexes
# prints ""y: Int64Index([3], dtype='int64', name='idx')""
```
The dimension name is appropriately swapped but the pandas index name is incorrect.
#### Expected Output
```python
# swap dim so that y is our dimension, what's index now?
x.swap_dims({""idx"": ""y""}).indexes
# prints ""y: Int64Index([3], dtype='int64', name='y')""
```
#### Problem Description
This is a problem because running `x.swap_dims({""idx"": ""y""}).to_dataframe()` gives
a dataframe with columns `[""x"", ""idx""]` and index `""idx""`. The names are ambiguous and the original name is dropped, while the DataArray string representation gives no indication that this is happening.
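Until this is fixed, the stale name can be repaired on the pandas side; `pandas.Index.rename` returns a renamed copy (a pandas-level workaround, not xarray API):

```python
import pandas as pd

# what swap_dims currently leaves as the index for dimension 'y':
stale = pd.Index([3], name='idx')
# a copy carrying the name this issue expects; rename() does not
# mutate the original index
fixed = stale.rename('y')
```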
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3748/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue