id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
668256331,MDU6SXNzdWU2NjgyNTYzMzE=,4288,hue argument for xarray.plot.step() for plotting multiple histograms over shared bins,4666753,open,0,,,2,2020-07-30T00:30:37Z,2022-04-17T19:27:28Z,,CONTRIBUTOR,,,,"
**Is your feature request related to a problem? Please describe.**
I love how efficiently we can plot line data for different observations using `xr.DataArray.plot(hue={hue coordinate name})` over a 2D array, and I have appreciated `xr.DataArray.plot.step()` for plotting histogram data using interval coordinates. Today, I wanted to plot/compare several histograms over the same set of bins. I figured I could write `xr.DataArray.plot.step(hue={...})`, but I found out that this functionality is not implemented.
**Describe the solution you'd like**
I think we should have a hue kwarg for `xr.DataArray.plot.step()`. When specified, we would be able to plot 2D data in the same way as `xr.DataArray.plot()`, except that we get a set of step plots instead of a set of line plots.
**Describe alternatives you've considered**
+ Use `xr.DataArray.plot()` instead. This is effective for histograms with many bins, but it inaccurately represents histograms with coarse bins.
+ Manually call `xr.DataArray.plot.hist()` on each 1D subarray for each label on the hue coordinate, adding appropriate labels and a legend. This works and is my current solution, but I think it would be excellent to use the same shorthand that was developed for line plots.
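  For concreteness, here is a minimal sketch of the manual-loop alternative using `plot.step()` directly (the data, dimension names, and labels are made up for illustration):
```python
import matplotlib
matplotlib.use('Agg')  # headless-safe backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# hypothetical 2D data: bin counts for two observations over shared bins
da = xr.DataArray(
    np.arange(20).reshape(2, 10),
    dims=('obs', 'bin'),
    coords={'obs': ['first', 'second']},
    name='counts',
)
fig, ax = plt.subplots()
# the loop a hue kwarg would replace: one step plot per hue label
for obs in da['obs'].values:
    da.sel(obs=obs).plot.step(ax=ax, where='post', label=str(obs))
ax.legend()
```
  With `hue` support this whole loop would collapse to something like `da.plot.step(hue='obs')`, mirroring `da.plot(hue='obs')` for line plots.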
**Additional context**
I haven't evaluated the other plotting functions, but I suspect some of them could also sensibly accept a hue argument and do not yet support one. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4288/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1177669703,PR_kwDOAMm_X84029qU,6402,No chunk warning if empty,4666753,closed,0,,,6,2022-03-23T06:43:54Z,2022-04-09T20:27:46Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6402,"
- [x] Closes #6401
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6402/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1177665302,I_kwDOAMm_X85GMb8W,6401,Unnecessary warning when specifying `chunks` opening dataset with empty dimension,4666753,closed,0,,,0,2022-03-23T06:38:25Z,2022-04-09T20:27:40Z,2022-04-09T20:27:40Z,CONTRIBUTOR,,,,"### What happened?
I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the `chunks` argument (for a non-empty dimension).
If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the `chunks` parameter is provided for other dimensions when loading the Zarr file (based on the chunk sizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting the single chunk.
### What did you expect to happen?
I expect no warning to be raised when there is no data:
- performance degradation on an empty array should be negligible.
- we don't always know if one of the dimensions is empty until loading. But we would use the `chunks` parameter for dimensions with consistent chunksizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
# each `a` is expected to be chunked separately
ds = xr.Dataset({""x"": ((""a"", ""b""), np.empty((4, 0)))}).chunk({""a"": 1})
# but when we save it, it gets saved as a single chunk
ds.to_zarr(""tmp.zarr"")
# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr(""tmp.zarr"", chunks={""a"": 1})
# we get a warning :(
```
### Relevant log output
```Python
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410:
UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```
### Anything else we need to know?
This can be fixed by calling `_check_chunks_compatibility()` only when `var` is nonempty (PR forthcoming).
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6401/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
552987067,MDU6SXNzdWU1NTI5ODcwNjc=,3712,"[Documentation/API?] {DataArray,Dataset}.sortby is stable sort?",4666753,open,0,,,0,2020-01-21T16:27:37Z,2022-04-09T02:26:34Z,,CONTRIBUTOR,,,,"I noticed that `{DataArray,Dataset}.sortby()` are implemented using `np.lexsort()`, which is a stable sort. Can we expect this function to remain a stable sort in the future even if the implementation is changed for some reason?
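For context, here is a minimal numpy-only sketch of the stability property in question (`np.lexsort` keeps the original relative order of tied elements; the example values are made up):
```python
import numpy as np

# two pairs of items tie on the sort key
keys = np.array([1, 0, 1, 0])
# lexsort sorts by `keys` and is stable: within each group of ties,
# the original relative order of the indices is preserved
order = np.lexsort((keys,))
# -> array([1, 3, 0, 2]): the two 0s (indices 1, 3) come first,
#    each group in its original order
```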
It is not explicitly stated in the docs that the sorting will be stable. If this function is meant to always be stable, I think the documentation should explicitly state this. If not, I think it would be helpful to have an optional argument to ensure that the sort is kept stable in case the implementation changes in the future.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3712/reactions"", ""total_count"": 3, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue
1076377174,I_kwDOAMm_X85AKDZW,6062,Import hangs when matplotlib installed but no display available,4666753,closed,0,,,3,2021-12-10T03:12:55Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,,,"**What happened**: On a device with no display available, importing xarray without setting the matplotlib backend hangs on import of matplotlib.pyplot since #5794 was merged.
**What you expected to happen**: I expect to be able to run `import xarray` without needing to mess with environment variables or import matplotlib and change the default backend.
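For reference, the workaround I'm using for now is to force a non-interactive backend before anything imports pyplot (a sketch, not a fix):
```python
# workaround sketch: select a non-interactive backend up front,
# so importing pyplot does not probe for a display
import matplotlib
matplotlib.use('Agg')

import xarray as xr  # should no longer hang on a display-less device
```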
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6062/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1076384122,PR_kwDOAMm_X84vp8Ru,6064,"Revert ""Single matplotlib import""",4666753,closed,0,,,3,2021-12-10T03:24:54Z,2021-12-29T07:56:59Z,2021-12-29T07:56:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6064,"Revert pydata/xarray#5794, which causes failure to import when used without display (issue #6062).
- [ ] Closes #6102
- [ ] Closes #6062","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6064/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1033142897,I_kwDOAMm_X849lIJx,5883,Failing parallel writes to_zarr with regions parameter?,4666753,closed,0,,,1,2021-10-22T03:33:02Z,2021-10-22T18:37:06Z,2021-10-22T18:37:06Z,CONTRIBUTOR,,,,"
**What happened**: Following the guidance on how to use the `region` keyword in `xr.Dataset.to_zarr()`, I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.
**What you expected to happen**: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).
**Minimal Complete Verifiable Example**:
```python
path = ""tmp.zarr""
NTHREADS = 4 # when 1, things work as expected
import multiprocessing.dummy as mp # threads, instead of processes
import numpy as np
import dask.array as da
import xarray as xr
# dummy values for metadata
xr.Dataset(
{""x"": ((""a"", ""b""), -da.ones((10, 7), chunks=(None, 1)))},
{""apple"": (""a"", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode=""w"", compute=False)
# actual values to save
ds = xr.Dataset(
{""x"": ((""a"", ""b""), np.random.uniform(size=(10, 7)))},
{""apple"": (""a"", np.arange(10))},
)
# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
p.map(
lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(path, mode=""r+"", region=dict(a=slice(idx, 1 + idx))),
range(10)
)
ds_roundtrip = xr.open_zarr(path).load() # open what we just saved over multiple threads
# perfect match for x on some slices of a, but when NTHREADS > 1, x has very different value or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip) # fails when NTHREADS > 1.
```
**Anything else we need to know?**:
+ this behavior is the same if the coordinate ""apple"" (over a) is changed to the coordinate ""a"" (an index over the dimension)
+ if the dummy dataset had ""apple"" defined using dask, I observed `ds_roundtrip` having all correct values of ""apple"" (but not ""x""). *But* if it was defined as a numpy array, I observed `ds_roundtrip` having incorrect values of ""apple"" (in addition to ""x"").
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5883/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
980605063,MDExOlB1bGxSZXF1ZXN0NzIwODExNjQ4,5742,Fix saving chunked datasets with zero length dimensions,4666753,closed,0,,,2,2021-08-26T20:12:08Z,2021-10-10T00:12:34Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5742,"This fixes #5741 by loading to memory all variables with zero length before saving with `Dataset.to_zarr()`
- [x] Closes #5741
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5742/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
980549418,MDU6SXNzdWU5ODA1NDk0MTg=,5741,Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError),4666753,closed,0,,,0,2021-08-26T18:57:00Z,2021-10-10T00:02:42Z,2021-10-10T00:02:42Z,CONTRIBUTOR,,,,"
**What happened**: I have an `xr.Dataset` with a dask-array-valued variable including zero-length dimension (other variables are non-empty). I tried saving it to zarr, but it fails with a zero division error.
**What you expected to happen**: I expect it to save without any errors.
**Minimal Complete Verifiable Example**: the following commands fail.
```python
import numpy as np
import xarray as xr
ds = xr.Dataset(
{""x"": ((""a"", ""b"", ""c""), np.empty((75, 0, 30))), ""y"": ((""a"", ""c""), np.random.normal(size=(75, 30)))},
{""a"": np.arange(75), ""b"": [], ""c"": np.arange(30)},
).chunk({})
ds.to_zarr(""fails.zarr"") # RAISES ZeroDivisionError
```
**Anything else we need to know?**: If we load all the empty arrays to numpy, it is able to save correctly. That is:
```python
ds[""x""].load() # run on all variables that have a zero dimension
ds.to_zarr(""works.zarr"") # successfully runs
```
I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr or addressed in a nicer way.
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5741/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
559958918,MDExOlB1bGxSZXF1ZXN0MzcxMDM0MDY2,3752,Fix swap_dims() index names (issue #3748),4666753,closed,0,,,5,2020-02-04T20:25:18Z,2020-02-24T23:33:05Z,2020-02-24T22:34:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3752,"
- [x] Closes #3748
- [x] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3752/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
559841620,MDU6SXNzdWU1NTk4NDE2MjA=,3748,`swap_dims()` incorrectly changes underlying index name,4666753,closed,0,,,1,2020-02-04T16:41:25Z,2020-02-24T22:34:58Z,2020-02-24T22:34:58Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
import xarray as xr
# create data array with named dimension and named coordinate
x = xr.DataArray([1], {""idx"": [2], ""y"": (""idx"", [3])}, [""idx""], name=""x"")
# what's our current index? (idx, this is fine)
x.indexes
# prints ""idx: Int64Index([2], dtype='int64', name='idx')""
# swap dim so that y is our dimension, what's index now?
x.swap_dims({""idx"": ""y""}).indexes
# prints ""y: Int64Index([3], dtype='int64', name='idx')""
```
The dimension name is appropriately swapped but the pandas index name is incorrect.
#### Expected Output
```python
# swap dim so that y is our dimension, what's index now?
x.swap_dims({""idx"": ""y""}).indexes
# prints ""y: Int64Index([3], dtype='int64', name='y')""
```
#### Problem Description
This is a problem because running `x.swap_dims({""idx"": ""y""}).to_dataframe()` gives
a dataframe with columns `[""x"", ""idx""]` and index `""idx""`. This gives ambiguous names and drops the original name, while the DataArray string representation gives no indication that this might be happening.
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3748/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue