id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
851391441,MDU6SXNzdWU4NTEzOTE0NDE=,5115,`to_zarr()` dramatically alters dask graph,6582745,closed,0,,,4,2021-04-06T12:50:04Z,2022-04-19T09:09:58Z,2022-04-19T03:46:55Z,NONE,,,,"**What happened**:
The dask graph before a `to_zarr()` call differs wildly from the dask graph after a `to_zarr()` call.
**What you expected to happen**:
I would expect `to_zarr()` to add its own layers and dependencies on top of the existing graph, leaving the layers that describe `arr` intact.
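A stdlib-only sketch of that expectation follows. It treats a graph's `dependencies` as a plain dict mapping each layer name to the set of layer names it depends on, which simplifies what dask's `HighLevelGraph.dependencies` returns. The two layer names are copied from the output further down in this report. `add_layer` and `'store-example'` are hypothetical names used only for illustration.

```python
def add_layer(deps, name, depends_on):
    # Return a new dependency mapping with one extra layer appended.
    # Existing layers and their edges are copied unchanged, which is what
    # one would expect from to_zarr() stacking a store step on top.
    if name in deps:
        raise ValueError(f'layer {name!r} already exists')
    missing = set(depends_on) - set(deps)
    if missing:
        raise KeyError(f'unknown dependencies: {sorted(missing)}')
    new_deps = {key: set(value) for key, value in deps.items()}
    new_deps[name] = set(depends_on)
    return new_deps


before = {
    'add-1118924c7d3d06d9d07bcca6afde2c7e': {'ones-76dd1e004518465cc97010eea7a88ebc'},
    'ones-76dd1e004518465cc97010eea7a88ebc': set(),
}
# Hypothetical store layer that depends on the final array layer.
after = add_layer(before, 'store-example', {'add-1118924c7d3d06d9d07bcca6afde2c7e'})
```

Because the sets are copied, `before` is left untouched, and every layer in `before` is still present in `after`.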
**Minimal Complete Verifiable Example**:
```python
import xarray
import dask.array as da
from pprint import pprint
if __name__ == ""__main__"":
    arr = da.ones((2,), chunks=(1,)) + 1
    xds = xarray.Dataset({""arr"": ((""x"",), arr)})
    pprint(xds.arr.data.__dask_graph__().layers)
    pprint(xds.arr.data.__dask_graph__().dependencies)
    # With compute=False, to_zarr() returns a dask Delayed object rather
    # than a Dataset, so the calls below inspect that Delayed's graph.
    xds = xds.to_zarr(""out.zarr"", mode=""w"", compute=False)
    pprint(xds.arr.data.__dask_graph__().layers)
    pprint(xds.arr.data.__dask_graph__().dependencies)
```
**Anything else we need to know?**:
On my system the above will print the following before the `to_zarr()` call:
```
# layers
{'add-1118924c7d3d06d9d07bcca6afde2c7e': Blockwise<(('ones-76dd1e004518465cc97010eea7a88ebc', ('.0',)), (1, None)) -> add-1118924c7d3d06d9d07bcca6afde2c7e>,
'ones-76dd1e004518465cc97010eea7a88ebc': Blockwise<(('blockwise-create-ones-76dd1e004518465cc97010eea7a88ebc', (0,)),) -> ones-76dd1e004518465cc97010eea7a88ebc>}
# deps
{'add-1118924c7d3d06d9d07bcca6afde2c7e': {'ones-76dd1e004518465cc97010eea7a88ebc'},
'ones-76dd1e004518465cc97010eea7a88ebc': set()}
```
and
```
# layers
Delayed('getattr-bf22b6050bac2d8ef0a78589b04365f3')
# deps
{139853652717696: set(),
139853652760176: set(),
'_finalize_store-faeab92e-4e8d-4155-a915-cbfe8addae8e': {'store-648c67ef-96d5-11eb-ae7e-fc77746741ed'},
'getattr-84732ba2ce83b0568edc0dad83f2d611': {'getattr-c0220fc5eded903243bac6f4a8067a7b'},
'getattr-c0220fc5eded903243bac6f4a8067a7b': {'_finalize_store-faeab92e-4e8d-4155-a915-cbfe8addae8e'}}
```
after. This seems strange, because the layers describing `arr` have disappeared. That is a problem when attempting any post-processing, optimization or annotation of the graph.
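One way to make the missing layers concrete is to compare the two `dependencies` mappings directly. The stdlib-only sketch below uses hypothetical helpers, `missing_layers` and `dangling_dependencies`, and dicts copied verbatim from the output above. It reports two things: which layers from the first graph are absent from the second, and which dependencies are referenced but never defined as layers.

```python
def missing_layers(before, after):
    # Layers present in the original graph but absent from the new one.
    return sorted(set(before) - set(after))


def dangling_dependencies(deps):
    # Names that some layer depends on but that are not layers themselves.
    referenced = set().union(*deps.values())
    return sorted(referenced - set(deps))


before = {
    'add-1118924c7d3d06d9d07bcca6afde2c7e': {'ones-76dd1e004518465cc97010eea7a88ebc'},
    'ones-76dd1e004518465cc97010eea7a88ebc': set(),
}
after = {
    139853652717696: set(),
    139853652760176: set(),
    '_finalize_store-faeab92e-4e8d-4155-a915-cbfe8addae8e': {'store-648c67ef-96d5-11eb-ae7e-fc77746741ed'},
    'getattr-84732ba2ce83b0568edc0dad83f2d611': {'getattr-c0220fc5eded903243bac6f4a8067a7b'},
    'getattr-c0220fc5eded903243bac6f4a8067a7b': {'_finalize_store-faeab92e-4e8d-4155-a915-cbfe8addae8e'},
}

print(missing_layers(before, after))
# ['add-1118924c7d3d06d9d07bcca6afde2c7e', 'ones-76dd1e004518465cc97010eea7a88ebc']
print(dangling_dependencies(after))
# ['store-648c67ef-96d5-11eb-ae7e-fc77746741ed']
```

Applied to the printed output, both helpers flag the problem described here. The `add` and `ones` layers are gone, and `_finalize_store` depends on a `store-...` layer that is not itself a key of the mapping.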
**Environment**:
Output of `xr.show_versions()`:
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 (default, Feb 20 2021, 21:09:14)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.17.0
pandas: 1.2.3
numpy: 1.19.5
scipy: 1.6.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.03.0+49.gf4132551
distributed: 2021.03.0+29.g3b8b97e3
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 54.1.2
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5115/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
702646191,MDU6SXNzdWU3MDI2NDYxOTE=,4428,Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0,6582745,closed,0,,,8,2020-09-16T10:26:38Z,2021-07-04T04:12:34Z,2021-07-04T04:12:34Z,NONE,,,,"**What happened**:
A project of mine suddenly broke with:
```
ValueError: Object has inconsistent chunks along dimension row. This can be fixed by calling unify_chunks().
```
where previously it had worked.
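The error is raised when the variables of a Dataset disagree about how a shared dimension is chunked. The stdlib-only sketch below shows that kind of check. It assumes each variable's chunks are given as a dict from dimension name to a tuple of chunk sizes. `find_inconsistent_dims`, the variable `OTHER` and all chunk sizes are hypothetical; this is not xarray's code, and the numbers are not data from this report.

```python
def find_inconsistent_dims(variable_chunks):
    # variable_chunks: {variable name: {dimension name: tuple of chunk sizes}}
    # Returns the dimensions along which the variables disagree on chunking.
    seen = {}
    inconsistent = set()
    for chunks_by_dim in variable_chunks.values():
        for dim, chunks in chunks_by_dim.items():
            first = seen.setdefault(dim, tuple(chunks))
            if first != tuple(chunks):
                inconsistent.add(dim)
    return sorted(inconsistent)


# Hypothetical example: one variable has been split into two chunks along row.
example = {
    'FLAG': {'row': (10,), 'chan': (4,), 'corr': (2,)},
    'OTHER': {'row': (5, 5), 'chan': (4,), 'corr': (2,)},
}
print(find_inconsistent_dims(example))  # ['row']
```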
**What you expected to happen**:
There should have been no change.
**Minimal Complete Verifiable Example**:
This is very difficult to reproduce. I have tried, but it clearly isn't triggered for relatively simple xarray.Datasets. In my code, the Datasets in question are the result of multiple concatenation, selection and chunking operations. Instead, I shall attempt to demonstrate the change, in the hope that someone more knowledgeable has some intuition for what has gone wrong.
***dask==2.25.0***
I have a dataset, `foo`, with a number of variables, most of which are indexed by `row`. To demonstrate the change in behaviour I will focus on one variable, `FLAG`. This is what `FLAG` looks like prior to a `foo.sortby(""row"")` call. Note that there is only a single chunk (this is intentional).
```
dask.array
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
Dimensions without coordinates: chan, corr
```
After the `foo.sortby(""row"")` call:
```
dask.array
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
Dimensions without coordinates: chan, corr
```
Note that the chunksize is unchanged.
***dask==2.26.0***
Repeating exactly the same experiment, prior to the call:
```
dask.array
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505074 505075 505076
Dimensions without coordinates: chan, corr
```
After the `foo.sortby(""row"")` call:
```
dask.array
Coordinates:
* row (row) int64 462991 462993 462994 462996 ... 505076 505077 505078
Dimensions without coordinates: chan, corr
```
Note the change in the chunksize.
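The error message suggests `unify_chunks()` as the remedy, which rechunks variables onto a shared chunking. Roughly, a shared chunking along one dimension is the common refinement of the individual ones: take the union of all chunk boundaries. The stdlib-only sketch below illustrates that idea. `common_refinement` is a hypothetical helper and a simplification, not dask's actual implementation.

```python
from itertools import accumulate


def common_refinement(*chunkings):
    # Each chunking is a tuple of chunk sizes covering the same total length.
    totals = {sum(chunks) for chunks in chunkings}
    if len(totals) != 1:
        raise ValueError(f'chunkings cover different lengths: {sorted(totals)}')
    # Collect every chunk boundary (cumulative offset) from every chunking.
    boundaries = {0}
    for chunks in chunkings:
        boundaries.update(accumulate(chunks))
    edges = sorted(boundaries)
    # Consecutive boundaries delimit the refined chunks.
    return tuple(end - start for start, end in zip(edges, edges[1:]))


print(common_refinement((10,), (5, 5)))   # (5, 5)
print(common_refinement((4, 6), (5, 5)))  # (4, 1, 5)
```

Refinement only ever splits chunks. A variable deliberately kept as a single chunk, as `FLAG` is here, would therefore inherit the smaller chunks of its neighbours.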
**Anything else we need to know?**:
I have seen similar behaviour when using `xarray.Dataset.sel`.
**Environment**:
***dask==2.25.0***
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.25.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None
```
***dask==2.26.0***
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7648-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.1
pandas: 1.1.2
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.26.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4428/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue