id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1038531231,PR_kwDOAMm_X84tzEEk,5906,Avoid accessing slow .data in unstack,1312546,closed,0,,,4,2021-10-28T13:39:36Z,2021-10-29T15:29:39Z,2021-10-29T15:14:43Z,MEMBER,,0,pydata/xarray/pulls/5906,"- [x] Closes https://github.com/pydata/xarray/issues/5902
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1037894157,I_kwDOAMm_X8493QIN,5902,Slow performance of `DataArray.unstack()` from checking `variable.data`,1312546,closed,0,,,4,2021-10-27T21:54:48Z,2021-10-29T15:21:24Z,2021-10-29T15:21:24Z,MEMBER,,,,"**What happened**:
Calling `DataArray.unstack()` spends time allocating an object-dtype NumPy array from values of the pandas MultiIndex.
**What you expected to happen**:
Faster unstack.
**Minimal Complete Verifiable Example**:
```python
import pandas as pd
import numpy as np
import xarray as xr
t = pd.date_range(""2000"", periods=2)
x = np.arange(1000)
y = np.arange(1000)
component = np.arange(4)
idx = pd.MultiIndex.from_product([t, y, x], names=[""time"", ""y"", ""x""])
data = np.random.uniform(size=(len(idx), len(component)))
arr = xr.DataArray(
data,
coords={""pixel"": xr.DataArray(idx, name=""pixel"", dims=""pixel""),
""component"": xr.DataArray(component, name=""component"", dims=""component"")},
dims=(""pixel"", ""component"")
)
%time _ = arr.unstack()
CPU times: user 6.33 s, sys: 295 ms, total: 6.62 s
Wall time: 6.62 s
```
**Anything else we need to know?**:
For this example, >99% of the time is spent on this line: https://github.com/pydata/xarray/blob/df7646182b17d829fe9b2199aebf649ddb2ed480/xarray/core/dataset.py#L4162, specifically on the call to `v.data` for the `pixel` array, which is a pandas MultiIndex.
Just going by the comments, it does seem like accessing `v.data` is necessary to perform the check. I wonder if we could make `is_duck_dask_array` a bit smarter, to avoid unnecessarily allocating data?
Alternatively, if that's too difficult, perhaps we could add a flag to `unstack` to disable those checks and just take the ""slow"" path. In my actual use-case, the slow `_unstack_full_reindex` is necessary since I have large Dask Arrays. But even then, the unstack completes in less than 3s, while I was getting OOM killed on the `v.data` checks.
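As a rough illustration of the kind of check that would avoid the allocation, here is a minimal sketch. `LazyIndexWrapper`, `FakeDaskArray`, and `wraps_lazy_array` are hypothetical stand-ins and not xarray's API; the only point is that a type check on the wrapped object never has to trigger the expensive `.data` conversion.

```python
# Hypothetical stand-ins; none of these names exist in xarray.
class LazyIndexWrapper:
    '''Wraps an index-like object whose conversion to an array is expensive.'''

    def __init__(self, index):
        self._index = index
        self.materialized = 0  # counts how often the expensive conversion ran

    @property
    def data(self):
        # Stand-in for allocating an object-dtype array of tuples.
        self.materialized += 1
        return list(self._index)


class FakeDaskArray:
    '''Placeholder for a lazy, chunked array type.'''


def wraps_lazy_array(variable):
    # Inspect the wrapped object directly instead of calling .data,
    # so index-backed variables are never materialized by the check.
    wrapped = getattr(variable, '_index', variable)
    return isinstance(wrapped, FakeDaskArray)


var = LazyIndexWrapper(range(1000))
print(wraps_lazy_array(var))  # False
print(var.materialized)       # 0
```

Whether xarray's internals expose the wrapped object in a way that makes this practical is an open question; the sketch only shows the shape of the idea.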
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:52:28)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1040-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.0
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.9
cfgrib: 0.9.9.0
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.3
cartopy: 0.20.0
seaborn: 0.11.2
numbagg: None
pint: 0.17
setuptools: 58.0.4
pip: 20.3.4
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5902/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
857301324,MDU6SXNzdWU4NTczMDEzMjQ=,5151,"DataArray.mean() emits warning with Dask, not NumPy",1312546,closed,0,,,3,2021-04-13T20:34:56Z,2021-09-15T16:41:43Z,2021-09-15T16:41:43Z,MEMBER,,,,"
**What happened**:
When calling DataArray.mean on an all-NaN dataset, a warning is emitted if and only if a Dask array is used.
**What you expected to happen**:
Identical behavior between the two, probably no warning.
**Minimal Complete Verifiable Example**:
```python
In [7]: import xarray as xr
In [8]: import numpy as np
In [9]: import dask.array as da
In [10]: import xarray as xr
In [11]: a = xr.DataArray(da.from_array(np.full((10, 10), np.nan)))
In [12]: a.mean(dim=""dim_0"").compute()
/home/taugspurger/miniconda3/envs/tmp-adlfs/lib/python3.8/site-packages/dask/array/numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
Out[12]:
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: dim_1
In [13]: a.compute().mean(dim=""dim_0"")
Out[13]:
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: dim_1
```
**Anything else we need to know?**:
I haven't looked closely at why this is happening (I couldn't immediately find where `.mean` is reduced). I know that Dask has had some issues in the past where NumPy warnings filters are set during *graph construction* time, but aren't set when the graph is actually computed.
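The suspected mechanism can be reproduced without Dask or NumPy. The sketch below is an assumption about the cause, not a diagnosis of the actual code path; `build_task` is a made-up name standing in for graph construction.

```python
import warnings


def build_task():
    # The filter is only active while the task is being constructed.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', RuntimeWarning)
        return lambda: warnings.warn('invalid value encountered', RuntimeWarning)


task = build_task()  # the filter has already been restored here

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    task()  # the warning fires at compute time, outside the filter

print(len(caught))  # 1
```

If this is what is happening, the filter would need to be applied inside the function that runs at compute time rather than around the code that builds the graph.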
**Environment**:
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.7.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 52.0.0.post20210125
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5151/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
770937642,MDU6SXNzdWU3NzA5Mzc2NDI=,4708,Potentially spurious warning in rechunk,1312546,closed,0,,,0,2020-12-18T14:37:32Z,2020-12-24T11:32:43Z,2020-12-24T11:32:43Z,MEMBER,,,,"**What happened**:
When reading a Zarr dataset where the last chunk is smaller than the chunk size, users see a `UserWarning` that this may be inefficient, since the chunking differs from the chunking on disk. In general that's a good warning, but it shouldn't appear when the only difference between the on-disk chunking and the Dataset chunking is the last chunk.
**What you expected to happen**:
No warning.
**Minimal Complete Verifiable Example**:
```python
# Create and write the data
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range(""2014-09-06"", periods=3)
reference_time = pd.Timestamp(""2014-09-05"")
ds = xr.Dataset(
data_vars=dict(
temperature=([""x"", ""y"", ""time""], temperature),
precipitation=([""x"", ""y"", ""time""], precipitation),
),
coords=dict(
lon=([""x"", ""y""], lon),
lat=([""x"", ""y""], lat),
time=time,
reference_time=reference_time,
),
attrs=dict(description=""Weather related data.""),
)
ds2 = ds.chunk(chunks=dict(time=(2, 1)))
ds2['temperature'].chunks
ds2.to_zarr(""/tmp/test.zarr"", mode=""w"")
```
Reading it produces a warning:
```python
xr.open_zarr(""/tmp/test.zarr"")
/mnt/c/Users/taugspurger/src/xarray/xarray/core/dataset.py:408: UserWarning: Specified Dask chunks (2, 1) would separate on disks chunk shape 2 for dimension time. This could degrade performance. Consider rechunking after loading instead.
_check_chunks_compatibility(var, output_chunks, preferred_chunks)
```
**Anything else we need to know?**:
The check around https://github.com/pydata/xarray/blob/91318d2ee63149669404489be9198f230d877642/xarray/core/dataset.py#L371-L378 should probably ignore the very last chunk, since Zarr allows it to be different?
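A minimal sketch of what ignoring the last chunk could look like. `splits_disk_chunks` is a hypothetical helper, not the function in `dataset.py`, and it assumes a single on-disk chunk size per dimension.

```python
def splits_disk_chunks(dask_chunks, disk_chunk):
    '''Return True if a Dask chunk boundary falls inside an on-disk chunk.

    The final Dask chunk is skipped, because the last chunk along a
    dimension may legitimately be shorter than the on-disk chunk size.
    '''
    return any(size % disk_chunk != 0 for size in dask_chunks[:-1])


print(splits_disk_chunks((2, 1), 2))  # False: only the last chunk differs
print(splits_disk_chunks((1, 2), 2))  # True: the first boundary splits a disk chunk
```

With the example above, `time` has Dask chunks `(2, 1)` and an on-disk chunk size of 2, so this check would stay silent.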
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.128-microsoft-standard
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.3.dev21+g96e1aea0
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.2.dev9+dirty
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: None
pytest: 5.4.3
IPython: 7.19.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4708/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
704668670,MDExOlB1bGxSZXF1ZXN0NDg5NTQ5MzIx,4438,Fixed dask.optimize on datasets,1312546,closed,0,,,3,2020-09-18T21:30:17Z,2020-09-20T05:21:58Z,2020-09-20T05:21:58Z,MEMBER,,0,pydata/xarray/pulls/4438,"Another attempt to fix #3698. The issue with my earlier fix is that we hit
`Variable._dask_finalize` in both `dask.optimize` and `dask.persist`. We
want to do the culling of unnecessary tasks (`test_persist_Dataset`) but
only in the persist case, not optimize (`test_optimize`).
- [x] Closes #3698
- [x] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4438/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
703881154,MDExOlB1bGxSZXF1ZXN0NDg4OTA4MTI5,4432,Fix optimize for chunked DataArray,1312546,closed,0,,,8,2020-09-17T20:16:08Z,2020-09-18T13:20:45Z,2020-09-17T23:19:23Z,MEMBER,,0,pydata/xarray/pulls/4432,"Previously we generated an invalid Dask task graph, because the lines
removed here dropped keys that were referenced elsewhere in the task
graph. The original implementation had a
comment indicating that this was to cull:
https://github.com/pydata/xarray/blob/502a988ad5b87b9f3aeec3033bf55c71272e1053/xarray/core/variable.py#L384
Just spot-checking things, I think we're OK here though. Something like
`dask.visualize(arr[[0]], optimize_graph=True)` indicates that we're OK.
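To make the failure mode concrete, here is a toy sketch of how filtering keys by name can leave a graph with dangling references. The graph layout and `missing_dependencies` are made up for illustration; this is not the code removed in this PR.

```python
# A toy task graph: keys are (name, index) tuples, and values are either
# plain data or a (function, dependency_key) task. Layer names are made up.
graph = {
    ('source', 0): [1, 2, 3],
    ('mean', 0): (sum, ('source', 0)),
}


def missing_dependencies(dsk):
    '''Return dependency keys that tasks reference but the graph lacks.'''
    missing = set()
    for task in dsk.values():
        if isinstance(task, tuple) and len(task) == 2 and callable(task[0]):
            if task[1] not in dsk:
                missing.add(task[1])
    return missing


# Keeping only keys whose name matches drops ('source', 0),
# which ('mean', 0) still depends on.
culled = {k: v for k, v in graph.items() if k[0] == 'mean'}

print(missing_dependencies(graph))   # set()
print(missing_dependencies(culled))  # {('source', 0)}
```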
- [x] Closes #3698
- [x] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4432/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
672281867,MDExOlB1bGxSZXF1ZXN0NDYyMzQ2NzE4,4305,Fix map_blocks examples,1312546,closed,0,,,5,2020-08-03T19:06:58Z,2020-08-04T07:27:08Z,2020-08-04T03:38:51Z,MEMBER,,0,pydata/xarray/pulls/4305,"The examples on master raised with
```pytb
ValueError: Result from applying user function has unexpected coordinate variables {'month'}.
```
This PR updates the example to include the `month` coordinate. `pytest --doctest-modules` passes on these three now. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4305/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
672195744,MDExOlB1bGxSZXF1ZXN0NDYyMjc2NDEw,4303,Update map_blocks and map_overlap docstrings,1312546,closed,0,,,1,2020-08-03T16:27:45Z,2020-08-03T18:35:43Z,2020-08-03T18:06:10Z,MEMBER,,0,pydata/xarray/pulls/4303,"This references an `obj` argument that only exists in parallel. The
object being referenced is actually `self`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4303/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
533555794,MDExOlB1bGxSZXF1ZXN0MzQ5NjA5NDM3,3598,Fix map_blocks HLG layering,1312546,closed,0,,,2,2019-12-05T19:41:23Z,2019-12-07T04:30:19Z,2019-12-07T04:30:19Z,MEMBER,,0,pydata/xarray/pulls/3598,"[x] closes #3599
This fixes an issue with the HighLevelGraph noted in
https://github.com/pydata/xarray/pull/3584, and exposed by a recent
change in Dask to do more HLG fusion.
cc @dcherian. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3598/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
400997415,MDExOlB1bGxSZXF1ZXN0MjQ2MDQ4MDcx,2693,Update asv.conf.json,1312546,closed,0,,,1,2019-01-19T13:45:51Z,2019-01-19T19:42:48Z,2019-01-19T17:45:20Z,MEMBER,,0,pydata/xarray/pulls/2693,"Is xarray 3.5+ now? Congrats, I didn't realize that.
This started failing the benchmark machine, which I was tending to last night.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2693/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
279161550,MDU6SXNzdWUyNzkxNjE1NTA=,1759,dask compute on reduction fails with ValueError,1312546,closed,0,,,17,2017-12-04T21:45:41Z,2017-12-07T22:09:18Z,2017-12-07T22:09:18Z,MEMBER,,,,"I'm doing a reduction like `mean` on a dask-backed `DataArray`, and passing it to `dask.compute`:
```python
In [3]: from dask import compute
...: import numpy as np
...: import xarray as xr
...:
In [4]: data = xr.DataArray(np.random.random(size=(10, 2)),
...: dims=['samples', 'features']).chunk((5, 2))
...:
In [5]: compute(data.mean(axis=0))
```
```pytb
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 compute(data.mean(axis=0))
~/Envs/dask-dev/lib/python3.6/site-packages/dask/dask/base.py in compute(*args, **kwargs)
334 results_iter = iter(results)
335 return tuple(a if f is None else f(next(results_iter), *a)
--> 336 for f, a in postcomputes)
337
338
~/Envs/dask-dev/lib/python3.6/site-packages/dask/dask/base.py in (.0)
334 results_iter = iter(results)
335 return tuple(a if f is None else f(next(results_iter), *a)
--> 336 for f, a in postcomputes)
337
338
~/Envs/dask-dev/lib/python3.6/site-packages/xarray/xarray/core/dataarray.py in _dask_finalize(results, func, args, name)
607 @staticmethod
608 def _dask_finalize(results, func, args, name):
--> 609 ds = func(results, *args)
610 variable = ds._variables.pop(_THIS_ARRAY)
611 coords = ds._variables
~/Envs/dask-dev/lib/python3.6/site-packages/xarray/xarray/core/dataset.py in _dask_postcompute(results, info, *args)
551 func, args2 = v
552 r = results2.pop()
--> 553 result = func(r, *args2)
554 else:
555 result = v
~/Envs/dask-dev/lib/python3.6/site-packages/xarray/xarray/core/variable.py in _dask_finalize(results, array_func, array_args, dims, attrs, encoding)
389 results = {k: v for k, v in results.items() if k[0] == name} # cull
390 data = array_func(results, *array_args)
--> 391 return Variable(dims, data, attrs=attrs, encoding=encoding)
392
393 @property
~/Envs/dask-dev/lib/python3.6/site-packages/xarray/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
267 """"""
268 self._data = as_compatible_data(data, fastpath=fastpath)
--> 269 self._dims = self._parse_dimensions(dims)
270 self._attrs = None
271 self._encoding = None
~/Envs/dask-dev/lib/python3.6/site-packages/xarray/xarray/core/variable.py in _parse_dimensions(self, dims)
431 raise ValueError('dimensions %s must have the same length as the '
432 'number of data dimensions, ndim=%s'
--> 433 % (dims, self.ndim))
434 return dims
435
ValueError: dimensions ('features',) must have the same length as the number of data dimensions, ndim=0
```
The expected output is the `.compute` version, which works correctly:
```python
In [7]: data.mean(axis=0).compute()
Out[7]:
array([0.535643, 0.459406])
Dimensions without coordinates: features
```
```
In [6]: xr.show_versions()
INSTALLED VERSIONS
------------------
commit: c2b205f29467a4431baa80b5c07fe31bda67fbef
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.0-5-gc2b205f
pandas: 0.22.0.dev0+118.g4c6387520
numpy: 1.14.0.dev0+2995e6a
scipy: 1.1.0.dev0+b6fd544
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.16.0+15.gcbc62fbef
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.1
setuptools: 36.7.2
pip: 10.0.0.dev0
conda: None
pytest: 3.2.3
IPython: 6.2.1
sphinx: 1.6.5
```
Apologies if I'm doing something silly here, I don't know xarray :)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1759/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
251773472,MDExOlB1bGxSZXF1ZXN0MTM2ODQ1MjE2,1515,Added show_commit_url to asv.conf,1312546,closed,0,,,0,2017-08-21T21:17:10Z,2017-08-23T16:01:50Z,2017-08-23T16:01:50Z,MEMBER,,0,pydata/xarray/pulls/1515,"This should set up the proper links from the published output to the commit on GitHub.
FYI the benchmarks should be running stably now, and posted to http://pandas.pydata.org/speed/xarray. http://pandas.pydata.org/speed/xarray/regressions.xml has an RSS feed to the regressions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1515/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull