id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1975574237,I_kwDOAMm_X851wN7d,8409,Task graphs on `.map_blocks` with many chunks can be huge,5635139,closed,0,,,6,2023-11-03T07:14:45Z,2024-01-03T04:10:16Z,2024-01-03T04:10:16Z,MEMBER,,,,"### What happened?
I'm getting task graphs > 1GB, I think possibly because the full indexes are being included in every task?
### What did you expect to happen?
Only the relevant sections of the index would be included
### Minimal Complete Verifiable Example
```Python
import cloudpickle
import xarray as xr
da = xr.tutorial.load_dataset('air_temperature')
# Dropping the index doesn't generally matter that much...
len(cloudpickle.dumps(da.chunk(lat=1, lon=1)))
# 15569320
len(cloudpickle.dumps(da.chunk().drop_vars(da.indexes)))
# 15477313
# But with `.map_blocks` it really matters: roughly 5x bigger with the indexes, and about the same size without them:
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).map_blocks(lambda x: x)))
# 79307120
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).drop_vars(da.indexes).map_blocks(lambda x: x)))
# 16016173
```
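To see where the bytes go, one rough diagnostic (a sketch, not xarray API; it just pickles each task of the dask graph on its own) is:
```Python
import cloudpickle

obj = da.chunk(lat=1, lon=1).map_blocks(lambda x: x)
graph = dict(obj.__dask_graph__())
# Pickle each task individually; if every task carries a copy of the
# full indexes, these sizes will all be large:
sizes = {key: len(cloudpickle.dumps(task)) for key, task in graph.items()}
print(sorted(sizes.values())[-5:])
```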
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Aug 24 2023, 21:19:58)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8409/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
988158051,MDU6SXNzdWU5ODgxNTgwNTE=,5764,Implement __sizeof__ on objects?,5635139,open,0,,,6,2021-09-03T23:36:53Z,2023-12-19T18:23:08Z,,MEMBER,,,,"
**Is your feature request related to a problem? Please describe.**
Currently `ds.nbytes` returns the size of the data.
But `sys.getsizeof(ds)` returns a very small number.
**Describe the solution you'd like**
If we implement `__sizeof__` on DataArrays & Datasets, `sys.getsizeof` would return something meaningful.
I think that would be something like `ds.nbytes`, plus the size of the `ds` container itself, plus maybe the attrs if those aren't handled by `.nbytes`?
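A minimal sketch of what that could look like (just a sketch, assuming `.nbytes` covers the array data but not the object overhead):
```python
def __sizeof__(self) -> int:
    # Container overhead plus the underlying array data; attrs would need
    # separate accounting if `.nbytes` doesn't already include them.
    return object.__sizeof__(self) + self.nbytes
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5764/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue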
866826033,MDU6SXNzdWU4NjY4MjYwMzM=,5215,"Add a Cumulative aggregation, similar to Rolling",5635139,closed,0,,,6,2021-04-24T19:59:49Z,2023-12-08T22:06:53Z,2023-12-08T22:06:53Z,MEMBER,,,,"
**Is your feature request related to a problem? Please describe.**
Pandas has a [`.expanding` aggregation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html), which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this would be some nice sugar for that.
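For concreteness, the workaround today looks something like this (a sketch with made-up data, assuming a `time` dimension):
```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims='time')
# An expanding mean, spelled as a rolling mean with a full-length window:
da.rolling(time=da.sizes['time'], min_periods=1).mean()
```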
**Describe the solution you'd like**
Basically the same as pandas — a `.expanding` method that returns an `Expanding` class, which implements the same methods as a `Rolling` class.
**Describe alternatives you've considered**
Some options:
– This
– Don't add anything; the sugar isn't worth the additional API.
– Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5215/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1878288525,PR_kwDOAMm_X85ZYos5,8139,Fix pandas' `interpolate(fill_value=)` error,5635139,closed,0,,,6,2023-09-02T02:41:45Z,2023-09-28T16:48:51Z,2023-09-04T18:05:14Z,MEMBER,,0,pydata/xarray/pulls/8139,"Pandas no longer has a `fill_value` parameter for `interpolate`.
Weirdly I wasn't getting this locally, on pandas 2.1.0, only in CI on https://github.com/pydata/xarray/actions/runs/6054400455/job/16431747966?pr=8138.
Removing it passes locally; let's see whether this works in CI.
Would close #8125
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8139/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
967854972,MDExOlB1bGxSZXF1ZXN0NzEwMDA1NzY4,5694,Ask PRs to annotate tests,5635139,closed,0,,,6,2021-08-12T02:19:28Z,2023-09-28T16:46:19Z,2023-06-19T05:46:36Z,MEMBER,,0,pydata/xarray/pulls/5694,"
- [x] Passes `pre-commit run --all-files`
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
As discussed https://github.com/pydata/xarray/pull/5690#issuecomment-897280353","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5694/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1874148181,I_kwDOAMm_X85vtTtV,8123,`.rolling_exp` arguments could be clearer,5635139,open,0,,,6,2023-08-30T18:09:04Z,2023-09-01T00:25:08Z,,MEMBER,,,,"### Is your feature request related to a problem?
Currently we call `.rolling_exp` like:
```
da.rolling_exp(date=20).mean()
```
`20` refers to a ""standard"" window type — broadly ""the same average distance as a simple rolling window"". That works well, and matches the `.rolling(date=20).mean()` format.
But we also have different window types, and this makes it a bit incongruent:
```
da.rolling_exp(date=0.5, window_type=""alpha"").mean()
```
...since the `window_type` is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking ""how many apples would you like to buy"", and replying ""5"", and then separately saying ""when I said 5, I meant 5 _tonnes_"".
### Describe the solution you'd like
One option would be:
```
.rolling_exp(date={""alpha"": 0.5})
```
We pass a dict if we want a non-standard window type — so the value is attached to its type.
We could still have the original form for `da.rolling_exp(date=20).mean()`.
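Concretely, both of these would be valid under the proposal (a sketch of the suggested API, nothing implemented):
```
da.rolling_exp(date=20).mean()  # standard window type, as today
da.rolling_exp(date={'alpha': 0.5}).mean()  # value attached to its window type
```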
### Describe alternatives you've considered
_No response_
### Additional context
(I realize I wrote this originally, all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)
Unless anyone disagrees, I'll try and do this soon-ish™","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8123/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
729208432,MDExOlB1bGxSZXF1ZXN0NTA5NzM0NTM2,4540,numpy_groupies,5635139,closed,0,,,6,2020-10-26T03:37:19Z,2022-02-05T22:24:12Z,2021-10-24T00:18:52Z,MEMBER,,0,pydata/xarray/pulls/4540,"
- [x] Closes https://github.com/pydata/xarray/issues/4473
- [ ] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
Very early effort; I found this harder than I expected. I was trying to use the existing groupby infra, but I think I should maybe start afresh. The result of the `numpy_groupies` operation is a fully formed array, whereas we're used to handling an iterable of results which need to be concatenated.
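For reference, this is what I mean; `numpy_groupies` does the whole reduction in one call (a sketch with made-up data):
```python
import numpy as np
import numpy_groupies as npg

values = np.array([1.0, 2.0, 3.0, 4.0])
group_idx = np.array([0, 0, 1, 1])
# One call returns the full result array; nothing to concatenate afterwards:
npg.aggregate(group_idx, values, func='sum')  # -> array([3., 7.])
```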
I also added some type signatures / notes as I was going through the existing code, mostly for my own understanding.
If anyone has any thoughts, feel free to comment — otherwise I'll resume this soon","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4540/reactions"", ""total_count"": 4, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
399164733,MDExOlB1bGxSZXF1ZXN0MjQ0NjU3NTk5,2674,Skipping variables in datasets that don't have the core dim,5635139,closed,0,,,6,2019-01-15T02:43:11Z,2021-05-13T22:02:19Z,2021-05-13T22:02:19Z,MEMBER,,0,pydata/xarray/pulls/2674,"ref https://github.com/pydata/xarray/pull/2650#issuecomment-454164295
This seems an ugly way of accomplishing the goal; any ideas for a better way of doing this?
And stepping back, do others think a) it's helpful to skip variables in a dataset, and b) `apply_ufunc` should do this?
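For context, the situation is something like this (a hypothetical example; `b` lacks the core dim):
```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'a': ('x', np.arange(3.0)), 'b': ('y', np.arange(2.0))})
# Reducing over core dim 'x' raises today, because 'b' has no 'x'.
# Should 'b' be skipped instead?
xr.apply_ufunc(np.mean, ds, input_core_dims=[['x']])
```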
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2674/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
298421965,MDU6SXNzdWUyOTg0MjE5NjU=,1923,Local test failure in test_backends,5635139,closed,0,,,6,2018-02-19T22:53:37Z,2020-09-05T20:32:17Z,2020-09-05T20:32:17Z,MEMBER,,,,"I'm happy to debug this further but before I do, is this an issue people have seen before? I'm running tests on master and hit an issue very early on.
FWIW I don't use netCDF, and don't think I've got that installed
#### Code Sample, a copy-pastable example if possible
```python
========================================================================== FAILURES ==========================================================================
_________________________________________________________ ScipyInMemoryDataTest.test_bytesio_pickle __________________________________________________________
self =
@pytest.mark.skipif(PY2, reason='cannot pickle BytesIO on Python 2')
def test_bytesio_pickle(self):
data = Dataset({'foo': ('x', [1, 2, 3])})
fobj = BytesIO(data.to_netcdf())
with open_dataset(fobj, autoclose=self.autoclose) as ds:
> unpickled = pickle.loads(pickle.dumps(ds))
E TypeError: can't pickle _thread.lock objects
xarray/tests/test_backends.py:1384: TypeError
```
#### Problem description
#### Expected Output
Skip or pass backends tests
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: d00721a3560f57a1b9226c5dbf5bf3af0356619d
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.7.0-38-g1005a9e # not sure why this is tagged so early. I'm running on latest master
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1923/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
575088962,MDExOlB1bGxSZXF1ZXN0MzgzMzAwMjgw,3826,Allow ellipsis to be used in stack,5635139,closed,0,,,6,2020-03-04T02:21:21Z,2020-03-20T01:20:54Z,2020-03-19T22:55:09Z,MEMBER,,0,pydata/xarray/pulls/3826,"
- [x] Closes https://github.com/pydata/xarray/issues/3814
- [x] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3826/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
577283480,MDExOlB1bGxSZXF1ZXN0Mzg1MTA3OTU4,3846,Doctests fixes,5635139,closed,0,,,6,2020-03-07T05:44:27Z,2020-03-10T14:03:05Z,2020-03-10T14:03:00Z,MEMBER,,0,pydata/xarray/pulls/3846,"
- [ ] Closes #xxxx
- [ ] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
Starting to get some fixes in.
It's going to be a long journey though. I think maybe we whitelist some files and work through them gradually, before whitelisting the whole library.
485437811,MDU6SXNzdWU0ODU0Mzc4MTE=,3265,Sparse tests failing on master,5635139,closed,0,,,6,2019-08-26T20:34:21Z,2019-08-27T00:01:18Z,2019-08-27T00:01:07Z,MEMBER,,,,"https://dev.azure.com/xarray/xarray/_build/results?buildId=695
```python
=================================== FAILURES ===================================
_______________________ TestSparseVariable.test_unary_op _______________________
self =
def test_unary_op(self):
> sparse.utils.assert_eq(-self.var.data, -self.data)
E AttributeError: module 'sparse' has no attribute 'utils'
xarray/tests/test_sparse.py:285: AttributeError
___________________ TestSparseVariable.test_univariate_ufunc ___________________
self =
def test_univariate_ufunc(self):
> sparse.utils.assert_eq(np.sin(self.data), xu.sin(self.var).data)
E AttributeError: module 'sparse' has no attribute 'utils'
xarray/tests/test_sparse.py:290: AttributeError
___________________ TestSparseVariable.test_bivariate_ufunc ____________________
self =
def test_bivariate_ufunc(self):
> sparse.utils.assert_eq(np.maximum(self.data, 0), xu.maximum(self.var, 0).data)
E AttributeError: module 'sparse' has no attribute 'utils'
xarray/tests/test_sparse.py:293: AttributeError
________________________ TestSparseVariable.test_pickle ________________________
self =
def test_pickle(self):
v1 = self.var
v2 = pickle.loads(pickle.dumps(v1))
> sparse.utils.assert_eq(v1.data, v2.data)
E AttributeError: module 'sparse' has no attribute 'utils'
xarray/tests/test_sparse.py:307: AttributeError
```
Any ideas?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3265/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
457080809,MDExOlB1bGxSZXF1ZXN0Mjg4OTY1MzQ4,3029,Fix pandas-dev tests ,5635139,closed,0,,,6,2019-06-17T18:15:16Z,2019-06-28T15:31:33Z,2019-06-28T15:31:28Z,MEMBER,,0,pydata/xarray/pulls/3029,"Currently pandas-dev tests get 'stuck' on the conda install. The last instruction to run is the standard install:
```
$ if [[ ""$CONDA_ENV"" == ""docs"" ]]; then conda env create -n test_env --file doc/environment.yml; elif [[ ""$CONDA_ENV"" == ""lint"" ]]; then conda env create -n test_env --file ci/requirements-py37.yml; else conda env create -n test_env --file ci/requirements-$CONDA_ENV.yml; fi
```
And after installing the libraries, [it prints this and then stops](https://travis-ci.org/max-sixty/xarray/jobs/546491330):
```
Preparing transaction: - - done
Verifying transaction: | / \ | / - \ | / / done
Executing transaction: \ | / - \ | / - \ | / - \ | / - \ | / - \ | / / - \ | / - \ done
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
```
I'm not that familiar with conda. Anyone have any ideas as to why this would fail while the other builds would succeed?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3029/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
168901028,MDU6SXNzdWUxNjg5MDEwMjg=,934,"Should indexing be possible on 1D coords, even if not dims?",5635139,closed,0,,,6,2016-08-02T14:33:43Z,2019-01-27T06:49:52Z,2019-01-27T06:49:52Z,MEMBER,,,,"``` python
In [1]: arr = xr.DataArray(np.random.rand(4, 3),
   ...:                    [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                     ('space', ['IA', 'IL', 'IN'])])
In [17]: arr.coords['space2'] = ('space', ['A','B','C'])
In [18]: arr
Out[18]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.05187049, 0.04743067, 0.90329666],
[ 0.59482538, 0.71014366, 0.86588207],
[ 0.51893157, 0.49442107, 0.10697737],
[ 0.16068189, 0.60756757, 0.31935279]])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
* space (space) |S2 'IA' 'IL' 'IN'
space2 (space) |S1 'A' 'B' 'C'
```
Now try to select on the space2 coord:
``` python
In [19]: arr.sel(space2='A')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 arr.sel(space2='A')
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers)
601 """"""
602 return self.isel(**indexing.remap_label_indexers(
--> 603 self, indexers, method=method, tolerance=tolerance))
604
605 def isel_points(self, dim='points', **indexers):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers)
588 DataArray.sel
589 """"""
--> 590 ds = self._to_temp_dataset().isel(**indexers)
591 return self._from_temp_dataset(ds)
592
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers)
908 invalid = [k for k in indexers if k not in self.dims]
909 if invalid:
--> 910 raise ValueError(""dimensions %r do not exist"" % invalid)
911
912 # all indexers should be int, slice or np.ndarrays
ValueError: dimensions ['space2'] do not exist
```
Is there an easier way to do this? I couldn't think of anything...
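For what it's worth, the closest workaround I can construct is masking rather than selecting (a sketch, same `arr` as above):
``` python
arr.where(arr['space2'] == 'A', drop=True)
```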
CC @justinkuosixty
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/934/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue