id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1975574237,I_kwDOAMm_X851wN7d,8409,Task graphs on `.map_blocks` with many chunks can be huge,5635139,closed,0,,,6,2023-11-03T07:14:45Z,2024-01-03T04:10:16Z,2024-01-03T04:10:16Z,MEMBER,,,,"### What happened?

I'm getting task graphs > 1GB, I think possibly because the full indexes are being included in every task?

### What did you expect to happen?

Only the relevant sections of the index would be included.

### Minimal Complete Verifiable Example

```Python
da = xr.tutorial.load_dataset('air_temperature')

# Dropping the index doesn't generally matter that much...
len(cloudpickle.dumps(da.chunk(lat=1, lon=1)))  # 15569320
len(cloudpickle.dumps(da.chunk().drop_vars(da.indexes)))  # 15477313

# But with `.map_blocks`, it really matters — it's really big with the indexes, and the same size without:
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).map_blocks(lambda x: x)))  # 79307120
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).drop_vars(da.indexes).map_blocks(lambda x: x)))  # 16016173
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
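
### Possible workaround

As a sketch, assuming the function passed to `.map_blocks` doesn't need the index coordinates: drop them before the call and re-attach them afterwards, which keeps them out of every task.

```Python
import cloudpickle
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')
chunked = da.chunk(lat=1, lon=1)

# Set the index coordinates aside and run map_blocks without them...
indexes = {name: chunked[name] for name in chunked.indexes}
result = chunked.drop_vars(indexes).map_blocks(lambda x: x)

# ...then re-attach them, so they aren't pickled into every task.
result = result.assign_coords(indexes)

len(cloudpickle.dumps(result))  # should be close to the no-index sizes above
```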
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8409/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 988158051,MDU6SXNzdWU5ODgxNTgwNTE=,5764,Implement __sizeof__ on objects?,5635139,open,0,,,6,2021-09-03T23:36:53Z,2023-12-19T18:23:08Z,,MEMBER,,,," **Is your feature request related to a problem? Please describe.** Currently `ds.nbytes` returns the size of the data. But `sys.getsizeof(ds)` returns a very small number. **Describe the solution you'd like** If we implement `__sizeof__` on DataArrays & Datasets, this would work. I think that would be something like `ds.nbytes` + the size of the `ds` container, + maybe attrs if those aren't handled by `.nbytes`?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5764/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue 866826033,MDU6SXNzdWU4NjY4MjYwMzM=,5215,"Add an Cumulative aggregation, similar to Rolling",5635139,closed,0,,,6,2021-04-24T19:59:49Z,2023-12-08T22:06:53Z,2023-12-08T22:06:53Z,MEMBER,,,," **Is your feature request related to a problem? Please describe.** Pandas has a [`.expanding` aggregation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html), which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this is some nice sugar for that. **Describe the solution you'd like** Basically the same as pandas — a `.expanding` method that returns an `Expanding` class, which implements the same methods as a `Rolling` class. **Describe alternatives you've considered** Some options: – This – Don't add anything, the sugar isn't worth the additional API. – Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5215/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1874148181,I_kwDOAMm_X85vtTtV,8123,`.rolling_exp` arguments could be clearer,5635139,open,0,,,6,2023-08-30T18:09:04Z,2023-09-01T00:25:08Z,,MEMBER,,,,"### Is your feature request related to a problem? Currently we call `.rolling_exp` like: ``` da.rolling_exp(date=20).mean() ``` `20` refers to a ""standard"" window type — broadly ""the same average distance as a simple rolling window. That works well, and matches the `.rolling(date=20).mean()` format. But we also have different window types, and this makes it a bit incongruent: ``` da.rolling_exp(date=0.5, window_type=""alpha"").mean() ``` ...since the `window_type` is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking ""how many apples would you like to buy"", and replying ""5"", and then separately saying ""when I said 5, I meant 5 _tonnes_"". ### Describe the solution you'd like One option would be: ``` .rolling_exp(dptr={""alpha"": 0.5}) ``` We pass a dict if we want a non-standard window type — so the value is attached to its type. We could still have the original form for `da.rolling_exp(date=20).mean()`. 
### Describe alternatives you've considered

_No response_

### Additional context

(I realize I wrote this originally; all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)

Unless anyone disagrees, I'll try and do this soon-ish™","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8123/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
298421965,MDU6SXNzdWUyOTg0MjE5NjU=,1923,Local test failure in test_backends,5635139,closed,0,,,6,2018-02-19T22:53:37Z,2020-09-05T20:32:17Z,2020-09-05T20:32:17Z,MEMBER,,,,"I'm happy to debug this further, but before I do: is this an issue people have seen before? I'm running tests on master and hit an issue very early on. FWIW I don't use netCDF, and don't think I have that installed.

#### Code Sample, a copy-pastable example if possible

```python
========================================================================== FAILURES ==========================================================================
_________________________________________________________ ScipyInMemoryDataTest.test_bytesio_pickle __________________________________________________________

self = 

    @pytest.mark.skipif(PY2, reason='cannot pickle BytesIO on Python 2')
    def test_bytesio_pickle(self):
        data = Dataset({'foo': ('x', [1, 2, 3])})
        fobj = BytesIO(data.to_netcdf())
        with open_dataset(fobj, autoclose=self.autoclose) as ds:
>           unpickled = pickle.loads(pickle.dumps(ds))
E           TypeError: can't pickle _thread.lock objects

xarray/tests/test_backends.py:1384: TypeError
```

#### Expected Output

Skip or pass backends tests

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: d00721a3560f57a1b9226c5dbf5bf3af0356619d
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.7.0-38-g1005a9e  # not sure why this is tagged so early. I'm running on latest master
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1923/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 485437811,MDU6SXNzdWU0ODU0Mzc4MTE=,3265,Sparse tests failing on master,5635139,closed,0,,,6,2019-08-26T20:34:21Z,2019-08-27T00:01:18Z,2019-08-27T00:01:07Z,MEMBER,,,,"https://dev.azure.com/xarray/xarray/_build/results?buildId=695 ```python =================================== FAILURES =================================== _______________________ TestSparseVariable.test_unary_op _______________________ self = def test_unary_op(self): > sparse.utils.assert_eq(-self.var.data, -self.data) E AttributeError: module 'sparse' has no attribute 'utils' xarray/tests/test_sparse.py:285: AttributeError ___________________ TestSparseVariable.test_univariate_ufunc ___________________ self = def test_univariate_ufunc(self): > sparse.utils.assert_eq(np.sin(self.data), xu.sin(self.var).data) E AttributeError: module 'sparse' has no attribute 'utils' xarray/tests/test_sparse.py:290: AttributeError ___________________ TestSparseVariable.test_bivariate_ufunc ____________________ self = def test_bivariate_ufunc(self): > sparse.utils.assert_eq(np.maximum(self.data, 0), xu.maximum(self.var, 0).data) E AttributeError: module 'sparse' has no attribute 'utils' xarray/tests/test_sparse.py:293: AttributeError ________________________ TestSparseVariable.test_pickle ________________________ self = def test_pickle(self): v1 = self.var v2 = pickle.loads(pickle.dumps(v1)) > sparse.utils.assert_eq(v1.data, v2.data) E AttributeError: module 'sparse' has no attribute 'utils' xarray/tests/test_sparse.py:307: AttributeError ``` Any ideas?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3265/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 168901028,MDU6SXNzdWUxNjg5MDEwMjg=,934,"Should indexing be possible on 1D coords, even if not dims?",5635139,closed,0,,,6,2016-08-02T14:33:43Z,2019-01-27T06:49:52Z,2019-01-27T06:49:52Z,MEMBER,,,,"``` python In [1]: arr = xr.DataArray(np.random.rand(4, 3), ...: ...: [('time', pd.date_range('2000-01-01', periods=4)), ...: ...: ('space', ['IA', 'IL', 'IN'])]) ...: ...: In [17]: arr.coords['space2'] = ('space', ['A','B','C']) In [18]: arr Out[18]: array([[ 0.05187049, 0.04743067, 0.90329666], [ 0.59482538, 0.71014366, 0.86588207], [ 0.51893157, 0.49442107, 0.10697737], [ 0.16068189, 0.60756757, 0.31935279]]) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 * space (space) |S2 'IA' 'IL' 'IN' space2 (space) |S1 'A' 'B' 'C' ``` Now try to select on the space2 coord: ``` python In [19]: arr.sel(space2='A') --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 arr.sel(space2='A') /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers) 601 """""" 602 return self.isel(**indexing.remap_label_indexers( --> 603 self, indexers, method=method, tolerance=tolerance)) 604 605 def isel_points(self, dim='points', **indexers): /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers) 588 DataArray.sel 589 """""" --> 590 ds = self._to_temp_dataset().isel(**indexers) 591 
return self._from_temp_dataset(ds) 592 /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers) 908 invalid = [k for k in indexers if k not in self.dims] 909 if invalid: --> 910 raise ValueError(""dimensions %r do not exist"" % invalid) 911 912 # all indexers should be int, slice or np.ndarrays ValueError: dimensions ['space2'] do not exist ``` Is there an easier way to do this? I couldn't think of anything... CC @justinkuosixty ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/934/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
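
As a sketch of workarounds for the question in issue 934 above (selecting on a 1D non-dimension coordinate), using only existing public xarray API:

``` python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(np.random.rand(4, 3),
                   [('time', pd.date_range('2000-01-01', periods=4)),
                    ('space', ['IA', 'IL', 'IN'])])
arr.coords['space2'] = ('space', ['A', 'B', 'C'])

# Option 1: make `space2` the indexing coordinate for the `space`
# dimension, then select by label as usual:
arr.swap_dims({'space': 'space2'}).sel(space2='A')

# Option 2: mask and drop along the shared dimension:
arr.where(arr.space2 == 'A', drop=True)
```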