issues: 7 rows where comments = 6, type = "issue" and user = 5635139, sorted by updated_at descending

#8409 · Task graphs on `.map_blocks` with many chunks can be huge
opened 2023-11-03 by max-sixty (MEMBER) · closed 2024-01-03 (completed) · 6 comments

What happened?

I'm getting task graphs > 1GB, I think possibly because the full indexes are being included in every task?

What did you expect to happen?

Only the relevant sections of the index would be included

Minimal Complete Verifiable Example

```python
import cloudpickle
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')

# Dropping the index doesn't generally matter that much...
len(cloudpickle.dumps(da.chunk(lat=1, lon=1)))
# 15569320

len(cloudpickle.dumps(da.chunk().drop_vars(da.indexes)))
# 15477313

# But with .map_blocks, it really matters — the graph is huge with the
# indexes, and roughly the original size without them:
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).map_blocks(lambda x: x)))
# 79307120

len(cloudpickle.dumps(da.chunk(lat=1, lon=1).drop_vars(da.indexes).map_blocks(lambda x: x)))
# 16016173
```
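A possible workaround while this stands, sketched under the assumption that the mapped function does not need the index values: drop the indexes before `.map_blocks` so they are not serialized into every task, then reattach them to the result. This is a sketch mirroring the example above, not an xarray-endorsed fix.

```python
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')
chunked = da.chunk(lat=1, lon=1)

# Drop indexes so they aren't baked into each task, map, then restore
# the original coordinates on the (still lazy) result.
result = (
    chunked.drop_vars(chunked.indexes)
    .map_blocks(lambda x: x)
    .assign_coords({name: da[name] for name in da.indexes})
)
```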

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
```
#5764 · Implement __sizeof__ on objects?
opened 2021-09-03 by max-sixty (MEMBER) · open (reopened) · 6 comments · 2 👍

Is your feature request related to a problem? Please describe.

Currently ds.nbytes returns the size of the data, but sys.getsizeof(ds) returns a very small number.

Describe the solution you'd like

If we implement __sizeof__ on DataArrays & Datasets, sys.getsizeof would work. I think that would be something like ds.nbytes plus the size of the ds container, plus maybe attrs if those aren't handled by .nbytes?
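A minimal sketch of the idea, using a hypothetical `TinyDataset` stand-in rather than xarray's real classes (whether attrs need deeper sizing is the open question above):

```python
import sys
import numpy as np

class TinyDataset:
    """Hypothetical stand-in for xarray.Dataset, just to illustrate."""

    def __init__(self, data: np.ndarray, attrs: dict):
        self.data = data
        self.attrs = attrs

    @property
    def nbytes(self) -> int:
        return self.data.nbytes

    def __sizeof__(self) -> int:
        # Container overhead + underlying array data + a rough
        # (shallow) estimate for attrs.
        return object.__sizeof__(self) + self.nbytes + sys.getsizeof(self.attrs)

ds = TinyDataset(np.zeros(1_000_000), {"title": "demo"})
print(sys.getsizeof(ds))  # now dominated by the ~8 MB of array data
```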

#5215 · Add a Cumulative aggregation, similar to Rolling
opened 2021-04-24 by max-sixty (MEMBER) · closed 2023-12-08 (completed) · 6 comments

Is your feature request related to a problem? Please describe.

Pandas has a .expanding aggregation, which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this is some nice sugar for that.

Describe the solution you'd like

Basically the same as pandas — a .expanding method that returns an Expanding class, which implements the same methods as a Rolling class.

Describe alternatives you've considered

Some options:
  • This.
  • Don't add anything — the sugar isn't worth the additional API.
  • Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.
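For concreteness, the sugar being proposed is roughly this (a sketch; `.expanding` is not an existing xarray method, so this spells it out with `.rolling`):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x")

# An expanding mean is a rolling mean whose window spans the whole
# dimension, with min_periods=1 so early positions aren't NaN.
expanding_mean = da.rolling(x=da.sizes["x"], min_periods=1).mean()
print(expanding_mean.values)  # [0.  0.5 1.  1.5 2. ]
```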

#8123 · `.rolling_exp` arguments could be clearer
opened 2023-08-30 by max-sixty (MEMBER) · open · 6 comments

Is your feature request related to a problem?

Currently we call .rolling_exp like:

da.rolling_exp(date=20).mean()

20 refers to a "standard" window type — broadly, "the same average distance as a simple rolling window". That works well, and matches the .rolling(date=20).mean() format.

But we also have different window types, and this makes it a bit incongruent:

da.rolling_exp(date=0.5, window_type="alpha").mean()

...since the window_type is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking "how many apples would you like to buy", and replying "5", and then separately saying "when I said 5, I meant 5 tonnes".

Describe the solution you'd like

One option would be:

.rolling_exp(date={"alpha": 0.5})

We pass a dict if we want a non-standard window type — so the value is attached to its type.

We could still have the original form for da.rolling_exp(date=20).mean().
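A sketch of how that argument could be parsed, with `_parse_window` as a hypothetical helper (the dict form is a proposal here, not a released API):

```python
def _parse_window(value, default_type="span"):
    """Accept either a bare number (default window type) or a
    one-item dict tying the value to an explicit window type."""
    if isinstance(value, dict):
        (window_type, window), = value.items()
        return window, window_type
    return value, default_type

assert _parse_window(20) == (20, "span")
assert _parse_window({"alpha": 0.5}) == (0.5, "alpha")
```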

Describe alternatives you've considered

No response

Additional context

(I realize I wrote this originally, all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)

Unless anyone disagrees, I'll try and do this soon-ish™

#1923 · Local test failure in test_backends
opened 2018-02-19 by max-sixty (MEMBER) · closed 2020-09-05 (completed) · 6 comments

I'm happy to debug this further, but before I do: is this an issue people have seen before? I'm running tests on master and hit a failure very early on.

FWIW I don't use netCDF, and don't think I've got it installed.

Code Sample, a copy-pastable example if possible

```python
========================================== FAILURES ==========================================
___________________________ ScipyInMemoryDataTest.test_bytesio_pickle ________________________

self = <xarray.tests.test_backends.ScipyInMemoryDataTest testMethod=test_bytesio_pickle>

    @pytest.mark.skipif(PY2, reason='cannot pickle BytesIO on Python 2')
    def test_bytesio_pickle(self):
        data = Dataset({'foo': ('x', [1, 2, 3])})
        fobj = BytesIO(data.to_netcdf())
        with open_dataset(fobj, autoclose=self.autoclose) as ds:
>           unpickled = pickle.loads(pickle.dumps(ds))
E           TypeError: can't pickle _thread.lock objects

xarray/tests/test_backends.py:1384: TypeError
```
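Outside the test suite, the failing body boils down to this (a sketch; `autoclose` is dropped since it's a test-class detail, and a netCDF-capable backend such as scipy is assumed to be installed):

```python
import pickle
from io import BytesIO

from xarray import Dataset, open_dataset

data = Dataset({'foo': ('x', [1, 2, 3])})
fobj = BytesIO(data.to_netcdf())  # scipy backend when netCDF4 is absent
with open_dataset(fobj) as ds:
    unpickled = pickle.loads(pickle.dumps(ds))  # raised TypeError at the time
```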

Problem description

A fresh checkout of master fails in test_backends on a machine without netCDF installed; the backend tests should either pass or be skipped in that case.

Expected Output

Skip or pass backends tests

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: d00721a3560f57a1b9226c5dbf5bf3af0356619d
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.7.0-38-g1005a9e  # not sure why this is tagged so early. I'm running on latest master
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
```
#3265 · Sparse tests failing on master
opened 2019-08-26 by max-sixty (MEMBER) · closed 2019-08-27 (completed) · 6 comments

https://dev.azure.com/xarray/xarray/_build/results?buildId=695

```python
=================================== FAILURES ===================================
_______________________ TestSparseVariable.test_unary_op ______________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f0b21b70>

    def test_unary_op(self):
>       sparse.utils.assert_eq(-self.var.data, -self.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:285: AttributeError
___________________ TestSparseVariable.test_univariate_ufunc __________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24ebc2bb38>

    def test_univariate_ufunc(self):
>       sparse.utils.assert_eq(np.sin(self.data), xu.sin(self.var).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:290: AttributeError
____________________ TestSparseVariable.test_bivariate_ufunc __________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f02a7e10>

    def test_bivariate_ufunc(self):
>       sparse.utils.assert_eq(np.maximum(self.data, 0), xu.maximum(self.var, 0).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:293: AttributeError
________________________ TestSparseVariable.test_pickle _______________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f04f2c50>

    def test_pickle(self):
        v1 = self.var
        v2 = pickle.loads(pickle.dumps(v1))
>       sparse.utils.assert_eq(v1.data, v2.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:307: AttributeError
```

Any ideas?
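For context: the errors suggest the `sparse` library stopped exposing its `utils` module publicly around this time. A local replacement for the assertion helper might look like this (a sketch, not necessarily the fix that landed):

```python
import numpy as np

def assert_sparse_eq(a, b):
    """Compare two sparse arrays (e.g. sparse.COO) by densifying."""
    assert np.array_equal(np.asarray(a.todense()), np.asarray(b.todense()))
```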

#934 · Should indexing be possible on 1D coords, even if not dims?
opened 2016-08-02 by max-sixty (MEMBER) · closed 2019-01-27 (completed) · 6 comments

```python
In [1]: arr = xr.DataArray(np.random.rand(4, 3),
   ...:                    [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                     ('space', ['IA', 'IL', 'IN'])])

In [17]: arr.coords['space2'] = ('space', ['A', 'B', 'C'])

In [18]: arr
Out[18]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.05187049,  0.04743067,  0.90329666],
       [ 0.59482538,  0.71014366,  0.86588207],
       [ 0.51893157,  0.49442107,  0.10697737],
       [ 0.16068189,  0.60756757,  0.31935279]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
    space2   (space) |S1 'A' 'B' 'C'
```

Now try to select on the space2 coord:

```python
In [19]: arr.sel(space2='A')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-eae5e4b64758> in <module>()
----> 1 arr.sel(space2='A')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers)
    601         """
    602         return self.isel(**indexing.remap_label_indexers(
--> 603             self, indexers, method=method, tolerance=tolerance))
    604
    605     def isel_points(self, dim='points', **indexers):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers)
    588         DataArray.sel
    589         """
--> 590         ds = self._to_temp_dataset().isel(**indexers)
    591         return self._from_temp_dataset(ds)
    592

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers)
    908         invalid = [k for k in indexers if k not in self.dims]
    909         if invalid:
--> 910             raise ValueError("dimensions %r do not exist" % invalid)
    911
    912         # all indexers should be int, slice or np.ndarrays

ValueError: dimensions ['space2'] do not exist
```

Is there an easier way to do this? I couldn't think of anything...

CC @justinkuosixty
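For later readers, two ways to express this selection without new API, sketched against the example above:

```python
# Promote the non-dimension coordinate to be the indexing coordinate,
# then select by label:
arr.swap_dims({'space': 'space2'}).sel(space2='A')

# Or keep the dims as-is and filter with a boolean condition,
# dropping the excluded positions:
arr.where(arr['space2'] == 'A', drop=True)
```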

