issues: 7 rows where comments = 6, type = "issue" and user = 5635139, sorted by updated_at descending

#8409 · Task graphs on `.map_blocks` with many chunks can be huge
opened 2023-11-03 by max-sixty (MEMBER) · closed 2024-01-03 (completed) · 6 comments

What happened?

I'm getting task graphs > 1GB, I think possibly because the full indexes are being included in every task?

What did you expect to happen?

Only the relevant sections of the index would be included

Minimal Complete Verifiable Example

```python
import cloudpickle
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')

# Dropping the index doesn't generally matter that much...
len(cloudpickle.dumps(da.chunk(lat=1, lon=1)))
# 15569320

len(cloudpickle.dumps(da.chunk().drop_vars(da.indexes)))
# 15477313

# But with .map_blocks, it really matters — the graph is huge with the
# indexes, and roughly the original size without them:
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).map_blocks(lambda x: x)))
# 79307120

len(cloudpickle.dumps(da.chunk(lat=1, lon=1).drop_vars(da.indexes).map_blocks(lambda x: x)))
# 16016173
```
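A possible workaround while this stands, sketched under the assumption that the mapped function does not need the index values: drop the indexes before `.map_blocks` so they are not serialized into every task, then reattach them to the result. This is a sketch mirroring the example above, not an xarray-endorsed fix.

```python
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')
chunked = da.chunk(lat=1, lon=1)

# Drop indexes so they aren't baked into each task, map, then restore
# the original coordinates on the (still lazy) result.
result = (
    chunked.drop_vars(chunked.indexes)
    .map_blocks(lambda x: x)
    .assign_coords({name: da[name] for name in da.indexes})
)
```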

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
```
#5764 · Implement __sizeof__ on objects?
opened 2021-09-03 by max-sixty (MEMBER) · open (reopened) · 6 comments · 2 👍

Is your feature request related to a problem? Please describe.

Currently ds.nbytes returns the size of the data, but sys.getsizeof(ds) returns a very small number.

Describe the solution you'd like

If we implement __sizeof__ on DataArrays & Datasets, sys.getsizeof would work. I think that would be something like ds.nbytes plus the size of the ds container, plus maybe attrs if those aren't handled by .nbytes?
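A minimal sketch of the idea, using a hypothetical `TinyDataset` stand-in rather than xarray's real classes (whether attrs need deeper sizing is the open question above):

```python
import sys
import numpy as np

class TinyDataset:
    """Hypothetical stand-in for xarray.Dataset, just to illustrate."""

    def __init__(self, data: np.ndarray, attrs: dict):
        self.data = data
        self.attrs = attrs

    @property
    def nbytes(self) -> int:
        return self.data.nbytes

    def __sizeof__(self) -> int:
        # Container overhead + underlying array data + a rough
        # (shallow) estimate for attrs.
        return object.__sizeof__(self) + self.nbytes + sys.getsizeof(self.attrs)

ds = TinyDataset(np.zeros(1_000_000), {"title": "demo"})
print(sys.getsizeof(ds))  # now dominated by the ~8 MB of array data
```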

#5215 · Add a Cumulative aggregation, similar to Rolling
opened 2021-04-24 by max-sixty (MEMBER) · closed 2023-12-08 (completed) · 6 comments

Is your feature request related to a problem? Please describe.

Pandas has a .expanding aggregation, which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this is some nice sugar for that.

Describe the solution you'd like

Basically the same as pandas — a .expanding method that returns an Expanding class, which implements the same methods as a Rolling class.

Describe alternatives you've considered

Some options:
  • This.
  • Don't add anything — the sugar isn't worth the additional API.
  • Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.
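For concreteness, the sugar being proposed is roughly this (a sketch; `.expanding` is not an existing xarray method, so this spells it out with `.rolling`):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x")

# An expanding mean is a rolling mean whose window spans the whole
# dimension, with min_periods=1 so early positions aren't NaN.
expanding_mean = da.rolling(x=da.sizes["x"], min_periods=1).mean()
print(expanding_mean.values)  # [0.  0.5 1.  1.5 2. ]
```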

#8123 · `.rolling_exp` arguments could be clearer
opened 2023-08-30 by max-sixty (MEMBER) · open · 6 comments

Is your feature request related to a problem?

Currently we call .rolling_exp like:

da.rolling_exp(date=20).mean()

20 refers to a "standard" window type — broadly, "the same average distance as a simple rolling window". That works well, and matches the .rolling(date=20).mean() format.

But we also have different window types, and this makes it a bit incongruent:

da.rolling_exp(date=0.5, window_type="alpha").mean()

...since the window_type is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking "how many apples would you like to buy", and replying "5", and then separately saying "when I said 5, I meant 5 tonnes".

Describe the solution you'd like

One option would be:

.rolling_exp(date={"alpha": 0.5})

We pass a dict if we want a non-standard window type — so the value is attached to its type.

We could still have the original form for da.rolling_exp(date=20).mean().
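A sketch of how that argument could be parsed, with `_parse_window` as a hypothetical helper (the dict form is a proposal here, not a released API):

```python
def _parse_window(value, default_type="span"):
    """Accept either a bare number (default window type) or a
    one-item dict tying the value to an explicit window type."""
    if isinstance(value, dict):
        (window_type, window), = value.items()
        return window, window_type
    return value, default_type

assert _parse_window(20) == (20, "span")
assert _parse_window({"alpha": 0.5}) == (0.5, "alpha")
```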

Describe alternatives you've considered

No response

Additional context

(I realize I wrote this originally, all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)

Unless anyone disagrees, I'll try and do this soon-ish™

#1923 · Local test failure in test_backends
opened 2018-02-19 by max-sixty (MEMBER) · closed 2020-09-05 (completed) · 6 comments

I'm happy to debug this further, but before I do: is this an issue people have seen before? I'm running tests on master and hit a failure very early on.

FWIW I don't use netCDF, and don't think I've got it installed.

Code Sample, a copy-pastable example if possible

```python
========================================== FAILURES ==========================================
___________________________ ScipyInMemoryDataTest.test_bytesio_pickle ________________________

self = <xarray.tests.test_backends.ScipyInMemoryDataTest testMethod=test_bytesio_pickle>

    @pytest.mark.skipif(PY2, reason='cannot pickle BytesIO on Python 2')
    def test_bytesio_pickle(self):
        data = Dataset({'foo': ('x', [1, 2, 3])})
        fobj = BytesIO(data.to_netcdf())
        with open_dataset(fobj, autoclose=self.autoclose) as ds:
>           unpickled = pickle.loads(pickle.dumps(ds))
E           TypeError: can't pickle _thread.lock objects

xarray/tests/test_backends.py:1384: TypeError
```
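Outside the test suite, the failing body boils down to this (a sketch; `autoclose` is dropped since it's a test-class detail, and a netCDF-capable backend such as scipy is assumed to be installed):

```python
import pickle
from io import BytesIO

from xarray import Dataset, open_dataset

data = Dataset({'foo': ('x', [1, 2, 3])})
fobj = BytesIO(data.to_netcdf())  # scipy backend when netCDF4 is absent
with open_dataset(fobj) as ds:
    unpickled = pickle.loads(pickle.dumps(ds))  # raised TypeError at the time
```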

Problem description

A fresh checkout of master fails in test_backends on a machine without netCDF installed; the backend tests should either pass or be skipped in that case.

Expected Output

Skip or pass backends tests

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: d00721a3560f57a1b9226c5dbf5bf3af0356619d
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.7.0-38-g1005a9e  # not sure why this is tagged so early. I'm running on latest master
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
```
#3265 · Sparse tests failing on master
opened 2019-08-26 by max-sixty (MEMBER) · closed 2019-08-27 (completed) · 6 comments

https://dev.azure.com/xarray/xarray/_build/results?buildId=695

```python
=================================== FAILURES ===================================
_______________________ TestSparseVariable.test_unary_op ______________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f0b21b70>

    def test_unary_op(self):
>       sparse.utils.assert_eq(-self.var.data, -self.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:285: AttributeError
___________________ TestSparseVariable.test_univariate_ufunc __________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24ebc2bb38>

    def test_univariate_ufunc(self):
>       sparse.utils.assert_eq(np.sin(self.data), xu.sin(self.var).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:290: AttributeError
____________________ TestSparseVariable.test_bivariate_ufunc __________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f02a7e10>

    def test_bivariate_ufunc(self):
>       sparse.utils.assert_eq(np.maximum(self.data, 0), xu.maximum(self.var, 0).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:293: AttributeError
________________________ TestSparseVariable.test_pickle _______________________

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f04f2c50>

    def test_pickle(self):
        v1 = self.var
        v2 = pickle.loads(pickle.dumps(v1))
>       sparse.utils.assert_eq(v1.data, v2.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:307: AttributeError
```

Any ideas?
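For context: the errors suggest the `sparse` library stopped exposing its `utils` module publicly around this time. A local replacement for the assertion helper might look like this (a sketch, not necessarily the fix that landed):

```python
import numpy as np

def assert_sparse_eq(a, b):
    """Compare two sparse arrays (e.g. sparse.COO) by densifying."""
    assert np.array_equal(np.asarray(a.todense()), np.asarray(b.todense()))
```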

#934 · Should indexing be possible on 1D coords, even if not dims?
opened 2016-08-02 by max-sixty (MEMBER) · closed 2019-01-27 (completed) · 6 comments

```python
In [1]: arr = xr.DataArray(np.random.rand(4, 3),
   ...:                    [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                     ('space', ['IA', 'IL', 'IN'])])

In [17]: arr.coords['space2'] = ('space', ['A', 'B', 'C'])

In [18]: arr
Out[18]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.05187049,  0.04743067,  0.90329666],
       [ 0.59482538,  0.71014366,  0.86588207],
       [ 0.51893157,  0.49442107,  0.10697737],
       [ 0.16068189,  0.60756757,  0.31935279]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
    space2   (space) |S1 'A' 'B' 'C'
```

Now try to select on the space2 coord:

```python
In [19]: arr.sel(space2='A')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-eae5e4b64758> in <module>()
----> 1 arr.sel(space2='A')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers)
    601         """
    602         return self.isel(**indexing.remap_label_indexers(
--> 603             self, indexers, method=method, tolerance=tolerance))
    604
    605     def isel_points(self, dim='points', **indexers):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers)
    588         DataArray.sel
    589         """
--> 590         ds = self._to_temp_dataset().isel(**indexers)
    591         return self._from_temp_dataset(ds)
    592

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers)
    908         invalid = [k for k in indexers if k not in self.dims]
    909         if invalid:
--> 910             raise ValueError("dimensions %r do not exist" % invalid)
    911
    912         # all indexers should be int, slice or np.ndarrays

ValueError: dimensions ['space2'] do not exist
```

Is there an easier way to do this? I couldn't think of anything...

CC @justinkuosixty
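For later readers, two ways to express this selection without new API, sketched against the example above:

```python
# Promote the non-dimension coordinate to be the indexing coordinate,
# then select by label:
arr.swap_dims({'space': 'space2'}).sel(space2='A')

# Or keep the dims as-is and filter with a boolean condition,
# dropping the excluded positions:
arr.where(arr['space2'] == 'A', drop=True)
```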

