issues


14 rows where comments = 6, repo = 13221727 and user = 5635139 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1975574237 I_kwDOAMm_X851wN7d 8409 Task graphs on `.map_blocks` with many chunks can be huge max-sixty 5635139 closed 0     6 2023-11-03T07:14:45Z 2024-01-03T04:10:16Z 2024-01-03T04:10:16Z MEMBER      

What happened?

I'm getting task graphs > 1GB, I think possibly because the full indexes are being included in every task?

What did you expect to happen?

Only the relevant sections of the index would be included

Minimal Complete Verifiable Example

```Python
import cloudpickle
import xarray as xr

da = xr.tutorial.load_dataset('air_temperature')

# Dropping the index doesn't generally matter that much...
len(cloudpickle.dumps(da.chunk(lat=1, lon=1)))
# 15569320

len(cloudpickle.dumps(da.chunk().drop_vars(da.indexes)))
# 15477313

# But with .map_blocks, it really matters — it's really big with the
# indexes, and about the same size without:
len(cloudpickle.dumps(da.chunk(lat=1, lon=1).map_blocks(lambda x: x)))
# 79307120

len(cloudpickle.dumps(da.chunk(lat=1, lon=1).drop_vars(da.indexes).map_blocks(lambda x: x)))
# 16016173
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8409/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
988158051 MDU6SXNzdWU5ODgxNTgwNTE= 5764 Implement __sizeof__ on objects? max-sixty 5635139 open 0     6 2021-09-03T23:36:53Z 2023-12-19T18:23:08Z   MEMBER      

Is your feature request related to a problem? Please describe.

Currently ds.nbytes returns the size of the data.

But sys.getsizeof(ds) returns a very small number.

Describe the solution you'd like

If we implement __sizeof__ on DataArrays & Datasets, this would work.

I think that would be something like ds.nbytes, plus the size of the ds container, plus maybe attrs if those aren't covered by .nbytes?
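A minimal sketch of the idea (not xarray's actual implementation): a container whose `__sizeof__` adds the payload's size to the object's own footprint, so `sys.getsizeof` reports something meaningful:

```python
import sys

# Hypothetical sketch, not xarray's implementation: report the payload's size
# through __sizeof__ so sys.getsizeof reflects the underlying data.
class Container:
    def __init__(self, data: bytes):
        self.data = data

    def __sizeof__(self):
        # the object's own footprint plus the data it holds
        return object.__sizeof__(self) + sys.getsizeof(self.data)

c = Container(b"x" * 1_000_000)
print(sys.getsizeof(c) > 1_000_000)  # True: the payload now counts
```

Note that `sys.getsizeof` calls `__sizeof__` and adds a small GC-header overhead on top.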

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5764/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
866826033 MDU6SXNzdWU4NjY4MjYwMzM= 5215 Add an Cumulative aggregation, similar to Rolling max-sixty 5635139 closed 0     6 2021-04-24T19:59:49Z 2023-12-08T22:06:53Z 2023-12-08T22:06:53Z MEMBER      

Is your feature request related to a problem? Please describe.

Pandas has a .expanding aggregation, which is basically rolling with a full lookback. I often end up supplying rolling with the length of the dimension, and this is some nice sugar for that.

Describe the solution you'd like

Basically the same as pandas — a .expanding method that returns an Expanding class, which implements the same methods as a Rolling class.

Describe alternatives you've considered

Some options:
  • This.
  • Don't add anything — the sugar isn't worth the additional API.
  • Go full out and write specialized expanding algos — which will be faster since they don't have to keep track of the window. But not that much faster, likely not worth the effort.
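For reference, the pandas behavior being mirrored: `.expanding()` agrees with a rolling window that spans the whole series.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# expanding() is rolling() with a full lookback:
exp = s.expanding().mean()
roll = s.rolling(window=len(s), min_periods=1).mean()
print(exp.tolist())  # [1.0, 1.5, 2.0, 2.5]
print(exp.equals(roll))  # True
```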

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5215/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1878288525 PR_kwDOAMm_X85ZYos5 8139 Fix pandas' `interpolate(fill_value=)` error max-sixty 5635139 closed 0     6 2023-09-02T02:41:45Z 2023-09-28T16:48:51Z 2023-09-04T18:05:14Z MEMBER   0 pydata/xarray/pulls/8139

Pandas no longer has a fill_value parameter for interpolate.

Weirdly I wasn't getting this locally on pandas 2.1.0 — only in CI, on https://github.com/pydata/xarray/actions/runs/6054400455/job/16431747966?pr=8138.

Removing it passes locally; let's see whether this works in CI.

Would close #8125
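The surviving call path is the plain method-based interpolation, which covers the common case without a fill_value argument:

```python
import pandas as pd

# interpolate() without fill_value: gaps are filled by linear interpolation.
s = pd.Series([1.0, None, 3.0])
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0]
```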

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8139/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
967854972 MDExOlB1bGxSZXF1ZXN0NzEwMDA1NzY4 5694 Ask PRs to annotate tests max-sixty 5635139 closed 0     6 2021-08-12T02:19:28Z 2023-09-28T16:46:19Z 2023-06-19T05:46:36Z MEMBER   0 pydata/xarray/pulls/5694
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

As discussed https://github.com/pydata/xarray/pull/5690#issuecomment-897280353

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5694/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1874148181 I_kwDOAMm_X85vtTtV 8123 `.rolling_exp` arguments could be clearer max-sixty 5635139 open 0     6 2023-08-30T18:09:04Z 2023-09-01T00:25:08Z   MEMBER      

Is your feature request related to a problem?

Currently we call .rolling_exp like:

da.rolling_exp(date=20).mean()

20 refers to a "standard" window type — broadly "the same average distance as a simple rolling window". That works well, and matches the .rolling(date=20).mean() format.

But we also have different window types, and this makes it a bit incongruent:

da.rolling_exp(date=0.5, window_type="alpha").mean()

...since the window_type is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking "how many apples would you like to buy", and replying "5", and then separately saying "when I said 5, I meant 5 tonnes".

Describe the solution you'd like

One option would be:

.rolling_exp(date={"alpha": 0.5})

We pass a dict if we want a non-standard window type — so the value is attached to its type.

We could still have the original form for da.rolling_exp(date=20).mean().
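A sketch of how that dict form could be normalized internally (names are hypothetical, not the merged API):

```python
# Hypothetical sketch of parsing the proposed argument: a bare number keeps
# the default "span" window type; a {window_type: value} dict binds the value
# to its type explicitly.
def parse_window(value, default_type="span"):
    if isinstance(value, dict):
        if len(value) != 1:
            raise ValueError("expected a single {window_type: value} pair")
        (window_type, v), = value.items()
        return v, window_type
    return value, default_type

print(parse_window(20))              # (20, 'span')
print(parse_window({"alpha": 0.5}))  # (0.5, 'alpha')
```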

Describe alternatives you've considered

No response

Additional context

(I realize I wrote this originally, all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)

Unless anyone disagrees, I'll try and do this soon-ish™

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8123/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
729208432 MDExOlB1bGxSZXF1ZXN0NTA5NzM0NTM2 4540 numpy_groupies max-sixty 5635139 closed 0     6 2020-10-26T03:37:19Z 2022-02-05T22:24:12Z 2021-10-24T00:18:52Z MEMBER   0 pydata/xarray/pulls/4540
  • [x] Closes https://github.com/pydata/xarray/issues/4473
  • [ ] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Very early effort — I found this harder than I expected. I was trying to use the existing groupby infra, but I think maybe I should start afresh. The result of the numpy_groupies operation is a fully formed array, whereas we're used to handling an iterable of results which need to be concatenated.

I also added some type signatures / notes as I was going through the existing code, mostly for my own understanding.

If anyone has any thoughts, feel free to comment — otherwise I'll resume this soon
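The "fully formed array" result shape can be sketched with `np.bincount` standing in for `numpy_groupies.aggregate`:

```python
import numpy as np

# Each element is assigned to a group; the aggregation returns one dense
# array indexed by group — no per-group iterable to concatenate afterwards.
group_idx = np.array([0, 0, 1, 2, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sums = np.bincount(group_idx, weights=values)
print(sums)  # [3. 8. 4.]
```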

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4540/reactions",
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
399164733 MDExOlB1bGxSZXF1ZXN0MjQ0NjU3NTk5 2674 Skipping variables in datasets that don't have the core dim max-sixty 5635139 closed 0     6 2019-01-15T02:43:11Z 2021-05-13T22:02:19Z 2021-05-13T22:02:19Z MEMBER   0 pydata/xarray/pulls/2674

ref https://github.com/pydata/xarray/pull/2650#issuecomment-454164295

This seems an ugly way of accomplishing the goal; any ideas for a better way of doing this?

And stepping back, do others think a) it's helpful to skip variables in a dataset, and b) apply_ufunc should do this?
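The skipping itself is simple to sketch outside of apply_ufunc (variable names are hypothetical):

```python
# Hypothetical sketch: keep only the variables whose dims include the core dim,
# which is the behavior the PR proposes for datasets.
variables = {
    "temp": ("time", "space"),
    "elevation": ("space",),   # lacks the core dim -> skipped
    "pressure": ("time",),
}
core_dim = "time"
kept = {name: dims for name, dims in variables.items() if core_dim in dims}
print(sorted(kept))  # ['pressure', 'temp']
```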

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2674/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
298421965 MDU6SXNzdWUyOTg0MjE5NjU= 1923 Local test failure in test_backends max-sixty 5635139 closed 0     6 2018-02-19T22:53:37Z 2020-09-05T20:32:17Z 2020-09-05T20:32:17Z MEMBER      

I'm happy to debug this further, but before I do: is this an issue people have seen before? I'm running tests on master and hit a failure very early on.

FWIW I don't use netCDF, and don't think I have it installed.

Code Sample, a copy-pastable example if possible

```python
========================================================================== FAILURES ==========================================================================
_________ ScipyInMemoryDataTest.test_bytesio_pickle __________

self = <xarray.tests.test_backends.ScipyInMemoryDataTest testMethod=test_bytesio_pickle>

    @pytest.mark.skipif(PY2, reason='cannot pickle BytesIO on Python 2')
    def test_bytesio_pickle(self):
        data = Dataset({'foo': ('x', [1, 2, 3])})
        fobj = BytesIO(data.to_netcdf())
        with open_dataset(fobj, autoclose=self.autoclose) as ds:
>           unpickled = pickle.loads(pickle.dumps(ds))
E           TypeError: can't pickle _thread.lock objects

xarray/tests/test_backends.py:1384: TypeError
```
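For reference, the error reproduces with any object holding a thread lock, which suggests the backend file object carries one:

```python
import pickle
import threading

# A bare lock cannot be pickled; a dataset whose backend holds one fails the
# same way when pickled.
try:
    pickle.dumps(threading.Lock())
except TypeError as err:
    print(err)  # e.g. "cannot pickle '_thread.lock' object"
```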

Problem description

Expected Output

Skip or pass backends tests

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: d00721a3560f57a1b9226c5dbf5bf3af0356619d
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.7.0-38-g1005a9e  # not sure why this is tagged so early. I'm running on latest master
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1923/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
575088962 MDExOlB1bGxSZXF1ZXN0MzgzMzAwMjgw 3826 Allow ellipsis to be used in stack max-sixty 5635139 closed 0     6 2020-03-04T02:21:21Z 2020-03-20T01:20:54Z 2020-03-19T22:55:09Z MEMBER   0 pydata/xarray/pulls/3826
  • [x] Closes https://github.com/pydata/xarray/issues/3814
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3826/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
577283480 MDExOlB1bGxSZXF1ZXN0Mzg1MTA3OTU4 3846 Doctests fixes max-sixty 5635139 closed 0     6 2020-03-07T05:44:27Z 2020-03-10T14:03:05Z 2020-03-10T14:03:00Z MEMBER   0 pydata/xarray/pulls/3846
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

Starting to get some fixes in.

It's going to be a long journey though. I think maybe we whitelist some files and move through them gradually before whitelisting the whole library.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3846/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
485437811 MDU6SXNzdWU0ODU0Mzc4MTE= 3265 Sparse tests failing on master max-sixty 5635139 closed 0     6 2019-08-26T20:34:21Z 2019-08-27T00:01:18Z 2019-08-27T00:01:07Z MEMBER      

https://dev.azure.com/xarray/xarray/_build/results?buildId=695

```python
=================================== FAILURES ===================================
___ TestSparseVariable.test_unary_op ___

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f0b21b70>

    def test_unary_op(self):
>       sparse.utils.assert_eq(-self.var.data, -self.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:285: AttributeError

___ TestSparseVariable.test_univariate_ufunc _____

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24ebc2bb38>

    def test_univariate_ufunc(self):
>       sparse.utils.assert_eq(np.sin(self.data), xu.sin(self.var).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:290: AttributeError

___ TestSparseVariable.test_bivariate_ufunc ______

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f02a7e10>

    def test_bivariate_ufunc(self):
>       sparse.utils.assert_eq(np.maximum(self.data, 0), xu.maximum(self.var, 0).data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:293: AttributeError

___ TestSparseVariable.test_pickle ____

self = <xarray.tests.test_sparse.TestSparseVariable object at 0x7f24f04f2c50>

    def test_pickle(self):
        v1 = self.var
        v2 = pickle.loads(pickle.dumps(v1))
>       sparse.utils.assert_eq(v1.data, v2.data)
E       AttributeError: module 'sparse' has no attribute 'utils'

xarray/tests/test_sparse.py:307: AttributeError
```

Any ideas?
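One defensive option (a sketch, not necessarily the fix that was applied): a local assert_eq that doesn't depend on the removed sparse.utils helper.

```python
import numpy as np

# Hypothetical fallback: densify sparse-like inputs, then compare with numpy.
def assert_eq(a, b):
    a = a.todense() if hasattr(a, "todense") else np.asarray(a)
    b = b.todense() if hasattr(b, "todense") else np.asarray(b)
    np.testing.assert_allclose(a, b)

assert_eq(np.array([1.0, 2.0]), [1.0, 2.0])  # passes silently
```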

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3265/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
457080809 MDExOlB1bGxSZXF1ZXN0Mjg4OTY1MzQ4 3029 Fix pandas-dev tests max-sixty 5635139 closed 0     6 2019-06-17T18:15:16Z 2019-06-28T15:31:33Z 2019-06-28T15:31:28Z MEMBER   0 pydata/xarray/pulls/3029

Currently pandas-dev tests get 'stuck' on the conda install. The last instruction to run is the standard install:

```sh
$ if [[ "$CONDA_ENV" == "docs" ]]; then
    conda env create -n test_env --file doc/environment.yml;
  elif [[ "$CONDA_ENV" == "lint" ]]; then
    conda env create -n test_env --file ci/requirements-py37.yml;
  else
    conda env create -n test_env --file ci/requirements-$CONDA_ENV.yml;
  fi
```

And after installing the libraries, it prints this and then stops:

```
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
```

I'm not that familiar with conda. Anyone have any ideas as to why this would fail while the other builds would succeed?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3029/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
168901028 MDU6SXNzdWUxNjg5MDEwMjg= 934 Should indexing be possible on 1D coords, even if not dims? max-sixty 5635139 closed 0     6 2016-08-02T14:33:43Z 2019-01-27T06:49:52Z 2019-01-27T06:49:52Z MEMBER      

``` python
In [1]: arr = xr.DataArray(np.random.rand(4, 3),
   ...:                    [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                     ('space', ['IA', 'IL', 'IN'])])

In [17]: arr.coords['space2'] = ('space', ['A', 'B', 'C'])

In [18]: arr
Out[18]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.05187049,  0.04743067,  0.90329666],
       [ 0.59482538,  0.71014366,  0.86588207],
       [ 0.51893157,  0.49442107,  0.10697737],
       [ 0.16068189,  0.60756757,  0.31935279]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
    space2   (space) |S1 'A' 'B' 'C'
```

Now try to select on the space2 coord:

``` python
In [19]: arr.sel(space2='A')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-eae5e4b64758> in <module>()
----> 1 arr.sel(space2='A')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers)
    601         """
    602         return self.isel(**indexing.remap_label_indexers(
--> 603             self, indexers, method=method, tolerance=tolerance))
    604
    605     def isel_points(self, dim='points', **indexers):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers)
    588         DataArray.sel
    589         """
--> 590         ds = self._to_temp_dataset().isel(**indexers)
    591         return self._from_temp_dataset(ds)
    592

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers)
    908         invalid = [k for k in indexers if k not in self.dims]
    909         if invalid:
--> 910             raise ValueError("dimensions %r do not exist" % invalid)
    911
    912         # all indexers should be int, slice or np.ndarrays

ValueError: dimensions ['space2'] do not exist
```

Is there an easier way to do this? I couldn't think of anything...
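The manual workaround reduces to translating the label on the non-dimension coordinate into a positional index first, sketched here with plain numpy:

```python
import numpy as np

# Sketch of what sel(space2='A') would have to do under the hood:
space2 = np.array(["A", "B", "C"])          # non-dimension coordinate
data = np.arange(12).reshape(4, 3)          # (time, space)
pos = int(np.nonzero(space2 == "A")[0][0])  # label -> position
print(data[:, pos])  # [0 3 6 9]
```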

CC @justinkuosixty

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/934/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
