issues


70 rows where state = "open" and user = 2448579 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2278499376 PR_kwDOAMm_X85uhFke 8997 Zarr: Optimize `region="auto"` detection dcherian 2448579 open 0     1 2024-05-03T22:13:18Z 2024-05-04T21:47:39Z   MEMBER   0 pydata/xarray/pulls/8997
  1. This moves the region detection code into ZarrStore so we only open the store once.
  2. Instead of opening the store as a dataset, construct a pd.Index directly to "auto"-infer the region.

The diff is large mostly because a bunch of code moved from backends/api.py to backends/zarr.py
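A minimal sketch of the second point, assuming a sorted, unique coordinate; `infer_region`, `existing`, and `new` are hypothetical names for illustration, not the PR's code:

```python
# Hypothetical sketch: infer the Zarr region slice from on-disk coordinate
# values via a pd.Index, without constructing a full Dataset.
import numpy as np
import pandas as pd

def infer_region(existing: np.ndarray, new: np.ndarray) -> slice:
    idx = pd.Index(existing)
    start = idx.get_loc(new[0])      # integer position of the first new label
    stop = idx.get_loc(new[-1]) + 1  # one past the position of the last label
    return slice(start, stop)

print(infer_region(np.arange(10), np.arange(4, 8)))  # slice(4, 8, None)
```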

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8997/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2278510478 PR_kwDOAMm_X85uhIGP 8998 Zarr: Optimize appending dcherian 2448579 open 0     0 2024-05-03T22:21:44Z 2024-05-03T22:23:34Z   MEMBER   1 pydata/xarray/pulls/8998

Builds on #8997

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8998/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1915997507 I_kwDOAMm_X85yM81D 8238 NamedArray tracking issue dcherian 2448579 open 0     12 2023-09-27T17:07:58Z 2024-04-30T12:49:17Z   MEMBER      

@andersy005 I think it would be good to keep a running list of NamedArray tasks. I'll start with a rough sketch, please update/edit as you like.

  • [x] Refactor out NamedArray base class (#8075)
  • [x] publicize design doc: Scientific Python | Pangeo | NumPy Mailing List
  • [ ] Migrate VariableArithmetic to NamedArrayArithmetic (#8244)
  • [ ] Migrate ExplicitlyIndexed array classes to array protocols
  • [x] Migrate from *Indexer objects to .oindex and .vindex on ExplicitlyIndexed array classes
  • [ ] https://github.com/pydata/xarray/pull/8870
  • [ ] Migrate unary ops
  • [ ] Migrate binary ops
  • [ ] Migrate nanops.py
  • [x] Avoid "injecting" reduce methods potentially by using generate_reductions.py? (#8304)
  • [ ] reprs and formatting.py
  • [x] parallelcompat.py
  • [ ] pycompat.py (#8244)
  • [ ] https://github.com/pydata/xarray/pull/8276
  • [ ] have test_variable.py test both NamedArray and Variable
  • [x] Arrays with unknown shape #8291
  • [ ] https://github.com/pydata/xarray/issues/8306
  • [ ] https://github.com/pydata/xarray/issues/8310
  • [ ] https://github.com/pydata/xarray/issues/8333
  • [ ] Try to preserve imports from xarray.core/* by importing namedarray functionality into xarray.core/*

xref #3981

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8238/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2259316341 I_kwDOAMm_X86Gqm51 8965 Support concurrent loading of variables dcherian 2448579 open 0     4 2024-04-23T16:41:24Z 2024-04-29T22:21:51Z   MEMBER      

Is your feature request related to a problem?

Today, if users want to concurrently load multiple variables in a DataArray or Dataset, they have to use dask.

It struck me that it'd be pretty easy for .load to gain an executor kwarg that accepts anything that follows the concurrent.futures executor interface, and parallelize this loop.

https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857
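A rough sketch of what that could look like; `load_concurrently` is hypothetical, not xarray API, and assumes the backend is thread-safe for reads:

```python
# Hypothetical sketch: load each lazy variable on a concurrent.futures executor.
from concurrent.futures import ThreadPoolExecutor

def load_concurrently(ds, executor=None):
    if executor is None:
        executor = ThreadPoolExecutor(max_workers=8)
    # one task per variable; Variable.load() reads the data into memory
    futures = [executor.submit(var.load) for var in ds.variables.values()]
    for future in futures:
        future.result()  # re-raise any errors from the workers
    return ds
```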

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8965/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2187743087 PR_kwDOAMm_X85ptH1f 8840 Grouper, Resampler as public api dcherian 2448579 open 0     0 2024-03-15T05:16:05Z 2024-04-21T16:21:34Z   MEMBER   1 pydata/xarray/pulls/8840

Expose Grouper and Resampler as public API

TODO:

  • [ ] Consider avoiding IndexVariable


  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8840/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2248614324 I_kwDOAMm_X86GByG0 8952 `isel(multi_index_level_name = MultiIndex.level)` corrupts the MultiIndex dcherian 2448579 open 0     1 2024-04-17T15:41:39Z 2024-04-18T13:14:46Z   MEMBER      

What happened?

From https://github.com/pydata/xarray/discussions/8951

If d is a MultiIndex-ed dataset with levels (x, y, z), and m is a dataset with a single coord x, then m.isel(x=d.x) builds a dataset with a MultiIndex with levels (y, z). This seems like it should work.

cc @benbovy

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```python
import pandas as pd, xarray as xr, numpy as np

xr.set_options(use_flox=True)

test = pd.DataFrame()
test["x"] = np.arange(100) % 10
test["y"] = np.arange(100)
test["z"] = np.arange(100)
test["v"] = np.arange(100)

d = xr.Dataset.from_dataframe(test)
d = d.set_index(index=["x", "y", "z"])
print(d)

m = d.groupby("x").mean()
print(m)

print(d.xindexes)
print(m.isel(x=d.x).xindexes)

xr.align(d, m.isel(x=d.x))

res = d.groupby("x") - m

print(res)
```

```
<xarray.Dataset>
Dimensions:  (index: 100)
Coordinates:
  * index    (index) object MultiIndex
  * x        (index) int64 0 1 2 3 4 5 6 7 8 9 0 1 2 ... 8 9 0 1 2 3 4 5 6 7 8 9
  * y        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
  * z        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
Data variables:
    v        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    v        (x) float64 45.0 46.0 47.0 48.0 49.0 50.0 51.0 52.0 53.0 54.0
Indexes:
  ┌ index   PandasMultiIndex
  │ x
  │ y
  └ z
Indexes:
  ┌ index   PandasMultiIndex
  │ y
  └ z
ValueError...
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8952/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2215762637 PR_kwDOAMm_X85rMHpN 8893 Avoid extra read from disk when creating Pandas Index. dcherian 2448579 open 0     1 2024-03-29T17:44:52Z 2024-04-08T18:55:09Z   MEMBER   0 pydata/xarray/pulls/8893
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8893/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2228319306 I_kwDOAMm_X86E0XRK 8914 swap_dims does not propagate indexes properly dcherian 2448579 open 0     0 2024-04-05T15:36:26Z 2024-04-05T15:36:27Z   MEMBER      

What happened?

Found by hypothesis:

```python
import xarray as xr
import numpy as np

var = xr.Variable(
    dims="2",
    data=np.array(
        ['1970-01-01T00:00:00.000000000',
         '1970-01-01T00:00:00.000000002',
         '1970-01-01T00:00:00.000000001'],
        dtype='datetime64[ns]',
    ),
)
var1 = xr.Variable(data=np.array([0], dtype=np.uint32), dims=['1'], attrs={})

state = xr.Dataset()
state['2'] = var
state = state.stack({"0": ["2"]})
state['1'] = var1
state['1_'] = var1  # .copy(deep=True)
state = state.swap_dims({"1": "1_"})
xr.testing.assertions._assert_internal_invariants(state, False)
```

Swapping the simple pandas-indexed dims works, but the multi-index that is in the dataset and not affected by the swap_dims op ends up broken.

cc @benbovy

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8914/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2224297504 PR_kwDOAMm_X85rpGUH 8906 Add invariant check for IndexVariable.name dcherian 2448579 open 0     1 2024-04-04T02:13:33Z 2024-04-05T07:12:54Z   MEMBER   1 pydata/xarray/pulls/8906

@benbovy this seems to be the root cause of #8646: the variable name in Dataset._variables does not match IndexVariable.name.

A good number of tests seem to fail though, so I am not sure if this is a good check.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8906/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1997636679 PR_kwDOAMm_X85frAC_ 8460 Add initialize_zarr dcherian 2448579 open 0     8 2023-11-16T19:45:05Z 2024-04-02T15:08:01Z   MEMBER   1 pydata/xarray/pulls/8460
  • [x] Closes #8343
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

The intended pattern is:

```python

after_init = initialize_zarr(store, ds, region_dims=("x",))
for i in range(ds.sizes["x"]):
    after_init.isel(x=[i]).to_zarr(store, region={"x": slice(i, i + 1)})

```

cc @slevang

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8460/reactions",
    "total_count": 5,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 3,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 pull
2213636579 I_kwDOAMm_X86D8Wnj 8887 resetting multiindex may be buggy dcherian 2448579 open 0     1 2024-03-28T16:23:38Z 2024-03-29T07:59:22Z   MEMBER      

What happened?

Resetting a MultiIndex dim coordinate preserves the MultiIndex levels as IndexVariables. We should either reset the indexes for the multiindex level variables, or warn, asking the users to do so.

This seems to be the root cause exposed by https://github.com/pydata/xarray/pull/8809

cc @benbovy

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```python
import numpy as np
import xarray as xr

# ND DataArray that gets stacked along a multiindex
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])

# Extract just the stacked coordinates for saving in a dataset
ds = xr.Dataset(data_vars={"feature": da.feature})

xr.testing.assertions._assert_internal_invariants(
    ds.reset_index(["feature", "dim1", "dim2"]), check_default_indexes=False
)  # succeeds
xr.testing.assertions._assert_internal_invariants(
    ds.reset_index(["feature"]), check_default_indexes=False
)  # fails, but no warning either
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8887/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1471685307 I_kwDOAMm_X85XuCK7 7344 Disable bottleneck by default? dcherian 2448579 open 0     11 2022-12-01T17:26:11Z 2024-03-27T00:22:41Z   MEMBER      

What is your issue?

Our choice to enable bottleneck by default results in quite a few issues about numerical stability and funny dtype behaviour: #7336, #7128, #2370, #1346 (and probably more)

Shall we disable it by default?
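For anyone bitten by this today, bottleneck can already be disabled with an existing option:

```python
import xarray as xr

# opt out of bottleneck-accelerated reductions globally...
xr.set_options(use_bottleneck=False)

# ...or only within a block
with xr.set_options(use_bottleneck=False):
    pass  # computations here use the numpy code paths
```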

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7344/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2187659148 I_kwDOAMm_X86CZQeM 8838 remove xfail from `test_dataarray.test_to_dask_dataframe()` dcherian 2448579 open 0     2 2024-03-15T03:43:02Z 2024-03-15T15:33:31Z   MEMBER      

What is your issue?

Remove the xfail when dask-expr is fixed. The xfail was added in https://github.com/pydata/xarray/pull/8837.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8838/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2021856935 PR_kwDOAMm_X85g81gb 8509 Proof of concept - public Grouper objects dcherian 2448579 open 0     0 2023-12-02T04:52:27Z 2024-03-15T05:18:18Z   MEMBER   1 pydata/xarray/pulls/8509

Not for merging, just proof that it can be done nicely :)

Now builds on #8840 ~Builds on an older version of #8507~

Try it out!

```python
import xarray as xr
from xarray.core.groupers import SeasonGrouper, SeasonResampler

ds = xr.tutorial.open_dataset("air_temperature")

# custom seasons!
ds.air.groupby(time=SeasonGrouper(["JF", "MAM", "JJAS", "OND"])).mean()

ds.air.resample(time=SeasonResampler(["DJF", "MAM", "JJAS", "ON"])).count()
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8509/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2149485914 I_kwDOAMm_X86AHo1a 8778 Stricter defaults for concat, combine, open_mfdataset dcherian 2448579 open 0     2 2024-02-22T16:43:38Z 2024-02-23T04:17:40Z   MEMBER      

Is your feature request related to a problem?

The defaults for concat are excessively permissive: data_vars="all", coords="different", compat="no_conflicts", join="outer". This comment illustrates why this can be hard to predict or understand: a seemingly unrelated option decode_cf controls whether a variable is in data_vars or coords, and can result in wildly different concatenation behaviour.

  1. This always concatenates data_vars along concat_dim even if they did not have that dimension to begin with.
  2. If the same coordinate var exists in different datasets/files, they will be sequentially compared for equality to decide whether they get concatenated.
  3. The outer join (applied along all dimensions that are not concat_dim) can result in very large datasets due to small floating points differences in the indexes, and also questionable behaviour with staggered grid datasets.
  4. "no_conflicts" basically picks the first not-NaN value after aligning all datasets, but is quite slow (we should be using duck_array_ops.nanfirst here I think).

While "convenient" this really just makes the default experience quite bad with hard-to-understand slowdowns.

Describe the solution you'd like

I propose we migrate to data_vars="minimal", coords="minimal", join="exact", compat="override". This should:

  1. Only concatenate data_vars and coords variables when they already have concat_dim.
  2. For any variables that do not have concat_dim, blindly pick them from the first file.
  3. join="exact" will prevent ballooning of dimension sizes due to floating point inequalities.
  4. These options will totally avoid any data reads unless explicitly requested by the user.

Unfortunately, this has a pretty big blast radius so we'd need a long deprecation cycle.
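Note that the proposed defaults can already be requested explicitly with the existing keywords; for example (the file pattern is a placeholder):

```python
import xarray as xr

ds = xr.open_mfdataset(
    "files-*.nc",  # placeholder pattern
    data_vars="minimal",
    coords="minimal",
    join="exact",
    compat="override",
)
```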

Describe alternatives you've considered

No response

Additional context

xref https://github.com/pydata/xarray/issues/4824 xref https://github.com/pydata/xarray/issues/1385 xref https://github.com/pydata/xarray/issues/8231 xref https://github.com/pydata/xarray/issues/5381 xref https://github.com/pydata/xarray/issues/2064 xref https://github.com/pydata/xarray/issues/2217

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8778/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
638947370 MDU6SXNzdWU2Mzg5NDczNzA= 4156 writing sparse to netCDF dcherian 2448579 open 0     7 2020-06-15T15:33:23Z 2024-01-09T10:14:00Z   MEMBER      

I haven't looked at this too closely but it appears that this is a way to save MultiIndexed datasets to netCDF. So we may be able to do sparse -> multiindex -> netCDF

http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#compression-by-gathering

cc @fujiisoup

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4156/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2064480451 I_kwDOAMm_X857DXjD 8582 Adopt SPEC 0 instead of NEP-29 dcherian 2448579 open 0     1 2024-01-03T18:36:24Z 2024-01-03T20:12:05Z   MEMBER      

What is your issue?

https://docs.xarray.dev/en/stable/getting-started-guide/installing.html#minimum-dependency-versions says that we follow NEP-29, and I think our min versions script also does that.

I propose we follow https://scientific-python.org/specs/spec-0000/

In practice, I think this means we mostly drop Python versions earlier.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8582/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2052952379 I_kwDOAMm_X856XZE7 8568 Raise when assigning attrs to virtual variables (default coordinate arrays) dcherian 2448579 open 0     0 2023-12-21T19:24:11Z 2023-12-21T19:24:19Z   MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/8567

Originally posted by **matthew-brett**, December 21, 2023

Sorry for the introductory question, but we (@ivanov and I) ran into this behavior while experimenting:

```python
import numpy as np
import xarray as xr

data = np.zeros((3, 4, 5))
ds = xr.DataArray(data, dims=('i', 'j', 'k'))
print(ds['k'].attrs)
```

This shows `{}` as we might reasonably expect. But then:

```python
ds['k'].attrs['foo'] = 'bar'
print(ds['k'].attrs)
```

This also gives `{}`, which we found surprising. We worked out why that was, after a little experimentation (the default coordinate arrays seem to get created on the fly and garbage collected immediately). But it took us a little while. Is that as intended? Is there a way of making this less confusing? Thanks for any help.
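A short sketch of the mechanism described above; it only restates the behaviour from the discussion:

```python
# The default coordinate is built on each access, so consecutive lookups
# return distinct, short-lived objects.
k1 = ds['k']
k2 = ds['k']
assert k1 is not k2        # two freshly constructed coordinate arrays
k1.attrs['foo'] = 'bar'    # mutates only the first copy...
print(ds['k'].attrs)       # ...so a fresh access still shows {}
```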
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8568/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1954809370 I_kwDOAMm_X850hAYa 8353 Update benchmark suite for asv 0.6.1 dcherian 2448579 open 0     0 2023-10-20T18:13:22Z 2023-12-19T05:53:21Z   MEMBER      

The new asv version comes with decorators for parameterizing and skipping, and the ability to use mamba to create environments.

https://github.com/airspeed-velocity/asv/releases

https://asv.readthedocs.io/en/v0.6.1/writing_benchmarks.html#skipping-benchmarks

This might help us reduce benchmark times a bit, or at least simplify the code some.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8353/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2027147099 I_kwDOAMm_X854089b 8523 tree-reduce the combine for `open_mfdataset(..., parallel=True, combine="nested")` dcherian 2448579 open 0     4 2023-12-05T21:24:51Z 2023-12-18T19:32:39Z   MEMBER      

Is your feature request related to a problem?

When parallel=True and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the "head node" for the combine.

Instead we can tree-reduce the combine (example) by switching to dask.bag instead of dask.delayed and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node.

  1. The downside is the dask graph is "worse" but perhaps that shouldn't stop us.
  2. I think this is only feasible for combine="nested"

cc @TomNicholas
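A minimal sketch of the idea, assuming combine="nested" along a single "time" dimension; `paths` and the helper names are placeholders, not xarray API:

```python
import dask.bag as db
import xarray as xr

def open_one(path):
    return xr.open_dataset(path, chunks={})

def combine(datasets):
    # returns the same type it consumes, so dask can apply it tree-fashion
    return xr.combine_nested(list(datasets), concat_dim="time")

bag = db.from_sequence(paths, npartitions=16).map(open_one)
# reduction() aggregates partition results in a tree (fan-in set by split_every)
# instead of shipping every per-file Dataset back to a single combine task
combined = bag.reduction(combine, combine, split_every=4).compute()
```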

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8523/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1975400777 PR_kwDOAMm_X85efqSl 8408 Generalize explicit_indexing_adapter dcherian 2448579 open 0     0 2023-11-03T03:29:40Z 2023-11-03T03:53:25Z   MEMBER   1 pydata/xarray/pulls/8408

Use as_indexable instead of NumpyIndexingAdapter

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8408/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1950211465 I_kwDOAMm_X850Pd2J 8333 Should NamedArray be interchangeable with other array types? or Should we support the `axis` kwarg? dcherian 2448579 open 0     17 2023-10-18T16:46:37Z 2023-10-31T22:26:33Z   MEMBER      

What is your issue?

Raising @Illviljan's comment from https://github.com/pydata/xarray/pull/8304#discussion_r1363196597.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8333/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1952621896 I_kwDOAMm_X850YqVI 8337 Support rolling with numbagg dcherian 2448579 open 0     3 2023-10-19T16:11:40Z 2023-10-23T15:46:36Z   MEMBER      

Is your feature request related to a problem?

We can do plain reductions, and groupby reductions with numbagg. Rolling is the last one left!

I don't think coarsen will benefit since it's basically a reshape and reduce on that view, so it should already be accelerated. There may be small gains in handling the boundary conditions but that's probably it.
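For reference, numbagg already ships the moving-window kernels; a sketch of calling one directly (the exact kwarg names follow numbagg's bottleneck-style API and are worth double-checking):

```python
import numpy as np
import numbagg

x = np.random.randn(4, 100)
# rolling mean over a window of 5 along the last axis
rolled = numbagg.move_mean(x, window=5, min_count=1, axis=-1)
```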

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8337/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1954445639 I_kwDOAMm_X850fnlH 8350 optimize align for scalars at least dcherian 2448579 open 0     5 2023-10-20T14:48:25Z 2023-10-20T19:17:39Z   MEMBER      

What happened?

Here's a simple rescaling calculation:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": (("x", "y"), np.ones((300, 400))), "b": (("x", "y"), np.ones((300, 400)))}
)
mean = ds.mean()  # scalar
std = ds.std()  # scalar
rescaled = (ds - mean) / std
```

The profile for the last line shows 30% (!!!) of the time spent in align (really reindex_like), even though there's nothing to reindex when only scalars are involved!

This is a small example inspired by a ML pipeline where this normalization is happening very many times in a tight loop.

cc @benbovy

What did you expect to happen?

A fast path for when no reindexing needs to happen.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8350/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1943543755 I_kwDOAMm_X85z2B_L 8310 pydata/xarray as monorepo for Xarray and NamedArray dcherian 2448579 open 0     1 2023-10-14T20:34:51Z 2023-10-14T21:29:11Z   MEMBER      

What is your issue?

As we work through refactoring for NamedArray, it's pretty clear that Xarray will depend pretty closely on many files in namedarray/. For example various utils.py, pycompat.py, *ops.py, formatting.py, formatting_html.py at least. This promises to be quite painful if we did break NamedArray out in to its own repo (particularly around typing, e.g. https://github.com/pydata/xarray/pull/8309)

I propose we use pydata/xarray as a monorepo that serves two packages: NamedArray and Xarray.

  • We can move as much as is needed to have NamedArray be independent of Xarray, but Xarray will depend quite closely on many utility functions in NamedArray.
  • We can release both at the same time, similar to dask and distributed.
  • We can re-evaluate if and when NamedArray grows its own community.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8310/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1942893480 I_kwDOAMm_X85zzjOo 8306 keep_attrs for NamedArray dcherian 2448579 open 0     0 2023-10-14T02:29:54Z 2023-10-14T02:31:35Z   MEMBER      

What is your issue?

Copying over @max-sixty's comment from https://github.com/pydata/xarray/pull/8304#discussion_r1358873522

I haven't been in touch with the NameArray discussions so forgive a glib comment — but re https://github.com/pydata/xarray/issues/3891 — this would be a "once-in-a-library" opportunity to always retain attrs in aggregations, removing the keep_attrs option in methods.

(Xarray could still handle them as it wished, so xarray's external interface wouldn't need to change immediately...)

@pydata/xarray Should we just delete the keep_attrs kwarg completely for NamedArray and always propagate attrs? obj.attrs.clear() seems just as easy to type.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8306/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1916012703 I_kwDOAMm_X85yNAif 8239 Address repo-review suggestions dcherian 2448579 open 0     7 2023-09-27T17:18:40Z 2023-10-02T20:24:34Z   MEMBER      

What is your issue?

Here's the output from the Scientific Python Repo Review tool.

There's an online version here.

On a Mac I run pipx run 'sp-repo-review[cli]' --format html --show err gh:pydata/xarray@main | pbcopy

A lot of these seem fairly easy to fix. I'll note that there's a large number of mypy config suggestions.

General

  • Detected build backend: setuptools.build_meta
  • Detected license(s): Apache Software License
  • ❌ PY007 (Supports an easy task runner, nox or tox): Projects must have a noxfile.py or tox.ini to encourage new contributors.

PyProject

See https://github.com/pydata/xarray/issues/8239#issuecomment-1739363809

  • ❌ PP305 (Specifies xfail_strict): xfail_strict should be set. You can manually specify if a check should be strict when setting each xfail.

    [tool.pytest.ini_options]
    xfail_strict = true

  • ❌ PP308 (Specifies useful pytest summary): -ra should be in addopts = [...] (print summary of all fails/errors).

    [tool.pytest.ini_options]
    addopts = ["-ra", "--strict-config", "--strict-markers"]

Pre-commit

  • ❌ PC110 (Uses black): Use https://github.com/psf/black-pre-commit-mirror instead of https://github.com/psf/black in .pre-commit-config.yaml.

  • ❌ PC160 (Uses codespell): Must have the https://github.com/codespell-project/codespell repo in .pre-commit-config.yaml.

  • ❌ PC170 (Uses PyGrep hooks, only needed if RST present): Must have the https://github.com/pre-commit/pygrep-hooks repo in .pre-commit-config.yaml.

  • ❌ PC180 (Uses prettier): Must have the https://github.com/pre-commit/mirrors-prettier repo in .pre-commit-config.yaml.

  • ❌ PC191 (Ruff show fixes if fixes enabled): If --fix is present, --show-fixes must be too.

  • ❌ PC901 (Custom pre-commit CI message): Should have something like this in .pre-commit-config.yaml:

    ci:
      autoupdate_commit_msg: 'chore: update pre-commit hooks'

MyPy

  • ❌ MY101 (MyPy strict mode): Must have strict in the mypy config. MyPy is best with strict or nearly strict configuration. If you are happy with the strictness of your settings already, ignore this check or set strict = false explicitly.

    [tool.mypy]
    strict = true

  • ❌ MY103 (MyPy warn unreachable): Must have warn_unreachable = true to pass this check. There are occasionally false positives (often due to platform or Python version static checks), so it's okay to ignore this check. But try it first; it can catch real bugs too.

    [tool.mypy]
    warn_unreachable = true

  • ❌ MY104 (MyPy enables ignore-without-code): Must have "ignore-without-code" in enable_error_code = [...]. This will force all skips in your project to include the error code, which makes them more readable, and avoids skipping something unintended.

    [tool.mypy]
    enable_error_code = ["ignore-without-code", "redundant-expr", "truthy-bool"]

  • ❌ MY105 (MyPy enables redundant-expr): Must have "redundant-expr" in enable_error_code = [...]. This helps catch useless lines of code, like checking the same condition twice.

    [tool.mypy]
    enable_error_code = ["ignore-without-code", "redundant-expr", "truthy-bool"]

  • ❌ MY106 (MyPy enables truthy-bool): Must have "truthy-bool" in enable_error_code = [...]. This catches mistakes in using a value as truthy if it cannot be falsey.

    [tool.mypy]
    enable_error_code = ["ignore-without-code", "redundant-expr", "truthy-bool"]

Ruff

  • ❌ RF101 (Bugbear must be selected): Must select the flake8-bugbear B checks. Recommended:

    [tool.ruff]
    select = [
      "B",  # flake8-bugbear
    ]
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8239/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1217566173 I_kwDOAMm_X85IkpXd 6528 cumsum drops index coordinates dcherian 2448579 open 0     5 2022-04-27T16:04:08Z 2023-09-22T07:55:56Z   MEMBER      

What happened?

cumsum drops index coordinates. Seen in #6525, #3417

What did you expect to happen?

Preserve index coordinates

Minimal Complete Verifiable Example

```python
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x",), [7, 3, 1, 1, 1, 1, 1])},
    coords={"x": [0, 1, 2, 3, 4, 5, 6]},
)
ds.cumsum("x")
```

```
<xarray.Dataset>
Dimensions:  (x: 7)
Dimensions without coordinates: x
Data variables:
    foo      (x) int64 7 10 11 12 13 14 15
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray main
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6528/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1859703572 I_kwDOAMm_X85u2NMU 8095 Support `inline_array` kwarg in `open_zarr` dcherian 2448579 open 0     2 2023-08-21T16:09:38Z 2023-09-21T20:37:50Z   MEMBER      

cc @TomNicholas

What happened?

There is no way to specify inline_array in open_zarr. Instead we have to use open_dataset.

Minimal Complete Verifiable Example

```python
import xarray as xr

xr.Dataset({"a": xr.DataArray([1.0])}).to_zarr("temp.zarr")
```

```python
xr.open_zarr('temp.zarr', inline_array=True)
```

ValueError: argument inline_array cannot be passed both as a keyword argument and within the from_array_kwargs dictionary

```python
xr.open_zarr('temp.zarr', from_array_kwargs=dict(inline_array=True))
```

ValueError: argument inline_array cannot be passed both as a keyword argument and within the from_array_kwargs dictionary

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8095/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1902086612 PR_kwDOAMm_X85aoYuf 8206 flox: Set fill_value=np.nan always. dcherian 2448579 open 0     0 2023-09-19T02:19:49Z 2023-09-19T02:23:26Z   MEMBER   1 pydata/xarray/pulls/8206
  • [x] Closes #8090
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8206/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812301185 I_kwDOAMm_X85sBYWB 8005 Design for IntervalIndex dcherian 2448579 open 0     5 2023-07-19T16:30:50Z 2023-09-09T06:30:20Z   MEMBER      

Is your feature request related to a problem?

We should add a wrapper for pandas.IntervalIndex. This would solve a long-standing problem around propagating "bounds" variables (CF conventions, https://github.com/pydata/xarray/issues/1475).

The CF design

CF "encoding" for intervals is to use bounds variables. There is an attribute "bounds" on the dimension coordinate, that refers to a second variable (at least 2D). Example: x has an attribute bounds that refers to x_bounds.

```python
import numpy as np
import xarray as xr

left = np.arange(0.5, 3.6, 1)
right = np.arange(1.5, 4.6, 1)
bounds = np.stack([left, right])

ds = xr.Dataset(
    {"data": ("x", [1, 2, 3, 4])},
    coords={
        "x": ("x", [1, 2, 3, 4], {"bounds": "x_bounds"}),
        "x_bounds": (("bnds", "x"), bounds),
    },
)
ds
```

A fundamental problem with our current data model is that we lose x_bounds when we extract ds.data because there is a dimension bnds that is not shared with ds.data. Very important metadata is now lost!

We would also like to use the "bounds" to enable interval based indexing. ds.sel(x=1.1) should give you the value from the appropriate interval.

Pandas IntervalIndex

All the indexing is easy to implement by wrapping pandas.IntervalIndex, but there is one limitation. pd.IntervalIndex saves two pieces of information for each interval (left bound, right bound). CF saves three: left bound, right bound (see x_bounds) and a "central" value (see x). This should be OK to work around in our wrapper.

Fundamental Question

To me, a core question is whether x_bounds needs to be preserved after creating an IntervalIndex.

  1. If so, we need a better rule around coordinate variable propagation. In this case, the IntervalIndex would be associated with x and x_bounds. So the rule could be: "propagate all variables necessary to propagate an index associated with any of the dimensions on the extracted variable." So when extracting ds.data we propagate all variables necessary to propagate indexes associated with ds.data.dims, that is x, which would say "propagate x, x_bounds, and the IntervalIndex".
  2. Alternatively, we could choose to drop x_bounds entirely. I interpret this approach as "decoding" the bounds variable to an interval index object. When saving to disk, we would encode the interval index in two variables. (See below)

Describe the solution you'd like

I've prototyped (2) (approach 1 in this notebook) following @benbovy's suggestion:

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray import Variable
from xarray.indexes import PandasIndex


class XarrayIntervalIndex(PandasIndex):
    def __init__(self, index, dim, coord_dtype):
        assert isinstance(index, pd.IntervalIndex)
        # for PandasIndex
        self.index = index
        self.dim = dim
        self.coord_dtype = coord_dtype

    @classmethod
    def from_variables(cls, variables, options):
        assert len(variables) == 1
        (dim,) = tuple(variables)
        bounds = options["bounds"]
        assert isinstance(bounds, (xr.DataArray, xr.Variable))

        (axis,) = bounds.get_axis_num(set(bounds.dims) - {dim})
        left, right = np.split(bounds.data, 2, axis=axis)
        index = pd.IntervalIndex.from_arrays(left.squeeze(), right.squeeze())
        coord_dtype = bounds.dtype

        return cls(index, dim, coord_dtype)

    def create_variables(self, variables):
        from xarray.core.indexing import PandasIndexingAdapter

        newvars = {self.dim: xr.Variable(self.dim, PandasIndexingAdapter(self.index))}
        return newvars

    def __repr__(self):
        string = f"Xarray{self.index!r}"
        return string

    def to_pandas_index(self):
        return self.index

    @property
    def mid(self):
        return PandasIndex(self.index.mid, self.dim, self.coord_dtype)

    @property
    def left(self):
        return PandasIndex(self.index.left, self.dim, self.coord_dtype)

    @property
    def right(self):
        return PandasIndex(self.index.right, self.dim, self.coord_dtype)
```

```python
ds1 = (
    ds.drop_indexes("x")
    .set_xindex("x", XarrayIntervalIndex, bounds=ds.x_bounds)
    .drop_vars("x_bounds")
)
ds1
```

```python
ds1.sel(x=1.1)
```

Describe alternatives you've considered

I've tried some approaches in this notebook

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8005/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1888576440 I_kwDOAMm_X85wkWO4 8162 Update group by multi index dcherian 2448579 open 0     0 2023-09-09T04:50:29Z 2023-09-09T04:50:39Z   MEMBER      

ideally GroupBy._infer_concat_args() would return a xr.Coordinates object that contains both the coordinate(s) and their (multi-)index to assign to the result (combined) object.

The goal is to avoid calling create_default_index_implicit(coord) below where coord is a pd.MultiIndex or a single IndexVariable wrapping a multi-index. If coord is a Coordinates object, we could do combined = combined.assign_coords(coord) instead.

https://github.com/pydata/xarray/blob/e2b6f3468ef829b8a83637965d34a164bf3bca78/xarray/core/groupby.py#L1573-L1587

There are actually more general issues:

  • The group parameter of Dataset.groupby being a single variable or variable name, it won't be possible to do groupby on a full pandas multi-index once we drop its dimension coordinate (#8143). How can we still support it? Maybe passing a dimension name to group and check that there's only one index for that dimension?
  • How can we support custom, multi-coordinate indexes with groupby? I don't have any practical example in mind, but in theory just passing a single coordinate name as group will invalidate the index. Should we drop the index in the result? Or, like suggested above pass a dimension name as group and check the index?

Originally posted by @benbovy in https://github.com/pydata/xarray/issues/8140#issuecomment-1709775666

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8162/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1824824446 I_kwDOAMm_X85sxJx- 8025 Support Groupby first, last with flox dcherian 2448579 open 0     0 2023-07-27T17:07:51Z 2023-07-27T19:08:06Z   MEMBER      

Is your feature request related to a problem?

flox recently added support for first, last, nanfirst, nanlast. So we should support that on the Xarray GroupBy object.
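The user-facing API is unchanged; this is the call that would gain the accelerated path:

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
# with flox installed and use_flox enabled, this could dispatch to flox's nanfirst
ds.groupby("time.month").first()
```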

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8025/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
923355397 MDExOlB1bGxSZXF1ZXN0NjcyMTI5NzY4 5480 Implement weighted groupby dcherian 2448579 open 0     1 2021-06-17T02:57:17Z 2023-07-27T18:09:55Z   MEMBER   1 pydata/xarray/pulls/5480
  • xref #3937
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Initial proof-of-concept. Suggestions to improve this are very welcome.

Here's some convenient testing code:

```python
import xarray as xr

ds = xr.tutorial.open_dataset('rasm').load()
month_length = ds.time.dt.days_in_month
weights = month_length.groupby('time.season') / month_length.groupby('time.season').sum()

actual = ds.weighted(month_length).groupby("time.season").mean()
expected = (ds * weights).groupby('time.season').sum(skipna=False)
xr.testing.assert_allclose(actual, expected)
```

I've added info to the repr:

```python
ds.weighted(month_length).groupby("time.season")
```

```
WeightedDatasetGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'.
weighted along dimensions: time by 'days_in_month'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5480/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1822982776 I_kwDOAMm_X85sqIJ4 8023 Possible autoray integration dcherian 2448579 open 0     1 2023-07-26T18:57:59Z 2023-07-26T19:26:05Z   MEMBER      

I'm opening this issue for discussion really.

I stumbled on autoray (Github) by @jcmgray which provides an abstract interface to a number of array types.

What struck me was the very general lazy compute system. This opens up the possibility of lazy-but-not-dask computation.

Related: https://github.com/pydata/xarray/issues/2298 https://github.com/pydata/xarray/issues/1725 https://github.com/pydata/xarray/issues/5081

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8023/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 issue
1658291950 I_kwDOAMm_X85i14bu 7737 align ignores `copy` dcherian 2448579 open 0     2 2023-04-07T02:54:00Z 2023-06-20T23:07:56Z   MEMBER      

Is your feature request related to a problem?

cc @benbovy

xref #7730

```python
import numpy as np
import xarray as xr

arr = np.random.randn(10, 10, 365 * 30)
time = xr.date_range("2000", periods=30 * 365, calendar="noleap")
da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})
year = da["time.year"]
```

```python
xr.align(da, year, join="outer", copy=False)
```

This should result in no copies, but does.

Describe the solution you'd like

I think we need to check aligner.copy and/or aligner.reindex (maybe?) before copying here

https://github.com/pydata/xarray/blob/f8127fc9ad24fe8b41cce9f891ab2c98eb2c679a/xarray/core/dataset.py#L2805-L2818

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7737/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1760733017 I_kwDOAMm_X85o8qdZ 7924 Migrate from nbsphinx to myst, myst-nb dcherian 2448579 open 0     4 2023-06-16T14:17:41Z 2023-06-20T22:07:42Z   MEMBER      

Is your feature request related to a problem?

I think we should switch to MyST markdown for our docs. I've been using MyST markdown and MyST-NB in docs in other projects and it works quite well.

Advantages: 1. We get HTML reprs in the docs (example) which is a big improvement. (#6620) 2. I think many find markdown a lot easier to write than RST

There's a tool to migrate RST to MyST (RTD's migration guide).

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7924/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
756425955 MDU6SXNzdWU3NTY0MjU5NTU= 4648 Comprehensive benchmarking suite dcherian 2448579 open 0     6 2020-12-03T18:01:57Z 2023-06-15T16:56:00Z   MEMBER      

I think a good "infrastructure" target for the NASA OSS call would be to expand our benchmarking suite (https://pandas.pydata.org/speed/xarray/#/)

AFAIK running these in a useful manner on CI is still unsolved (please correct me if I'm wrong). But we can always run it on an NCAR machine using a cron job.

Thoughts?

cc @scottyhq

A quick survey of work needed (please append):

  • [ ] indexing & slicing #3382 #2799 #2227
  • [ ] DataArray construction #4744
  • [ ] attribute access #4741, #4742
  • [ ] property access #3514
  • [ ] reindexing? https://github.com/pydata/xarray/issues/1385#issuecomment-297539517
  • [x] alignment #3755, #7738
  • [ ] assignment #1771
  • [ ] coarsen
  • [x] groupby #659 #7795 #7796
  • [x] resample #4498 #7795
  • [ ] weighted #4482 #3883
  • [ ] concat #7824
  • [ ] merge
  • [ ] open_dataset, open_mfdataset #1823
  • [ ] stack / unstack
  • [ ] apply_ufunc?
  • [x] interp #4740 #7843
  • [ ] reprs #4744
  • [x] to_(dask)_dataframe #7844 #7474

Related: #3514

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4648/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1700678362 PR_kwDOAMm_X85QBdXY 7828 GroupBy: Fix reducing by subset of grouper dims dcherian 2448579 open 0     0 2023-05-08T18:00:54Z 2023-05-10T02:41:39Z   MEMBER   1 pydata/xarray/pulls/7828
  • [x] Tests added

Fixes yet another bug with GroupBy reductions. We weren't assigning the group index when reducing by a subset of the dimensions present on the grouper.

This will only pass when flox 0.7.1 reaches conda-forge.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7828/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1236174701 I_kwDOAMm_X85Jrodt 6610 Update GroupBy constructor for grouping by multiple variables, dask arrays dcherian 2448579 open 0     6 2022-05-15T03:17:54Z 2023-04-26T16:06:17Z   MEMBER      

What is your issue?

flox supports grouping by multiple variables (would fix #324, #1056) and grouping by dask variables (would fix #2852).

To enable this in GroupBy we need to update the constructor's signature to:

  1. Accept multiple "by" variables.
  2. Accept "expected group labels" for grouping by dask variables (like bins for groupby_bins, which already supports grouping by dask variables). This lets us construct the output coordinate without evaluating the dask variable.
  3. We may also want to simultaneously group by a categorical variable (season) and bin by a continuous variable (air temperature). So we also need a way to indicate whether the "expected group labels" are "bin edges" or categories.


The signature in flox is (may be errors!):

```python
xarray_reduce(
    obj: Dataset | DataArray,
    *by: DataArray | str,
    func: str | Aggregation,
    expected_groups: Sequence | np.ndarray | None = None,
    isbin: bool | Sequence[bool] = False,
    ...
)
```

You would calculate that last example using flox as:

```python
xarray_reduce(
    ds,
    "season",
    "air_temperature",
    expected_groups=[None, np.arange(21, 30, 1)],
    isbin=[False, True],
    ...
)
```

The use of expected_groups and isbin seems ugly to me (the names could also be better!)


I propose we update groupby's signature to:

  1. Change group: DataArray | str to group: DataArray | str | Iterable[str] | Iterable[DataArray].
  2. We could add a top-level xr.Bins object that wraps bin edges + any kwargs to be passed to pandas.cut. Note our current groupby_bins signature has a bunch of kwargs passed directly to pandas.cut.
  3. Finally, add groups: None | ArrayLike | xarray.Bins | Iterable[None | ArrayLike | xarray.Bins] to pass the "expected group labels".
     • If None, then groups will be auto-detected from non-dask group arrays (if None for a dask group, then raise an error).
     • If xarray.Bins, this indicates binning by the appropriate variables.
     • If ArrayLike, treat as categorical.
     • groups is a little too similar to group, so we should choose a better name.
     • The ordering of ArrayLike would let us fix #757 (pass the seasons in the order you want them in the output).

So then that example becomes:

```python
ds.groupby(
    ["season", "air_temperature"],  # season is numpy, air_temperature is dask
    groups=[None, xr.Bins(np.arange(21, 30, 1), closed="right")],
)
```

Thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6610/reactions",
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1649611456 I_kwDOAMm_X85iUxLA 7704 follow upstream scipy interpolation improvements dcherian 2448579 open 0     0 2023-03-31T15:46:56Z 2023-03-31T15:46:56Z   MEMBER      

Is your feature request related to a problem?

Scipy 1.10.0 has some great improvements to interpolation (release notes) particularly around the fancier methods like pchip.

It'd be good to see if we can simplify some of our code (or even enable using these options).

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
344614881 MDU6SXNzdWUzNDQ2MTQ4ODE= 2313 Example on using `preprocess` with `mfdataset` dcherian 2448579 open 0     6 2018-07-25T21:31:34Z 2023-03-14T12:35:00Z   MEMBER      

I wrote this little notebook today while trying to get some satellite data in form that was nice to work with: https://gist.github.com/dcherian/66269bc2b36c2bc427897590d08472d7

I think it would make a useful example for the docs.

A few questions:

  1. Do you think it'd be a good addition to the examples?
  2. Is this the recommended way of adding meaningful co-ordinates, expanding dims etc.?

The main bit is this function:

```python
def preprocess(ds):
    dsnew = ds.copy()
    dsnew['latitude'] = xr.DataArray(np.linspace(90, -90, 180),
                                     dims=['phony_dim_0'])
    dsnew['longitude'] = xr.DataArray(np.linspace(-180, 180, 360),
                                      dims=['phony_dim_1'])
    dsnew = (dsnew.rename({'l3m_data': 'sss',
                           'phony_dim_0': 'latitude',
                           'phony_dim_1': 'longitude'})
             .set_coords(['latitude', 'longitude'])
             .drop('palette'))

    dsnew['time'] = (pd.to_datetime(dsnew.attrs['time_coverage_start'])
                     + np.timedelta64(3, 'D') + np.timedelta64(12, 'h'))
    dsnew = dsnew.expand_dims('time').set_coords('time')

    return dsnew
```
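For context, the function above plugs into open_mfdataset via its preprocess kwarg (the glob pattern is a placeholder):

```python
import xarray as xr

ds = xr.open_mfdataset("*.nc", preprocess=preprocess)
```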

Also open to other feedback...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2313/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1599044689 I_kwDOAMm_X85fT3xR 7558 shift time using frequency strings dcherian 2448579 open 0     2 2023-02-24T17:35:52Z 2023-02-26T15:08:13Z   MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/7557

Originally posted by **arfriedman**, February 24, 2023

Hi,

In addition to integer offsets, I was wondering if it is possible to [shift](https://docs.xarray.dev/en/stable/generated/xarray.Variable.shift.html) a variable by a specific time frequency interval as in [pandas](https://pandas.pydata.org/docs/reference/api/pandas.Series.shift.html). For example, something like:

```python
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
air = ds["air"]
air.shift(time="1D")
```

Otherwise, is there another xarray function or recommended approach for this type of operation?
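A workaround that exists today, assuming the goal is to relabel the time axis rather than move data between positions:

```python
import pandas as pd
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
# offset the coordinate labels directly instead of shifting the data
shifted = ds["air"].assign_coords(time=ds.time + pd.Timedelta("1D"))
```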
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7558/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1599056009 I_kwDOAMm_X85fT6iJ 7559 Support specifying chunk sizes using labels (e.g. frequency string) dcherian 2448579 open 0     2 2023-02-24T17:44:03Z 2023-02-25T03:46:49Z   MEMBER      

Is your feature request related to a problem?

dask.dataframe supports repartitioning or rechunking using a frequency string (freq kwarg).

I think this would be a useful addition to .chunk. It would help with some groupby problems (as suggested in this comment) and generally make a few problems amenable to blockwise/map_blocks solutions.

Describe the solution you'd like

  1. One solution is to allow .chunk(lon=5, time="MS"). There is some ugliness in that this syntax mixes up integer index values (lon=5) and a label-based frequency string time="MS"
  2. So perhaps a second method chunk_by_labels would be useful where chunk_by_labels(lon=5, time="MS") would rechunk the data so that a single chunk contains 5° of longitude points and a month of time. Alternative this could be .chunk(lon=5, time="MS", by="labels")

Describe alternatives you've considered

Have the user do this manually but that's kind of annoying, and a bit advanced.
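The manual route, as a sketch of what .chunk(time="MS") would automate: compute per-month sizes from the time index and pass them as an explicit chunk tuple.

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
# number of timesteps in each calendar month, in order
sizes = tuple(int(n) for n in ds.time.resample(time="MS").count().values)
ds = ds.chunk({"time": sizes})
```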

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7559/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1119647191 I_kwDOAMm_X85CvHXX 6220 [FEATURE]: Use fast path when grouping by unique monotonic decreasing variable dcherian 2448579 open 0     1 2022-01-31T16:24:29Z 2023-01-09T16:48:58Z   MEMBER      

Is your feature request related to a problem?

See https://github.com/pydata/xarray/pull/6213/files#r795716713

We check whether the by variable for groupby is unique and monotonically increasing. But the fast path would also apply to unique and monotonically decreasing variables.

Describe the solution you'd like

Update the condition to is_monotonic_increasing or is_monotonic_decreasing and add a test.
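The suggested condition, sketched against a pandas index (the function name is hypothetical):

```python
import pandas as pd

def can_use_fast_path(index: pd.Index) -> bool:
    # the fast path applies to any unique, monotonic (either direction) grouper
    return index.is_unique and (
        index.is_monotonic_increasing or index.is_monotonic_decreasing
    )
```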

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6220/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1194945072 I_kwDOAMm_X85HOWow 6447 allow merging datasets where a variable might be a coordinate variable only in a subset of datasets dcherian 2448579 open 0     1 2022-04-06T17:53:51Z 2022-11-16T03:46:56Z   MEMBER      

Is your feature request related to a problem?

Here are two datasets; in one a is a data_var, in the other a is a coordinate variable. The following fails:

``` python
import xarray as xr

ds1 = xr.Dataset({"a": ("x", [1, 2, 3])})
ds2 = ds1.set_coords("a")
ds2.update(ds1)
```

```
    649 ambiguous_coords = coord_names.intersection(noncoord_names)
    650 if ambiguous_coords:
--> 651     raise MergeError(
    652         "unable to determine if these variables should be "
    653         f"coordinates or not in the merged result: {ambiguous_coords}"
    654     )
    656 attrs = merge_attrs(
    657     [var.attrs for var in coerced if isinstance(var, (Dataset, DataArray))],
    658     combine_attrs,
    659 )
    661 return _MergeResult(variables, coord_names, dims, out_indexes, attrs)

MergeError: unable to determine if these variables should be coordinates or not in the merged result: {'a'}
```

Describe the solution you'd like

I think we should replace this error with a warning and arbitrarily choose to either convert a to a coordinate variable or a data variable. (Until then, a workaround is sketched below.)

Describe alternatives you've considered

No response

Additional context

No response
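A sketch of the workaround mentioned above: make the status of a consistent in both inputs before merging.

``` python
import xarray as xr

ds1 = xr.Dataset({"a": ("x", [1, 2, 3])})
ds2 = ds1.set_coords("a")

# Promote "a" to a coordinate in both datasets so the merge is unambiguous.
ds2.update(ds1.set_coords("a"))
```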

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6447/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
802525282 MDExOlB1bGxSZXF1ZXN0NTY4NjUzOTg0 4868 facets and hue with hist dcherian 2448579 open 0     0 2021-02-05T22:49:36Z 2022-10-19T07:27:32Z   MEMBER   0 pydata/xarray/pulls/4868
  • [x] Closes #4288
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4868/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
802431534 MDExOlB1bGxSZXF1ZXN0NTY4NTc1NzIw 4866 Refactor line plotting dcherian 2448579 open 0     0 2021-02-05T19:51:24Z 2022-10-18T20:13:14Z   MEMBER   0 pydata/xarray/pulls/4866

Refactors line plotting to use a _plot1d decorator.

Next I'll use this decorator on hist so we can "facet" and "hue" histograms.

see #4288

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4866/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1378174355 I_kwDOAMm_X85SJUWT 7055 Use roundtrip context manager in distributed write tests dcherian 2448579 open 0     0 2022-09-19T15:53:40Z 2022-09-19T15:53:40Z   MEMBER      

What is your issue?

File roundtripping tests in test_distributed.py don't use the roundtrip context manager (though one uses create_tmp_file), so I don't think any created files are being cleaned up.

Example: https://github.com/pydata/xarray/blob/09e467a6a3a8ed68c6c29647ebf2b09288145da1/xarray/tests/test_distributed.py#L91-L119

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7055/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1321228754 I_kwDOAMm_X85OwFnS 6845 Do we need to update AbstractArray for duck arrays? dcherian 2448579 open 0     6 2022-07-28T16:59:59Z 2022-07-29T17:20:39Z   MEMBER      

What happened?

I'm calling cupy.round on a DataArray wrapping a cupy array and it raises an error here: https://github.com/pydata/xarray/blob/3f7cc2da33d81e76afbfb82da57143b624b03a88/xarray/core/common.py#L155-L156

Traceback below:

```
--> 25     a = _core.array(a, copy=False)
     26     return a.round(decimals, out=out)
     27

cupy/_core/core.pyx in cupy._core.core.array()

cupy/_core/core.pyx in cupy._core.core.array()

cupy/_core/core.pyx in cupy._core.core._array_default()

~/miniconda3/envs/gpu/lib/python3.7/site-packages/xarray/core/common.py in __array__(self, dtype)
    146
    147     def __array__(self: Any, dtype: DTypeLike = None) -> np.ndarray:
--> 148         return np.asarray(self.values, dtype=dtype)
    149
    150     def __repr__(self) -> str:

~/miniconda3/envs/gpu/lib/python3.7/site-packages/xarray/core/dataarray.py in values(self)
    644         type does not support coercion like this (e.g. cupy).
    645         """
--> 646         return self.variable.values
    647
    648     @values.setter

~/miniconda3/envs/gpu/lib/python3.7/site-packages/xarray/core/variable.py in values(self)
    517     def values(self):
    518         """The variable's data as a numpy.ndarray"""
--> 519         return _as_array_or_item(self._data)
    520
    521     @values.setter

~/miniconda3/envs/gpu/lib/python3.7/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    257     TODO: remove this (replace with np.asarray) once these issues are fixed
    258     """
--> 259     data = np.asarray(data)
    260     if data.ndim == 0:
    261         if data.dtype.kind == "M":

cupy/_core/core.pyx in cupy._core.core.ndarray.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
```

What did you expect to happen?

Not an error? I'm not sure what's expected

np.round(dataarray) does actually work successfully.

My question is: do we need to update AbstractArray.__array__ to return the underlying duck array instead of always a numpy array?

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray v2022.6.0 cupy 10.6.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6845/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
540451721 MDExOlB1bGxSZXF1ZXN0MzU1MjU4NjMy 3646 [WIP] GroupBy plotting dcherian 2448579 open 0     7 2019-12-19T17:26:39Z 2022-06-09T14:50:17Z   MEMBER   1 pydata/xarray/pulls/3646
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

This adds plotting methods to GroupBy objects so that it's easy to plot each group as a facet. I'm finding this super helpful in my current research project.

It's pretty self-contained, mostly just adding map_groupby* methods to FacetGrid. But that's because I make GroupBy mimic the underlying DataArray by adding coords, attrs and __getitem__.

This still needs more tests but I would like feedback on the feature and the implementation.

Example

``` python
import numpy as np
import xarray as xr

time = np.arange(80)
da = xr.DataArray(5 * np.sin(2 * np.pi * time / 10), coords={"time": time}, dims="time")
da["period"] = da.time.where((time % 10) == 0).ffill("time") / 10
da.plot()
```

``` python
da.groupby("period").plot(col="period", col_wrap=4)
```

``` python
da = da.expand_dims(y=10)
da.groupby("period").plot(col="period", col_wrap=4, sharex=False, sharey=True, robust=True)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3646/reactions",
    "total_count": 3,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
663931851 MDU6SXNzdWU2NjM5MzE4NTE= 4251 expanded attrs makes HTML repr confusing to read dcherian 2448579 open 0     2 2020-07-22T17:33:13Z 2022-04-18T03:23:16Z   MEMBER      

When the attrs are expanded, it can be hard to distinguish between the attrs and the next variable.

See

>>> xr.tutorial.open_dataset("air_temperature")

Perhaps the gray background could be applied to attrs associated with a variable too?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4251/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1203414243 I_kwDOAMm_X85HuqTj 6481 refactor broadcast for flexible indexes dcherian 2448579 open 0     0 2022-04-13T14:51:19Z 2022-04-13T14:51:28Z   MEMBER      

What is your issue?

From @benbovy in https://github.com/pydata/xarray/pull/6477

  • extract common indexes and explicitly pass them to the Dataset and DataArray constructors (when implemented) that are called in the broadcast helper functions (there are some temporary and ugly hacks in create_default_index_implicit so that it works now with pandas multi-indexes wrapped in coordinate variables without the need to pass those indexes explicitly)
  • extract common indexes based on the dimension(s) of their coordinates and not their name (e.g., case of non-dimension but indexed coordinate)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6481/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1194790343 I_kwDOAMm_X85HNw3H 6445 map removes non-dimensional coordinate variables dcherian 2448579 open 0     0 2022-04-06T15:40:40Z 2022-04-06T15:40:40Z   MEMBER      

What happened?

``` python
ds = xr.Dataset(
    {"a": ("x", [1, 2, 3])},
    coords={"c": ("x", [1, 2, 3]), "d": ("y", [1, 2, 3, 4])},
)
print(ds.coords)
mapped = ds.map(lambda x: x)
print(mapped.coords)
```

Variable d gets dropped in the map call; it does not share any dimensions with any of the data variables.

```
Coordinates:
    c        (x) int64 1 2 3
    d        (y) int64 1 2 3 4
Coordinates:
    c        (x) int64 1 2 3
```

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray 2022.03.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6445/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1171916710 I_kwDOAMm_X85F2gem 6372 apply_ufunc + dask="parallelized" + no core dimensions should raise a nicer error about core dimensions being absent dcherian 2448579 open 0     0 2022-03-17T04:25:37Z 2022-03-17T05:10:16Z   MEMBER      

What happened?

From https://github.com/pydata/xarray/discussions/6370

Calling apply_ufunc(..., dask="parallelized") with no core dimensions and dask input "works" but raises an error on compute (ValueError: axes don't match array from np.transpose).

``` python
xr.apply_ufunc(lambda x: np.mean(x), dt, dask="parallelized")
```

What did you expect to happen?

With numpy data the apply_ufunc call does raise an error:

``` python
xr.apply_ufunc(lambda x: np.mean(x), dt.compute(), dask="parallelized")
```

```
ValueError: applied function returned data with unexpected number of dimensions. Received 0 dimension(s) but expected 1 dimensions with names: ('x',)
```

Minimal Complete Verifiable Example

``` python
import numpy as np
import xarray as xr

dt = xr.Dataset(
    data_vars=dict(
        value=(["x"], [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]),
    ),
    coords=dict(
        lon=(["x"], np.linspace(0, 1, 10)),
    ),
).chunk(chunks={"x": tuple([2, 3, 5])})  # three chunks of different size

xr.apply_ufunc(lambda x: np.mean(x), dt, dask="parallelized")
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

N/A

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6372/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
584461380 MDU6SXNzdWU1ODQ0NjEzODA= 3868 What should pad do about IndexVariables? dcherian 2448579 open 0     6 2020-03-19T14:40:21Z 2022-02-22T16:02:21Z   MEMBER      

Currently pad adds NaNs for coordinate labels, which results in substantially reduced functionality.

We need to think about:

  1. Int, Float, Datetime64, CFTime indexes: linearly extrapolate? Should we care whether the index is sorted or not? (I think not)
  2. MultiIndexes: ??
  3. CategoricalIndexes: ??
  4. Unindexed dimensions

EDIT: Added unindexed dimensions
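A quick illustration of the current NaN-label behavior (a sketch; only pad's default mode is shown):

``` python
import xarray as xr

da = xr.DataArray([1, 2, 3], coords={"x": [0, 1, 2]}, dims="x")
padded = da.pad(x=1)

# The labels become [nan, 0, 1, 2, nan], so the padded index is no
# longer usable for alignment or .sel -- the reduced functionality
# described above.
print(padded["x"].values)
```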

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3868/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
937266282 MDU6SXNzdWU5MzcyNjYyODI= 5578 Specify minimum versions in setup.cfg dcherian 2448579 open 0     2 2021-07-05T17:25:03Z 2022-01-09T03:33:38Z   MEMBER      

See https://github.com/pydata/xarray/issues/5342#issuecomment-873660034

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5578/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
514716299 MDU6SXNzdWU1MTQ3MTYyOTk= 3468 failure when roundtripping empty dataset to pandas dcherian 2448579 open 0     1 2019-10-30T14:28:31Z 2021-11-13T14:54:09Z   MEMBER      

see https://github.com/pydata/xarray/pull/3285

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3468/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1048856436 I_kwDOAMm_X84-hEd0 5962 Test resampling with dask arrays dcherian 2448579 open 0     0 2021-11-09T17:02:45Z 2021-11-09T17:02:45Z   MEMBER      

I noticed that we don't test resampling with dask arrays (well, just one test).

This could be a good opportunity to convert test_groupby.py to use test fixtures like in https://github.com/pydata/xarray/pull/5411

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5962/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1043846371 I_kwDOAMm_X84-N9Tj 5934 add test for custom backend entrypoint dcherian 2448579 open 0     0 2021-11-03T16:57:14Z 2021-11-03T16:57:21Z   MEMBER      

From https://github.com/pydata/xarray/pull/5931

It would be good to add a test checking that custom backend entrypoints work. This might involve creating a dummy package that registers an entrypoint (https://github.com/pydata/xarray/pull/5931#issuecomment-959131968)
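For context, a minimal sketch of what the dummy package's entrypoint class could look like; the class name here is hypothetical, while BackendEntrypoint and the "xarray.backends" entrypoint group are the real plugin API:

``` python
import xarray as xr
from xarray.backends import BackendEntrypoint


class DummyBackendEntrypoint(BackendEntrypoint):
    # A do-nothing backend: just enough to check that xarray's plugin
    # discovery finds the class and routes open_dataset(engine=...) to it.
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        return xr.Dataset({"a": ("x", [1, 2, 3])})

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith(".dummy")
```

The dummy package would then register this class under the xarray.backends entrypoint group in its packaging metadata.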

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5934/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
965072308 MDU6SXNzdWU5NjUwNzIzMDg= 5687 Make cftime dateoffsets public dcherian 2448579 open 0     2 2021-08-10T14:57:39Z 2021-08-10T23:28:20Z   MEMBER      

Consider the following cftime vector. It's fairly common to see users asking how to subtract "1 month" from this kind of vector:

``` python
xr.set_options(display_style="text")
time = xr.DataArray(
    xr.cftime_range("1000-01-01", "1000-05-01", freq="MS", calendar="360_day"),
    dims="time",
    name="time",
)
time
```

```
<xarray.DataArray 'time' (time: 5)>
array([cftime.Datetime360Day(1000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 4, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 5, 1, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 1000-01-01 00:00:00 ... 1000-05-01 00:00:00
```

Subtracting pd.Timedelta("1 month") does not work because a month does not represent an absolute unit of time. Instead the solution appears to be:

``` python
time - xr.coding.cftime_offsets.MonthBegin(1)
```

```
<xarray.DataArray 'time' (time: 5)>
array([cftime.Datetime360Day(999, 12, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.Datetime360Day(1000, 4, 1, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 1000-01-01 00:00:00 ... 1000-05-01 00:00:00
```

I think pandas exposes this functionality as pd.DateOffset(months=1). Can we add a similar xr.DateOffset?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5687/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
938141608 MDU6SXNzdWU5MzgxNDE2MDg= 5582 Faster unstacking of dask arrays dcherian 2448579 open 0     0 2021-07-06T18:12:05Z 2021-07-06T18:54:40Z   MEMBER      

Recent dask versions support assigning to a list of ints along one dimension. We can use this for unstacking (diff builds on #5577):

```diff
diff --git i/xarray/core/variable.py w/xarray/core/variable.py
index 222e8dab9..a50dfc574 100644
--- i/xarray/core/variable.py
+++ w/xarray/core/variable.py
@@ -1593,11 +1593,9 @@ class Variable(AbstractArray, NdimSizeLenMixin, VariableArithmetic):
         else:
             dtype = self.dtype

-        if sparse:
+        if sparse and not is_duck_dask_array(reordered):
             # unstacking a dense multitindexed array to a sparse array
-            # Use the sparse.COO constructor until sparse supports advanced indexing
-            # https://github.com/pydata/sparse/issues/114
+            # Use the sparse.COO constructor since we cannot assign to sparse.COO
             # TODO: how do we allow different sparse array types
             from sparse import COO

             codes = zip(*index.codes)

@@ -1618,19 +1616,23 @@ class Variable(AbstractArray, NdimSizeLenMixin, VariableArithmetic):
             )

         else:
+            # dask supports assigning to a list of ints along one axis only.
+            # So we construct an array with the last dimension flattened,
+            # assign the values, then reshape to the final shape.
+            intermediate_shape = reordered.shape[:-1] + (np.prod(new_dim_sizes),)
+            indexer = np.ravel_multi_index(index.codes, new_dim_sizes)
             data = np.full_like(
                 self.data,
                 fill_value=fill_value,
-                shape=new_shape,
+                shape=intermediate_shape,
                 dtype=dtype,
             )

             # Indexer is a list of lists of locations. Each list is the locations
             # on the new dimension. This is robust to the data being sparse; in that
             # case the destinations will be NaN / zero.
-            # sparse doesn't support item assigment,
-            # https://github.com/pydata/sparse/issues/114
-            data[(..., *indexer)] = reordered
+            data[(..., indexer)] = reordered
+            data = data.reshape(new_shape)

         return self._replace(dims=new_dims, data=data)
```

This should be what alignment.reindex_variables is doing but I don't fully understand that function.

The annoying bit is figuring out when to use this version and what to do with things like dask wrapping sparse. I think we want to loop over each variable in Dataset.unstack calling Variable.unstack and dispatch based on the type of Variable.data to easily handle all the edge cases.

cc @Illviljan if you're interested in implementing this

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5582/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
520079199 MDU6SXNzdWU1MjAwNzkxOTk= 3497 how should xarray handle pandas attrs dcherian 2448579 open 0     1 2019-11-08T15:32:36Z 2021-07-04T03:31:02Z   MEMBER      

Continuing discussion form #3491.

Pandas has added attrs to their objects. We should decide on what to do with them in the DataArray constructor. Many tests fail if we don't handle this case explicitly.

@dcherian:

Not sure what we want to do about these attributes in the long term. One option would be to pop the name attribute, assign to DataArray.name and keep the rest as DataArray.attrs? But what if name clashes with the provided name?

@max-sixty:

Agree! I think we could prioritize the supplied name above that in attrs. Another option would be raising an error if both were supplied.
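A small illustration of the ambiguity (the units attribute is just a hypothetical example; pandas >= 1.0 exposes .attrs):

``` python
import pandas as pd
import xarray as xr

s = pd.Series([1, 2, 3], name="a")
s.attrs["units"] = "m"

# What should the constructor do with s.attrs and s.name here,
# especially since an explicit name is also supplied?
da = xr.DataArray(s, name="b")
```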

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3497/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
798586325 MDU6SXNzdWU3OTg1ODYzMjU= 4852 mention HDF files in docs dcherian 2448579 open 0     0 2021-02-01T18:05:23Z 2021-07-04T01:24:22Z   MEMBER      

This is such a common question that we should address it in the docs.

Just saying that some HDF5 files can be opened with h5netcdf, and that for everything else the user needs to create xarray objects manually, should be enough.

https://xarray.pydata.org/en/stable/io.html

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4852/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
797053785 MDU6SXNzdWU3OTcwNTM3ODU= 4848 simplify API reference presentation dcherian 2448579 open 0     0 2021-01-29T17:23:41Z 2021-01-29T17:23:46Z   MEMBER      

Can we remove xarray.core.rolling and core.rolling on the left and right, respectively? I think the API reference would be a lot more readable if we could do that.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4848/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
787486472 MDU6SXNzdWU3ODc0ODY0NzI= 4817 Add encoding to HTML repr dcherian 2448579 open 0     0 2021-01-16T15:14:50Z 2021-01-24T17:31:31Z   MEMBER      

Is your feature request related to a problem? Please describe. .encoding is somewhat hidden since we don't show it in a repr.

Describe the solution you'd like I think it'd be nice to add it to the HTML repr, collapsed by default.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4817/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
648250671 MDU6SXNzdWU2NDgyNTA2NzE= 4189 List supported options for `backend_kwargs` in `open_dataset` dcherian 2448579 open 0     0 2020-06-30T15:01:31Z 2020-12-15T04:28:04Z   MEMBER      

We should list supported options for backend_kwargs in the docstring for open_dataset, and possibly in io.rst.

xref #4187

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4189/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
685825824 MDU6SXNzdWU2ODU4MjU4MjQ= 4376 wrong chunk sizes in html repr with nonuniform chunks dcherian 2448579 open 0     3 2020-08-25T21:23:11Z 2020-10-07T11:11:23Z   MEMBER      

What happened:

The HTML repr is using the first element in each chunks tuple.

What you expected to happen:

it should be using whatever dask does in this case

Minimal Complete Verifiable Example:

```python
import xarray as xr
import dask

test = xr.DataArray(
    dask.array.zeros(
        (12, 901, 1001),
        chunks=(
            (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
            (1, 899, 1),
            (1, 199, 1, 199, 1, 199, 1, 199, 1, 199, 1),
        ),
    )
)
test.to_dataset(name="a")
```

EDIT: The text repr has the same issue:

```
<xarray.Dataset>
Dimensions:  (dim_0: 12, dim_1: 901, dim_2: 1001)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    a        (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(1, 1, 1), meta=np.ndarray>
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4376/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
538521723 MDU6SXNzdWU1Mzg1MjE3MjM= 3630 reviewnb for example notebooks? dcherian 2448579 open 0     0 2019-12-16T16:34:28Z 2019-12-16T16:34:28Z   MEMBER      

What do people think of adding ReviewNB https://www.reviewnb.com/ to facilitate easy reviewing of example notebooks?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3630/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
435787982 MDU6SXNzdWU0MzU3ODc5ODI= 2913 Document xarray data model dcherian 2448579 open 0     0 2019-04-22T16:23:41Z 2019-04-22T16:23:41Z   MEMBER      

It would be nice to have a separate page that detailed this for users unfamiliar with netCDF.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2913/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);