
issues


74 rows where state = "closed" and user = 4160723 sorted by updated_at descending


type 2

  • pull 50
  • issue 24

state 1

  • closed 74

repo 1

  • xarray 74
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2101987013 PR_kwDOAMm_X85lJbZW 8672 Fix multiindex level serialization after reset_index benbovy 4160723 closed 0     6 2024-01-26T10:40:42Z 2024-02-23T01:22:17Z 2024-01-31T17:42:29Z MEMBER   0 pydata/xarray/pulls/8672
  • [x] Closes #8628
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8672/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
915057433 MDU6SXNzdWU5MTUwNTc0MzM= 5452 [community] Flexible indexes meeting benbovy 4160723 closed 0     7 2021-06-08T13:32:16Z 2024-02-15T01:39:08Z 2024-02-15T01:39:08Z MEMBER      

In addition to the bi-weekly community developers meeting, we plan to have 30min meetings on a weekly basis -- every Tue 8:30-9:00 PDT (17:30-18:00 CEST) -- to discuss the flexible indexes refactor.

Anyone from @pydata/xarray feel free to join! The first meeting is in a couple of hours.

Zoom link (subject to change).

Google calendar

Meeting notes

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5452/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
213004586 MDU6SXNzdWUyMTMwMDQ1ODY= 1303 `xarray.core.variable.as_variable()` part of the public API? benbovy 4160723 closed 0     5 2017-03-09T11:07:52Z 2024-02-06T17:57:21Z 2017-06-02T17:55:12Z MEMBER      

Is it safe to use xarray.core.variable.as_variable() externally? I guess that currently it is not.

I have a specific use case where this would be very useful.

I'm working on a package that heavily uses and extends xarray for landscape evolution modeling, and inside a custom class for model parameters I want to be able to create xarray.Variable objects on the fly from any provided object, e.g., a scalar value, an array-like, a (dims, data[, attrs]) tuple, another xarray.Variable, an xarray.DataArray... exactly what xarray.core.variable.as_variable() does.

Although I know that Variable objects are not needed in most use cases, in this specific case a clean solution would be the following

```python
import xarray as xr


class Parameter(object):

    def to_variable(self, obj):
        var = xr.as_variable(obj)
        # ... some validation logic on, e.g., data type, value bounds, dimensions...
        # ... add default attributes to the created variable (e.g., units, description...)
        return var
```

I don't think it is a viable option to copy as_variable() and all its dependent code in my package as it seems to have quite a lot of logic implemented.

A workaround using only public API would be something like:

```python
class Parameter(object):

    def to_variable(self, obj):
        return xr.Dataset(data_vars={'v': obj}).variables['v']
```

but it feels a bit hacky.
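As an illustration of the conversions listed above, here is a minimal sketch. It assumes an xarray version in which `as_variable` is importable from the top-level namespace (as the first snippet in this issue writes it); the input values are made up for the example.

```python
import xarray as xr

# A (dims, data, attrs) tuple becomes a Variable with those dims and attrs.
from_tuple = xr.as_variable((("x",), [1, 2, 3], {"units": "m"}))

# A scalar becomes a zero-dimensional Variable.
from_scalar = xr.as_variable(2.5)

# A DataArray is unwrapped to its underlying Variable.
from_dataarray = xr.as_variable(xr.DataArray([1.0, 2.0], dims="y"))

print(from_tuple.dims, from_scalar.dims, from_dataarray.dims)
```

A bare list without dimension names is not covered here, since the dimensions cannot be inferred from it.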

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1303/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1864056633 PR_kwDOAMm_X85YovK- 8107 Better default behavior of the Coordinates constructor benbovy 4160723 closed 0     2 2023-08-23T21:42:51Z 2024-02-04T18:32:42Z 2023-08-31T07:35:47Z MEMBER   0 pydata/xarray/pulls/8107
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

After working more on Coordinates I realize that the default behavior of its constructor could be more consistent with other Xarray objects. This PR changes this default behavior such that:

  • Pandas indexes are created for dimension coordinates if indexes=None (default). To create dimension coordinates with no index, just pass indexes={}.
  • If another Coordinates object is passed as input, its indexes are also added to the newly created object. Since we don't support alignment / merge here, the following call raises an error: xr.Coordinates(coords=xr.Coordinates(...), indexes={...}).

This PR introduces a breaking change since Coordinates are now exposed in v2023.8.0, which has just been released. It is a bit unfortunate but I think it may be OK for a fresh feature, especially if the next release will be soon after this one.
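The two defaults described above can be sketched as follows. This assumes an xarray release that includes this change (later than v2023.8.0); the coordinate name and values are arbitrary.

```python
import xarray as xr

# indexes=None (the default): a pandas index is created for the
# dimension coordinate "x".
with_index = xr.Coordinates({"x": [1, 2, 3]})

# indexes={}: the same coordinate is created without any index.
without_index = xr.Coordinates({"x": [1, 2, 3]}, indexes={})

print(list(with_index.xindexes))
print(list(without_index.xindexes))
```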

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8107/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1879864306 PR_kwDOAMm_X85ZdmTF 8142 Dirty workaround for mypy 1.5 error benbovy 4160723 closed 0     8 2023-09-04T09:21:18Z 2023-09-07T16:04:55Z 2023-09-07T08:21:12Z MEMBER   0 pydata/xarray/pulls/8142

I wanted to fix the following error with mypy 1.5:

xarray/core/dataset.py:505: error: Definition of "__eq__" in base class "DatasetOpsMixin" is incompatible with definition in base class "Mapping" [misc]

This looks similar to https://github.com/python/mypy/issues/9319. It is weird that it worked here with mypy versions < 1.5, though.

I don't know if there is a better fix, but I thought that redefining __eq__ in Dataset would be a slightly less dirty workaround than adding type: ignore in the class declaration.
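The pattern behind this workaround can be sketched in plain Python. The class names below are hypothetical stand-ins (not xarray's actual classes), and the sketch only shows the runtime behavior of redefining `__eq__` in the subclass; whether a given mypy version accepts it is not verified here.

```python
from collections.abc import Mapping


class ElementwiseOpsMixin:
    """Hypothetical mixin whose __eq__ returns a container, not a bool."""

    def _compare(self, other):
        raise NotImplementedError

    def __eq__(self, other):
        return self._compare(other)


class MiniDataset(ElementwiseOpsMixin, Mapping):
    """Toy mapping that inherits __eq__ from two incompatible bases."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def _compare(self, other):
        return {k: v == other for k, v in self._data.items()}

    # Redefining __eq__ here states explicitly which definition the class
    # uses, instead of leaving the choice to the order of the base classes.
    def __eq__(self, other):
        return super().__eq__(other)


result = MiniDataset({"a": 1, "b": 2}) == 1
print(result)
```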

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8142/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1879652439 PR_kwDOAMm_X85Zc4ub 8141 Fix doctests: pandas 2.1 MultiIndex repr with nan benbovy 4160723 closed 0     0 2023-09-04T07:08:55Z 2023-09-05T08:35:37Z 2023-09-05T08:35:36Z MEMBER   0 pydata/xarray/pulls/8141  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8141/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1862912829 PR_kwDOAMm_X85Yk15B 8102 Add `Coordinates.assign()` method benbovy 4160723 closed 0     0 2023-08-23T09:15:51Z 2023-09-01T13:28:16Z 2023-09-01T13:28:16Z MEMBER   0 pydata/xarray/pulls/8102
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

This is consistent with the Dataset and DataArray assign methods (now that Coordinates is also exposed as public API).

This allows writing:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays([["a", "a", "b", "b"], [0, 1, 0, 1]])
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

ds = xr.Dataset(coords=midx_coords.assign(y=[1, 2]))
```

which is quite common (at least in the tests) and a bit nicer than

    ds = xr.Dataset(coords=midx_coords.merge({"y": [1, 2]}).coords)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8102/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
180638999 MDExOlB1bGxSZXF1ZXN0ODc3MTUzMDM= 1028 Add `set_index`, `reset_index` and `reorder_levels` methods benbovy 4160723 closed 0     8 2016-10-03T13:22:24Z 2023-08-30T09:28:26Z 2016-12-27T17:03:00Z MEMBER   0 pydata/xarray/pulls/1028

Another item in #719.

I added tests and updated the docs, so this is ready for review.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1028/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1864650372 PR_kwDOAMm_X85YqtUk 8109 Better error message when trying to set an index from a scalar coordinate benbovy 4160723 closed 0     0 2023-08-24T08:18:13Z 2023-08-30T09:27:27Z 2023-08-30T07:13:15Z MEMBER   0 pydata/xarray/pulls/8109
  • [x] Closes #4091
  • [x] Tests added

The message suggests using .expand_dims().
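A short sketch of what that suggestion does, using a made-up scalar coordinate named `a`: expanding it turns the scalar into a length-1 dimension coordinate, which is the shape an index can be built from.

```python
import xarray as xr

# "a" starts out as a scalar (zero-dimensional) coordinate.
ds = xr.Dataset(coords={"a": 1})

# expand_dims promotes it to a one-dimensional coordinate along a new
# dimension of the same name.
expanded = ds.expand_dims("a")

print(dict(expanded.sizes))
```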

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8109/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
966983801 MDExOlB1bGxSZXF1ZXN0NzA5MTg3NDY2 5692 Explicit indexes benbovy 4160723 closed 0     46 2021-08-11T15:57:41Z 2023-08-30T09:26:37Z 2022-03-17T17:11:44Z MEMBER   0 pydata/xarray/pulls/5692
  • [x] Closes many issues:
  • [x] closes #1366
  • [x] closes #1408
  • [x] closes #2489
  • [x] closes #3432
  • [x] closes #4542
  • [x] closes #4955
  • [x] closes #5202
  • [x] closes #5645
  • [x] closes #5691
  • [x] closes #5697
  • [x] closes #5700
  • [x] closes #5727
  • [x] closes #5953
  • [x] closes #6183
  • [x] closes #6313
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst (new Index and Indexes API not public yet)

Follow-up on #5636 (work in progress), supersedes #2195.

This is likely to get big, sorry in advance! It'll be safer to make a release before merging this PR.

Current progress:

  • [x] create (default) indexes using the Index classes
  • [x] refactor default indexes created when 1st accessing .xindexes or .indexes
  • [x] support for non-default indexes (no public API yet)
  • [x] remove multi-index virtual coordinates (replace it by regular coordinates)
  • [x] refactor internal (text / html) formatting functions
  • [x] internal refactor of location-based selection (.isel())
  • [x] internal refactor of label-based selection (.sel())
  • [x] internal refactor of .rename()
  • Some changes in behavior (see comments below)
    • see #4108
    • see #4107
    • see #4417
  • [x] internal refactor of set_index / reset_index
  • [x] internal refactor of stack / unstack
    • Some changes in behavior (see comments below)
  • [x] internal refactor of Dataset.to_stacked_array
  • [x] internal refactor of swap_dims
  • [x] internal refactor of expand_dims
  • [x] internal refactor of alignment
  • [x] internal refactor of reindex and reindex_like
  • [x] internal refactor of interp and interp_like
  • [x] internal refactor of merge
  • [x] internal refactor of concat
  • [x] internal refactor of computation
  • [x] internal refactor of copy
  • [x] internal refactor of update, assign, __setitem__, del, drop_vars, etc.
    • updates must not corrupt multi-coordinate indexes
  • [x] internal refactor of set_coords and reset_coords
  • internal refactor of drop_sel and drop_isel (maybe later)
  • [x] internal refactor of pad
  • [x] internal refactor of shift
  • [x] internal refactor of roll

TODO:

  • [x] Uniformize Index API with Xarray's API
    • [x] rename Index.query() -> Index.sel()?
    • [x] rename PandasMultiIndex.from_product() -> PandasMultiIndex.stack()? Add Index.stack() and Index.unstack().
    • [x] remove Index.union() and Index.intersection()
  • [x] Use Index.create_variables() internally
    • [x] remove PandasIndex.from_pandas_index() and PandasMultiIndex.from_pandas_index() (use constructor + .create_variables() instead)
  • [x] Review where .xindexes is used and use private API instead (._indexes) if possible for speed
    • [x] requires that _indexes always returns a mapping
  • [x] Use from __future__ import annotations in indexes.py
  • [x] Re-activate default indexes invariant check (with opt-out for some tests)

In next PRs:

  • custom Index.__repr__ and Index._repr_inline_
  • add an Indexes section in DataArray / Dataset reprs
  • update public API (set_index, reset_index, drop_indexes, Dataset and DataArray constructors, etc.)
  • allow multi-dimensional variables with name in var.dims
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5692/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
953235338 MDExOlB1bGxSZXF1ZXN0Njk3MzA3NDc3 5636 Refactor index vs. coordinate variable(s) benbovy 4160723 closed 0     4 2021-07-26T19:54:25Z 2023-08-30T09:21:55Z 2021-08-09T07:56:56Z MEMBER   0 pydata/xarray/pulls/5636
  • [x] Closes #5553
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

This implements option 3 (sort of) described in https://github.com/pydata/xarray/issues/5553#issue-933551030:

  • the goal is to avoid wrapping an xarray.Index into an xarray.Variable and keep those two concepts distinct from each other.
  • the xarray.Index.from_variables class constructor accepts a dictionary of xarray.Variable objects as argument and may (or should?) also return corresponding xarray.IndexVariable objects to ensure immutability.
  • for PandasIndex, the new returned xarray.IndexVariable wraps the underlying pd.Index via a PandasIndexingAdapter (this reverts some changes made in #5102).
  • for PandasMultiIndex, this PR adds PandasMultiIndexingAdapter so that we can wrap the pandas multi-index in separate coordinate variables objects: one for the dimension + one for each level. The level coordinates data internally hold a reference to the dimension coordinate data to avoid indexing the same underlying pd.MultiIndex for each of those coordinates (PandasMultiIndexingAdapter.__getitem__ is memoized for that purpose).

This is very much work in progress, I need to update (or revert) all related parts of Xarray's internals, update tests, etc. At this stage any comment on the approach described above is welcome.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5636/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1485037066 PR_kwDOAMm_X85Ez9Gj 7368 Expose "Coordinates" as part of Xarray's public API benbovy 4160723 closed 0     31 2022-12-08T16:59:29Z 2023-08-30T09:11:57Z 2023-07-21T20:40:03Z MEMBER   0 pydata/xarray/pulls/7368
  • [x] Closes #7214
  • [x] Closes #6392
  • [x] xref #6633
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799:

  • No indexes argument is added to Dataset.__init__, and the indexes argument of DataArray.__init__ is kept private (i.e., valid only if fastpath=True)
  • When a Coordinates object is passed to a new Dataset or DataArray via the coords argument, both coordinate variables and indexes are copied/extracted and added to the new object
  • This PR also adds ~~an IndexedCoordinates subclass~~ Coordinates public constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the Coordinates.from_pandas_multiindex() class method creates a new set of index and coordinates from an existing pd.MultiIndex.

EDIT: IndexedCoordinates has been merged with Coordinates.

EDIT2: it ended up as a pretty big refactor with the promotion of Coordinates as a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still a fairly advanced API, useful for passing coordinate variables and indexes around. Internally, Coordinates objects are still "virtual" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a "stand-alone" Coordinates object created from scratch wraps a Dataset with no data variables.

Some examples of usage:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))

coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     empty

ds_to_be_deprecated = xr.Dataset(coords={"x": midx})
ds_to_be_deprecated.identical(ds)
# True

da = xr.DataArray([1, 2, 3, 4], dims="x", coords=ds.coords)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
```

TODO:

  • [x] update assign_coords too so it has the same behavior if a Coordinates object is passed?
  • [x] How to avoid building any default index? It seems silly to add or use the indexes argument just for that purpose? ~~We could address that later.~~ Solution: wrap the coordinates dict in a Coordinates object, e.g., ds = xr.Dataset(coords=xr.Coordinates(coords_dict)).

@shoyer, @dcherian, anyone -- what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7368/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1422543378 PR_kwDOAMm_X85BgRaG 7214 Pass indexes directly to the DataArray and Dataset constructors benbovy 4160723 closed 0     17 2022-10-25T14:16:44Z 2023-08-30T09:11:56Z 2023-07-18T11:52:11Z MEMBER   1 pydata/xarray/pulls/7214
  • [x] Closes #6392
  • [x] Closes #6633 ?
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

From https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937:

I'm thinking of only accepting one or more instances of Indexes as indexes argument in the Dataset and DataArray constructors. The only exception is when fastpath=True a mapping can be given directly. Also, when an empty collection of indexes is passed this skips the creation of default pandas indexes for dimension coordinates.

  • It is much easier to handle: just check that keys returned by Indexes.variables do not conflict with the coordinate names in the coords argument
  • It is slightly safer: it requires the user to explicitly create an Indexes object, thus with less chance to accidentally provide coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the Indexes class itself)
  • It is more convenient: an Xarray Index may provide a factory method that returns an instance of Indexes that we just need to pass as indexes, and we could also do something like ds = xr.Dataset(indexes=other_ds.xindexes)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7214/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1863646946 PR_kwDOAMm_X85YnWau 8104 Fix merge with compat=minimal (coord names) benbovy 4160723 closed 0     0 2023-08-23T16:20:48Z 2023-08-30T09:11:18Z 2023-08-30T07:57:35Z MEMBER   0 pydata/xarray/pulls/8104
  • [x] Closes #7405
  • [x] Closes #7588
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8104/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1358841264 PR_kwDOAMm_X84-NgIX 6975 Add documentation on custom indexes benbovy 4160723 closed 0     9 2022-09-01T13:20:00Z 2023-08-30T09:10:34Z 2023-07-17T23:23:22Z MEMBER   0 pydata/xarray/pulls/6975

This PR documents the API of the Index base class and adds a guide for creating custom indexes (reworked from https://hackmd.io/Zxw_zCa7Rbynx_iJu6Y3LA). Hopefully it will help anyone experimenting with this feature.

@pydata/xarray your feedback would be very much appreciated! I've been into this for quite some time, so there may be things that seem obvious to me but that you can still find very confusing or non-intuitive. It would then deserve some extra or better explanation.

More specifically, I'm open to any suggestion on how to better illustrate this with clear and succinct examples.

There are other parts of the documentation that still need to be updated regarding the indexes refactor (e.g., "dimension" coordinates, xindexes property, set/drop indexes, etc.). But I suggest doing that in separate PRs and focusing here on creating custom indexes.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6975/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1859437888 PR_kwDOAMm_X85YY-II 8094 Refactor update coordinates to better handle multi-coordinate indexes benbovy 4160723 closed 0     4 2023-08-21T13:57:38Z 2023-08-30T09:06:28Z 2023-08-29T14:23:29Z MEMBER   0 pydata/xarray/pulls/8094
  • [x] Closes #7563
  • [x] Closes #8039
  • [x] Closes #8056
  • [x] Closes #7885
  • [x] Closes #7921
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

This refactor should better handle multi-coordinate indexes when updating (or assigning) new coordinates.

It also fixes, better isolates, and warns more clearly about a number of deprecated pandas multi-index special cases (i.e., directly passing pd.MultiIndex objects or updating a multi-index dimension coordinate). I very much look forward to seeing support for those cases dropped :).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8094/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1498386428 PR_kwDOAMm_X85FiyaY 7382 Some alignment optimizations benbovy 4160723 closed 0     4 2022-12-15T12:54:56Z 2023-08-30T09:05:24Z 2023-01-05T21:25:55Z MEMBER   0 pydata/xarray/pulls/7382
  • [x] Benchmark added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

May fix some performance regressions, e.g., see https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233.

@ravwojdyla with this PR ds.assign(foo=~ds["d3"]) in your example should be much faster (on par with version 2022.3.0).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7382/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1362148668 PR_kwDOAMm_X84-YVgW 6992 Review (re)set_index benbovy 4160723 closed 0     1 2022-09-05T15:07:43Z 2023-08-30T09:05:10Z 2022-09-27T10:35:38Z MEMBER   0 pydata/xarray/pulls/6992
  • [x] Closes
  • [x] fixes #6946
  • [x] fixes #6989
  • [x] fixes #6959
  • [x] fixes #6969
  • [x] fixes #7036
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Restore behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes).

TODO:

  • [x] review set_index
  • [x] review reset_index

For reset_index, the only behavior that is not restored here is the coordinate renamed with a _ suffix when dropping a single index. This was originally to prevent any coordinate with no index matching a dimension name, which is now irrelevant. That is quite a dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restoring it if needed (esp. considering that we may later deprecate reset_index completely in favor of drop_indexes #6971).
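The behavior described for a single index can be sketched as follows, assuming an xarray version that includes this PR; the dataset is a made-up example. The coordinate keeps its name and only loses its index.

```python
import xarray as xr

# "x" is a dimension coordinate with a default pandas index.
ds = xr.Dataset(coords={"x": [10, 20, 30]})

# Resetting the index keeps the coordinate "x" (no "_" suffix is added)
# but removes the index attached to it.
reset = ds.reset_index("x")

print("x" in reset.coords, "x" in reset.xindexes)
```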

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6992/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
979316661 MDU6SXNzdWU5NzkzMTY2NjE= 5738 Flexible indexes: how to handle possible dimension vs. coordinate name conflicts? benbovy 4160723 closed 0     4 2021-08-25T15:31:39Z 2023-08-23T13:28:41Z 2023-08-23T13:28:40Z MEMBER      

Another thing that I've noticed while working on #5692.

Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with sel or unstack). See #2299.

I'm wondering how we should handle this in the context of flexible / custom indexes:

A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in sel or stack?

B. Introduce some tag in xarray.Index so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming)

C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?

D. Eventually revert #2353 and let users take care of potential conflicts.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5738/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1175329407 I_kwDOAMm_X85GDhp_ 6392 Pass indexes to the Dataset and DataArray constructors benbovy 4160723 closed 0     6 2022-03-21T12:41:51Z 2023-07-21T20:40:05Z 2023-07-21T20:40:04Z MEMBER      

Is your feature request related to a problem?

This is part of #6293 (explicit indexes next steps).

Describe the solution you'd like

A Mapping[Hashable, Index] would probably be the most obvious (optional) value type accepted for the indexes argument of the Dataset and DataArray constructors.

pros:

  • consistent with the xindexes property

cons:

  • need to be careful with what is passed as coords and indexes
  • multi-indexes: redundancy and order matters (e.g., pandas multi-index levels)

An example with a pandas multi-index

Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then "unpacked" into one dimension (tuple values) coordinate and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:

```python
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = xr.PandasMultiIndex(pd_idx, "x")

indexes = {"x": idx, "foo": idx, "bar": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

The cases below should raise an error:

```python
ds = xr.Dataset(indexes=indexes)
# ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx},
)
# ValueError: missing index(es) for coordinate(s): 'bar'

ds = xr.Dataset(
    coords={"x": coords["x"], "foo": [0, 1, 2, 3], "bar": coords["bar"]},
    indexes=indexes,
)
# ValueError: conflict between coordinate(s) and index(es): 'foo'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx, "bar": xr.PandasIndex([0, 1, 2], "y")},
)
# ValueError: conflict between coordinate(s) and index(es): 'bar'
```

Should we raise an error or simply ignore the index in the case below?

```python
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```

Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd requiring mapping elements be passed in a given order.

```python
ds = xr.Dataset(coords=coords, indexes={"bar": idx, "x": idx, "foo": idx})
list(ds.xindexes.keys())
# ["x", "foo", "bar"]
```

How to generalize to any (custom) index?

With the case of multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent pd_idx.names vs. coordinate names and because idx.get_variables() returns Xarray IndexVariable objects where variable data wraps the pandas multi-index.

However, this may not be easy for other indexes. Some Xarray custom indexes (like a KD-Tree index) likely won't return anything from .get_variables() as they don't support wrapping internal data as coordinate data. Right now there's nothing in the Xarray Index base class that could help checking consistency between indexes vs. coordinates for any kind of index.

How could we solve this?

  • A. add a .coords property to the Xarray Index base class, that returns a dict[Hashable, IndexVariable].

    • Ambiguous when an Index is created directly, i.e., like above xr.PandasMultiIndex(pd_idx, "x"). Should .coords return None or return the coordinates returned by the last .get_variables() call?
    • What if different sets of coordinates refer to a common index (e.g., after copying the coordinate variables, etc.)?
  • B. add a .coord_names property to the Xarray Index base class that returns tuple[Hashable, ...], and add a private attribute to IndexVariable that returns the index object (or return it via a very lightweight IndexAdapter base class used to wrap variable data).

    • Index.get_variables(variables) would by default return shallow copies of the input variables with a reference to the index object.
    • If that's necessary, we could also store the coordinate dimensions in coord_names, i.e., using tuple[tuple[Hashable, tuple[Hashable, ...]], ...].

I think I prefer the second option.
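To make option B more concrete, here is a minimal pure-Python sketch of how a `coord_names` attribute could drive the name-based checks shown in the error cases earlier in this issue. `ToyIndex` and `check_consistency` are hypothetical names invented for this illustration; they are not part of xarray. The check on coordinate data (the `'foo'` conflict case) is left out, since it would need the reference from a variable back to its index that option B also proposes.

```python
from dataclasses import dataclass
from typing import Hashable, Mapping, Set, Tuple


@dataclass(frozen=True, eq=False)
class ToyIndex:
    """Hypothetical stand-in for an index object (not the xarray class)."""

    coord_names: Tuple[Hashable, ...]


def check_consistency(
    coords: Set[Hashable], indexes: Mapping[Hashable, ToyIndex]
) -> None:
    """Raise ValueError if coordinate names and index keys disagree."""
    # Every index key must name an existing coordinate.
    missing_coords = [k for k in indexes if k not in coords]
    if missing_coords:
        raise ValueError(f"missing coordinate(s) for index(es): {missing_coords}")

    # An index may only be registered under coordinates it was built from.
    conflicts = [k for k, idx in indexes.items() if k not in idx.coord_names]
    if conflicts:
        raise ValueError(f"conflict between coordinate(s) and index(es): {conflicts}")

    # Every coordinate of an index must map back to that same index object.
    for idx in indexes.values():
        missing = [n for n in idx.coord_names if indexes.get(n) is not idx]
        if missing:
            raise ValueError(f"missing index(es) for coordinate(s): {missing}")


idx = ToyIndex(("x", "foo", "bar"))
check_consistency({"x", "foo", "bar"}, {"x": idx, "foo": idx, "bar": idx})
```

Because the checks only compare names and object identity, the order in which the mapping elements are passed does not matter in this sketch.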

Describe alternatives you've considered

Also allow passing index types (and build options) via indexes

I.e., Mapping[Hashable, Index | Type[Index] | tuple[Type[Index], Mapping[Any, Any]]], so that new indexes can be created from the passed coordinates at DataArray or Dataset creation.

pros:

  • Flexible.

cons:

  • This is complicated. Constructing the Dataset / DataArray (with default indexes) first then calling .set_index is probably better.
  • Hard to deal with multi-index (redundancy of build option, etc.)

Pass multi-indexes once, grouped by coordinate names

I.e., indexes keys accept tuples: Mapping[Hashable | tuple[Hashable, ...], Index]

pros:

  • No redundancy and easier to check consistency between indexes vs. coordinates

cons:

  • Not consistent with the .xindexes property
  • Complicated when eventually using tuples for coordinate names?

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6392/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1307195361 PR_kwDOAMm_X847hz6o 6800 (scipy 2022 branch) Add an "options" argument to Index.from_variables() benbovy 4160723 closed 0     1 2022-07-17T20:01:00Z 2022-12-08T09:38:50Z 2022-09-02T13:54:46Z MEMBER   0 pydata/xarray/pulls/6800

It allows passing options to the constructor of a custom Index subclass, in case there are any relevant build options to expose to users. This could for example be the distance metric chosen for an index based on sklearn.neighbors.BallTree, or the CRS definition for a geospatial index.

The **options arguments of Dataset.set_xindex() are passed through.

An alternative way would be to pass options via coordinate metadata, like the spatial_ref coordinate in rioxarray. Perhaps both alternatives may co-exist?

This PR also adds type annotations to set_xindex().
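
The pass-through of build options can be illustrated with a small plain-Python model. `ToyBallTreeIndex` and `toy_set_xindex` are hypothetical stand-ins written for this sketch; they only mimic the shape of `Index.from_variables()` and `Dataset.set_xindex()` and do not use Xarray or scikit-learn.

```python
from typing import Any, Mapping, Sequence


class ToyBallTreeIndex:
    """Hypothetical custom index with one build option (a distance metric)."""

    def __init__(self, points: list, metric: str = "euclidean"):
        self.points = points
        self.metric = metric

    @classmethod
    def from_variables(cls, variables: Mapping[str, list], options: Mapping[str, Any]):
        # Pair up the coordinate values into points and forward the options
        # to the constructor unchanged.
        points = list(zip(*variables.values()))
        return cls(points, **options)


def toy_set_xindex(
    coords: Mapping[str, list], coord_names: Sequence[str], index_cls, **options
):
    """Stand-in for set_xindex(): collect coordinates, pass **options through."""
    variables = {name: coords[name] for name in coord_names}
    return index_cls.from_variables(variables, options)


coords = {"lat": [10.0, 20.0], "lon": [1.0, 2.0]}
index = toy_set_xindex(coords, ["lat", "lon"], ToyBallTreeIndex, metric="haversine")
print(index.metric)  # haversine
```

Keeping the options as keyword arguments means the caller never has to construct the index by hand, which is the convenience this PR aims for.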

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6800/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1357296406 PR_kwDOAMm_X84-IR52 6971 Add set_xindex and drop_indexes methods benbovy 4160723 closed 0     7 2022-08-31T12:54:35Z 2022-12-08T09:38:13Z 2022-09-28T07:25:15Z MEMBER   0 pydata/xarray/pulls/6971
  • [x] Closes #6849
  • [x] Supersedes #6800
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

This PR adds Dataset and DataArray .set_xindex and .drop_indexes methods (the latter is also discussed in #4366). I've cherry picked the relevant commits in the scipy22 branch and added a few more commits. This PR also allows passing build options to any Index.

Some comments and open questions:

  • Should we make the index_cls argument of set_xindex optional?
  • I.e., set_xindex(coord_names, index_cls=None, **options) where a pandas index is created by default (or a pandas multi-index if several coordinate names are given), provided that the coordinate(s) are valid 1-d candidates.
  • This would be redundant with the existing set_index method, but this would be convenient if we later deprecate it.

  • Should we deprecate set_index and reset_index? I think we should, but probably not at this point yet.

  • There's a special case for multi-indexes where set_xindex(["foo", "bar"], PandasMultiIndex) adds a dimension coordinate in addition to the "foo" and "bar" level coordinates so that it is consistent with the rest of Xarray. I find it a bit annoying, though. Probably another motivation for deprecating this dimension coordinate.

  • In this PR I also imported the Index base class in Xarray's root namespace.

  • It is needed for custom indexes and it's just a little more convenient than importing it from xarray.core.indexes.
  • Should we do the same for PandasIndex and PandasMultiIndex subclasses? Maybe if one wants to create a custom index inheriting from it. PandasMultiIndex factory methods could be also useful if we deprecate passing pd.MultiIndex objects as DataArray / Dataset coordinates.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6971/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1363524666 PR_kwDOAMm_X84-c82D 6999 Raise UserWarning when rename creates a new dimension coord benbovy 4160723 closed 0     2 2022-09-06T16:16:17Z 2022-12-08T09:38:13Z 2022-09-27T09:33:40Z MEMBER   0 pydata/xarray/pulls/6999
  • [x] Closes #6607
  • [x] Closes #4107
  • [x] Closes #6229
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Current implemented "fix": raise a UserWarning and suggest using swap_dims (*)

Alternatively, we could:

  • revert the breaking change (i.e., create the index again) and raise a DeprecationWarning instead
  • raise an error instead of a warning

I don't have strong opinions on this, I'm happy to implement another alternative. The downside of reverting the breaking change now is that unfortunately it will introduce a breaking change in the next release, while workarounds are pretty straightforward.

(*) from https://github.com/pydata/xarray/issues/6607#issuecomment-1126587818, doing ds.set_coords(['lon']).rename(x='lon').set_index(lon='lon') is working too. With #6971, .set_xindex('lon') could work as well.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6999/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1364493817 PR_kwDOAMm_X84-gJCw 7003 Misc. fixes for Indexes with pd.Index objects benbovy 4160723 closed 0     0 2022-09-07T11:05:02Z 2022-12-08T09:36:51Z 2022-09-23T07:30:38Z MEMBER   0 pydata/xarray/pulls/7003
  • [x] Closes #6987
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7003/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1390999159 PR_kwDOAMm_X84_3QjW 7105 Fix to_index(): return multiindex level as single index benbovy 4160723 closed 0     4 2022-09-29T14:44:22Z 2022-12-08T09:36:51Z 2022-10-12T14:12:48Z MEMBER   0 pydata/xarray/pulls/7105
  • [x] Closes #6836
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7105/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1193611401 PR_kwDOAMm_X841rm9D 6443 Fix concat with scalar coordinate (wrong index type) benbovy 4160723 closed 0     1 2022-04-05T19:16:30Z 2022-12-08T09:36:50Z 2022-04-06T01:19:48Z MEMBER   0 pydata/xarray/pulls/6443
  • [x] Closes #6434
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6443/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1389632629 PR_kwDOAMm_X84_ywy1 7101 Fix Dataset.assign_coords overwriting multi-index benbovy 4160723 closed 0     0 2022-09-28T16:21:48Z 2022-12-08T09:36:50Z 2022-09-28T18:02:16Z MEMBER   0 pydata/xarray/pulls/7101
  • [x] Closes #7097
  • [x] Tests added

@dcherian the DeprecationWarning was ignored by default for .assign_coords() because of https://github.com/pydata/xarray/pull/6798#discussion_r924653224. I changed it to FutureWarning so that it is shown for both .assign() and .assign_coords().

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7101/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1324225268 PR_kwDOAMm_X848a7mk 6857 Fix aligned index variable metadata side effect benbovy 4160723 closed 0     0 2022-08-01T10:57:16Z 2022-12-08T09:36:49Z 2022-08-31T07:16:14Z MEMBER   0 pydata/xarray/pulls/6857
  • [x] Closes #6852
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6857/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1472483025 PR_kwDOAMm_X85EHyv7 7347 Fix assign_coords resetting all dimension coords to default index benbovy 4160723 closed 0     3 2022-12-02T08:19:01Z 2022-12-08T09:36:49Z 2022-12-02T16:32:40Z MEMBER   0 pydata/xarray/pulls/7347
  • [x] Closes #7346
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7347/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1472470718 I_kwDOAMm_X85XxB6- 7346 assign_coords reset all dimension coords to default (pandas) index benbovy 4160723 closed 0     0 2022-12-02T08:07:55Z 2022-12-02T16:32:41Z 2022-12-02T16:32:41Z MEMBER      

What happened?

See https://github.com/martinfleis/xvec/issues/13#issue-1472023524

What did you expect to happen?

assign_coords() should preserve the index of coordinates that are not updated or not part of a dropped multi-coordinate index.

Minimal Complete Verifiable Example

See https://github.com/martinfleis/xvec/issues/13#issue-1472023524

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Xarray version 2022.11.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7346/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1322198907 I_kwDOAMm_X85Ozyd7 6849 Public API for setting new indexes: add a set_xindex method? benbovy 4160723 closed 0     5 2022-07-29T12:38:34Z 2022-09-28T07:25:16Z 2022-09-28T07:25:16Z MEMBER      

What is your issue?

xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section).

The scipy22 branch contains the addition of a .set_xindex() method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside .set_index() (at least for a while).

  • Having two methods .set_xindex() vs. .set_index() would be quite consistent with the .xindexes vs. .indexes properties that are already there.

  • I actually like the .set_xindex() API proposed in the scipy22 branch, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it could be possible to support both that and .set_index()'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation.

  • In the long term we could progressively get rid of .indexes and .set_index() and/or rename .xindexes to .indexes and .set_xindex() to .set_index().

Thoughts @pydata/xarray?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6849/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1361896826 I_kwDOAMm_X85RLOV6 6989 reset multi-index to single index (level): coordinate not renamed benbovy 4160723 closed 0 benbovy 4160723   0 2022-09-05T12:45:22Z 2022-09-27T10:35:39Z 2022-09-27T10:35:39Z MEMBER      

What happened?

Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name.

What did you expect to happen?

While it is certainly more consistent not to rename the level coordinate here (since an index can be assigned to a non-dimension coordinate now), it breaks from the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate reset_index in favor of drop_indexes (#6971).

Minimal Complete Verifiable Example

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
ds = xr.Dataset(coords={"x": midx})
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * foo      (x) object 'a' 'a' 'b' 'b'
#   * bar      (x) int64 1 2 1 2
# Data variables:
#     *empty*

rds = ds.reset_index("foo")

# v2022.03.0
#
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) int64 1 2 1 2
#     foo      (x) object 'a' 'a' 'b' 'b'
# Data variables:
#     *empty*

# v2022.06.0
#
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#     foo      (x) object 'a' 'a' 'b' 'b'
#   * bar      (x) int64 1 2 1 2
# Dimensions without coordinates: x
# Data variables:
#     *empty*
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6989/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1361626450 I_kwDOAMm_X85RKMVS 6987 Indexes.get_unique() TypeError with pandas indexes benbovy 4160723 closed 0 benbovy 4160723   0 2022-09-05T09:02:50Z 2022-09-23T07:30:39Z 2022-09-23T07:30:39Z MEMBER      

@benbovy I also just tested the get_unique() method that you mentioned and maybe noticed a related issue here, which I'm not sure is wanted / expected.

Taking the above dataset ds, accessing this function results in an error:

```python
ds.indexes.get_unique()
# TypeError: unhashable type: 'MultiIndex'
```

However, for xindexes it works:

```python
ds.xindexes.get_unique()
# [<xarray.core.indexes.PandasMultiIndex at 0x7f105bf1df20>]
```

Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180
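
The error message is the one Python raises when an unhashable object is placed in a set or used as a dictionary key. Assuming that is the cause here, the sketch below reproduces the failure with a stand-in class and shows one way to deduplicate on object identity instead. `Unhashable`, `get_unique_by_hash` and `get_unique_by_id` are hypothetical names for this illustration and are not claimed to match how Xarray fixed the issue.

```python
class Unhashable:
    """Stand-in for an object such as pd.MultiIndex that cannot be hashed."""

    __hash__ = None


def get_unique_by_hash(objs):
    # Fails for unhashable objects: set() calls hash() on every element.
    return list(set(objs))


def get_unique_by_id(objs):
    # Deduplicate on object identity instead, preserving first-seen order.
    seen = set()
    unique = []
    for obj in objs:
        if id(obj) not in seen:
            seen.add(id(obj))
            unique.append(obj)
    return unique


midx = Unhashable()
# The same index object is shared by the dimension and its levels.
indexes = {"x": midx, "foo": midx, "bar": midx}

try:
    get_unique_by_hash(indexes.values())
except TypeError as err:
    print(err)  # unhashable type: 'Unhashable'

print(len(get_unique_by_id(indexes.values())))  # 1
```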

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6987/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
302077805 MDU6SXNzdWUzMDIwNzc4MDU= 1961 Extend xarray with custom "coordinate wrappers" benbovy 4160723 closed 0     10 2018-03-04T11:26:15Z 2022-09-19T08:47:45Z 2022-09-19T08:47:44Z MEMBER      

Recent and ongoing developments in xarray turn DataArray and Dataset more and more into data wrappers that are extensible at (almost) every level:

  • domain-specific methods (accessors)
  • io backends (netcdf, raster, zarr, etc.) via an abstract DataStore interface
  • array backends (numpy, dask, sparse) via multidispatch or hooks (#1938)
  • soon custom indexes? (kd-tree, out-of-core indexes... #1603, #1650, #475)

Regarding the latter, I’m thinking about the idea of extending xarray at an even more abstract level, i.e., the possibility of adding / registering "coordinate wrappers" to DataArray or Dataset objects. Basically, it would correspond to adding any object that performs some operation based on one or several coordinates ~~(I haven’t found any better name than "coordinate agent" to describe that)~~.

(EDIT: "coordinate agents" may not be quite right here, I changed that to "coordinate wrappers")

Indexes are a specific case of coordinate wrappers that serve the purpose of indexing. This is built in xarray.

While indexing is enough in 80% of cases, I see a couple of use cases where other coordinate wrappers (built outside of xarray) would be nice to have:

  • Grids. For example, xgcm implements operations (interp, diff) on physical axes that may each include several coordinates, depending on the position of the coordinate labels on the axis (center, left…). Other grids define their topology using a greater number of coordinates (e.g., ugrid). Storing regridding weights might be another use case?
  • Clocks. For example, xarray-simlab uses one or several coordinates to define the timeline of a computational simulation.

In those examples we usually rely on coordinate attributes and/or classes that encapsulate xarray objects to implement the specific features that we need. While it works, it has limitations and I think it can be improved.

Custom coordinate wrappers would be a way of extending xarray that is very consistent with other current (or considered) extension mechanisms.

This is still a very vague idea and I’m sure that there are lots of details that can be discussed (serialization, etc.).

But before going further, I’d like to know your thoughts @pydata/xarray. Do you think it is a silly idea? Do you have in mind other use cases where custom coordinate wrappers would be useful?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1961/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
955936490 MDU6SXNzdWU5NTU5MzY0OTA= 5647 Flexible indexes: review the implementation of alignment and merge benbovy 4160723 closed 0     12 2021-07-29T15:03:23Z 2022-09-07T09:47:13Z 2022-09-07T09:47:13Z MEMBER      

The current implementation of the align function is problematic in the context of flexible indexes because:

  • the sizes of the joined indexes are reused for checking compatibility with unlabelled dimension sizes
  • the joined indexes are used as indexers to compute the aligned Dataset / DataArray.

This currently works well since a pd.Index can be directly treated as a 1-d array but this won’t always be the case anymore with custom indexes.

I'm opening this issue to gather ideas on how best to handle alignment in a more flexible way (I haven't been thinking much about this problem yet).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5647/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1322190255 I_kwDOAMm_X85OzwWv 6848 Update API benbovy 4160723 closed 0     0 2022-07-29T12:30:08Z 2022-07-29T12:30:23Z 2022-07-29T12:30:23Z MEMBER        
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6848/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1176745736 PR_kwDOAMm_X840z4zt 6400 Speed-up multi-index html repr + add display_values_threshold option benbovy 4160723 closed 0     3 2022-03-22T12:57:37Z 2022-03-29T07:10:22Z 2022-03-29T07:05:32Z MEMBER   0 pydata/xarray/pulls/6400

This adds PandasMultiIndexingAdapter._repr_html_ that can greatly speed up the html repr of Xarray objects with multi-indexes.

This optimized _repr_html_ implementation is now used for formatting the array detailed view of all multi-index coordinates in the html repr, instead of converting the full index and each level to numpy arrays before formatting them.

```python
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
da = ds["air"].stack(z=[...])

da.shape
# (3869000,)

%timeit -n 1 -r 1 da._repr_html_()
# 9.96 ms !
```

  • [x] Closes #5529
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6400/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1174675456 PR_kwDOAMm_X840tJ9A 6388 isel: convert IndexVariable to Variable if index is dropped benbovy 4160723 closed 0     1 2022-03-20T20:29:58Z 2022-03-29T07:10:08Z 2022-03-21T04:47:48Z MEMBER   0 pydata/xarray/pulls/6388
  • [x] Closes #6381
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6388/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
616432851 MDExOlB1bGxSZXF1ZXN0NDE2NTQ0MzE4 4053 Fix html repr in untrusted notebooks (plain text fallback) benbovy 4160723 closed 0     5 2020-05-12T07:38:22Z 2022-03-29T07:10:07Z 2020-05-20T17:06:40Z MEMBER   0 pydata/xarray/pulls/4053
  • [x] Closes #4041
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is not very elegant (actually plain text repr is already included in the notebook as text/plain mime type but it is ignored when text/html mime type is present), but it seems to work. I haven't found a better workaround.

I don't really know if this can be properly tested (I only added a basic test).

Steps to test this fix:

  • To "untrust" a notebook: open an existing notebook with a simple editor, manually edit one output cell with a xarray object repr, and save the ipynb file.
  • Open this notebook with the Notebook app, you should see the plain text repr.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4053/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
849315490 MDExOlB1bGxSZXF1ZXN0NjA4MTEwNjI0 5102 Flexible indexes: add Index base class and xindexes properties benbovy 4160723 closed 0     10 2021-04-02T16:18:07Z 2022-03-29T07:10:07Z 2021-05-11T08:21:26Z MEMBER   0 pydata/xarray/pulls/5102

This PR clears up the path for flexible indexes:

  • it adds a new ~~IndexAdapter~~ Index base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party)
  • PandasIndexAdapter now inherits from ~~IndexAdapter~~ Index
  • the xarray_obj.xindexes properties return Index (PandasIndexAdapter) instances. xarray_obj.indexes properties still return pandas.Index instances.

~~The latter is a breaking change, although I'm not sure if the indexes property has been made public yet.~~

This is still work in progress, there are many broken tests that are not fixed yet. (EDIT: all tests should be fixed now).

There's a lot of dirty fixes to avoid circular dependencies and in the many places where we still need direct access to the pandas.Index objects, but I'd expect that these will be cleaned-up further in the refactoring.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5102/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
893415955 MDExOlB1bGxSZXF1ZXN0NjQ1OTMzODI3 5322 Internal refactor of label-based data selection benbovy 4160723 closed 0     1 2021-05-17T14:52:49Z 2022-03-29T07:10:07Z 2021-06-08T09:35:54Z MEMBER   0 pydata/xarray/pulls/5322

Xarray label-based data selection now relies on a newly added xarray.Index.query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional[Index]] method where:

  • labels is always a dictionary with coordinate name(s) as key(s) and the corresponding selection label(s) as values
  • When calling .sel with some coordinate(s)/label(s) pairs, those are first grouped by index so that only the relevant pairs are passed to an Index.query
  • the returned tuple contains the positional indexers and (optionally) a new index object

For a simple pd.Index, labels always corresponds to a 1-item dictionary like {'coord_name': label_values}, which is not very useful in this case, but this format is useful for pd.MultiIndex and will likely be for other, custom indexes.
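
The grouping step described above can be sketched in plain Python. This is a toy model under stated assumptions: `ToyIndex`, `group_labels_by_index` and `toy_sel` are hypothetical names, each index is 1-d and covers a single dimension, and nothing here is Xarray's actual implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Optional, Tuple


class ToyIndex:
    """Hypothetical 1-d index mapping labels to integer positions."""

    def __init__(self, dim: Hashable, values):
        self.dim = dim
        self._positions = {value: i for i, value in enumerate(values)}

    def query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional["ToyIndex"]]:
        # A simple index only ever receives a 1-item dictionary.
        (label,) = labels.values()
        return self._positions[label], None


def group_labels_by_index(indexes, labels):
    """Group coordinate/label pairs so each index gets only its own labels."""
    grouped = defaultdict(dict)
    index_by_id = {}
    for coord_name, label in labels.items():
        index = indexes[coord_name]
        index_by_id[id(index)] = index
        grouped[id(index)][coord_name] = label
    return [(index_by_id[key], grouped[key]) for key in grouped]


def toy_sel(indexes, labels):
    """Convert label-based indexers into positional ones, keyed by dimension."""
    pos_indexers = {}
    for index, index_labels in group_labels_by_index(indexes, labels):
        indexer, _new_index = index.query(index_labels)
        pos_indexers[index.dim] = indexer
    return pos_indexers


indexes = {"time": ToyIndex("time", [2000, 2001, 2002]), "x": ToyIndex("x", ["a", "b"])}
print(toy_sel(indexes, {"time": 2001, "x": "b"}))  # {'time': 1, 'x': 1}
```

For a multi-index, several coordinate names would map to the same index object, so its `query` would receive a dictionary with more than one item.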

Moving the label->positional indexer conversion logic into PandasIndex.query(), I've tried to separate pd.Index vs pd.MultiIndex concerns by adding a new PandasMultiIndex wrapper class (it will probably be useful for other things as well) and refactor the complex logic that was implemented in convert_label_indexer. Hopefully it is a bit clearer now.

Working towards a more flexible/generic system, we still need to figure out how to:

  • pass index query extra arguments like method and tolerance for pd.Index but in a more generic way
  • handle several positional indexers over multiple dimensions possibly returned by a custom "meta-index" (e.g., staggered grid index)
  • handle the case of positional indexers returned from querying >1 indexes along the same dimension (e.g., multiple coordinates along x with a simple pd.Index)
  • pandas indexes don't need information like the names or shapes of their corresponding coordinate(s) to perform label-based selection, but this kind of information will probably be needed for other indexes (we actually need it for advanced point-wise selection using tree-based indexes in xoak).

This could be done in follow-up PRs.

Side note: I've initially tried to return from xindexes items for multi-index levels as well (not only index dimensions), but it's probably wiser to save this for later (when we'll tackle the multi-index virtual coordinate refactoring) as there are many places in Xarray where this is clearly not expected.

Happy to hear your thoughts @pydata/xarray.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5322/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
819062172 MDExOlB1bGxSZXF1ZXN0NTgyMjI0MTQ4 4979 Flexible indexes refactoring notes benbovy 4160723 closed 0     22 2021-03-01T16:57:32Z 2022-03-29T07:09:31Z 2021-03-17T16:47:29Z MEMBER   0 pydata/xarray/pulls/4979

As a preliminary step before I take on the refactoring and implementation of flexible indexes in Xarray for the next few months, I reviewed the status of https://github.com/pydata/xarray/projects/1 and started compiling partially implemented or planned changes, thoughts, etc. into a single document that may serve as a basis for further discussion and implementation work.

It's still very much work in progress (I will update it regularly in the forthcoming days) and it is very open to discussion (we can use this PR for that)!

I'm not sure if Xarray's root folder is a good place for this document, though. We could move this into a new repository in xarray-contrib (that could also host other enhancement proposals) if that's necessary.

I'm looking forward to getting started on this and to getting your thoughts/feedback!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4979/reactions",
    "total_count": 13,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 7,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
903899735 MDExOlB1bGxSZXF1ZXN0NjU1MTA5NDg0 5385 Cast PandasIndex to pd.(Multi)Index benbovy 4160723 closed 0     0 2021-05-27T15:15:41Z 2022-03-29T07:09:31Z 2021-05-28T08:28:11Z MEMBER   0 pydata/xarray/pulls/5385
  • [x] Closes #5384
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5385/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1174687047 PR_kwDOAMm_X840tLrz 6389 Re-index: fix missing variable metadata benbovy 4160723 closed 0     2 2022-03-20T21:11:38Z 2022-03-29T07:09:31Z 2022-03-21T07:53:05Z MEMBER   0 pydata/xarray/pulls/6389
  • [x] Closes #6382
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6389/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1174610081 PR_kwDOAMm_X840s_xU 6385 Fix concat with scalar coordinate benbovy 4160723 closed 0     0 2022-03-20T16:46:48Z 2022-03-29T07:09:30Z 2022-03-21T04:49:23Z MEMBER   0 pydata/xarray/pulls/6385
  • [x] Closes #6384
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6385/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1174615799 PR_kwDOAMm_X840tAtL 6386 Fix Dataset groupby returning a DataArray benbovy 4160723 closed 0     0 2022-03-20T17:06:13Z 2022-03-29T07:09:30Z 2022-03-20T18:55:27Z MEMBER   0 pydata/xarray/pulls/6386
  • [x] Closes #6379
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1175490214 PR_kwDOAMm_X840vt1_ 6394 Fix DataArray groupby returning a Dataset benbovy 4160723 closed 0     0 2022-03-21T14:43:21Z 2022-03-29T07:09:30Z 2022-03-21T15:26:20Z MEMBER   0 pydata/xarray/pulls/6394
  • [x] Closes #6393
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6394/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1174622308 PR_kwDOAMm_X840tBvD 6387 Fix concat with variable or dataarray as dim (propagate attrs) benbovy 4160723 closed 0     1 2022-03-20T17:27:41Z 2022-03-29T07:09:29Z 2022-03-20T18:53:46Z MEMBER   0 pydata/xarray/pulls/6387
  • [x] Closes #6380
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6387/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1183360119 PR_kwDOAMm_X841JuRv 6418 Fix concat with scalar coordinate (dtype) benbovy 4160723 closed 0     0 2022-03-28T12:22:50Z 2022-03-29T07:06:46Z 2022-03-28T16:05:01Z MEMBER   0 pydata/xarray/pulls/6418
  • [x] Closes #6416
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6418/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
968796847 MDU6SXNzdWU5Njg3OTY4NDc= 5697 Coerce the labels passed to Index.query to array-like objects benbovy 4160723 closed 0     3 2021-08-12T13:09:40Z 2022-03-17T17:11:43Z 2022-03-17T17:11:43Z MEMBER      

When looking at #5691 I noticed that the labels are sometimes coerced to arrays (i.e., #3153) but not always.

Later in PandasIndex.query those may again be coerced to arrays (i.e., _as_array_tuplesafe). In #5692 (https://github.com/pydata/xarray/pull/5692/commits/a551c7f05abf90a492fb59068b59ebb2bac8cb4c) they are always coerced to arrays before possibly being converted to scalars.

Shouldn't we therefore make things easier and ensure that the labels given to xarray.Index.query() always have an array interface? This would also yield a more predictable behavior to anyone who wants to implement custom xarray indexes.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5697/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
968990058 MDU6SXNzdWU5Njg5OTAwNTg= 5700 Selection with multi-index and float32 values benbovy 4160723 closed 0     0 2021-08-12T14:55:11Z 2022-03-17T17:11:43Z 2022-03-17T17:11:43Z MEMBER      

I guess it's rather an edge case, but an issue similar to the one fixed in #3153 may occur with multi-indexes:

```python
import numpy as np
import xarray as xr

foo_data = ['a', 'a', 'b', 'b']
bar_data = np.array([0.1, 0.2, 0.7, 0.9], dtype=np.float32)
da = xr.DataArray([1, 2, 3, 4], dims="x", coords={"foo": ("x", foo_data), "bar": ("x", bar_data)})
da = da.set_index(x=["foo", "bar"])
```

```python
da.sel(bar=0.1)
# KeyError: 0.1
```

```python
da.sel(bar=np.array(0.1, dtype=np.float32).item())
# <xarray.DataArray (foo: 1)>
# array([1])
# Coordinates:
#   * foo      (foo) object 'a'
```

(xarray version: 0.18.2 as there's a regression introduced in 0.19.0 #5691)
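
The lookup failure is consistent with plain float precision: a float32 coordinate stores the float32 value nearest to 0.1, which is not equal to the float64 literal `0.1`. Here is a minimal standard-library sketch of that effect. The `to_float32` helper is hypothetical and only emulates the rounding a float32 array applies; it is not part of xarray or NumPy.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32, returned as a float64."""
    return struct.unpack("f", struct.pack("f", value))[0]


label = 0.1                # float64 label passed by the user
stored = to_float32(0.1)   # value that a float32 coordinate would hold

print(stored == label)               # False: the two values differ
print(stored == to_float32(label))   # True once the label is rounded the same way
```

Rounding the label to the coordinate's dtype before the lookup, as the `.item()` workaround does, makes both sides compare equal.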

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5700/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
955605233 MDU6SXNzdWU5NTU2MDUyMzM= 5645 Flexible indexes: handle renaming coordinate variables benbovy 4160723 closed 0     0 2021-07-29T08:42:00Z 2022-03-17T17:11:42Z 2022-03-17T17:11:42Z MEMBER      

We should have some API in xarray.Index to update the index when its corresponding coordinate variables are renamed.

This is currently implemented here, where the underlying pd.Index name(s) are updated: https://github.com/pydata/xarray/blob/c5530d52d1bcbd071f4a22d471b728a4845ea36f/xarray/core/dataset.py#L3299-L3314

This logic should be moved into PandasIndex and PandasMultiIndex.

Other, custom indexes might also have internal attributes to update, so we might need formal API for that.
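
As a rough illustration of what such a formal API could look like, here is a minimal sketch. `ToyIndex` and the `rename(name_dict, dims_dict)` signature are assumptions made for this example only; they are not taken from the linked code.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ToyIndex:
    """Hypothetical stand-in for an index built from named coordinates."""

    coord_names: Tuple[str, ...]
    dim: str

    def rename(
        self, name_dict: Mapping[str, str], dims_dict: Mapping[str, str]
    ) -> "ToyIndex":
        """Return a new index whose internal names follow the renamed variables."""
        new_names = tuple(name_dict.get(n, n) for n in self.coord_names)
        new_dim = dims_dict.get(self.dim, self.dim)
        return replace(self, coord_names=new_names, dim=new_dim)


index = ToyIndex(coord_names=("foo", "bar"), dim="x")
renamed = index.rename({"foo": "baz"}, {"x": "y"})
print(renamed)
```

Returning a new object instead of mutating in place keeps the original index usable by any other object that still refers to the old names.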

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5645/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
985162305 MDU6SXNzdWU5ODUxNjIzMDU= 5755 Mypy errors with the last version of _typed_ops.pyi benbovy 4160723 closed 0     5 2021-09-01T13:34:52Z 2021-09-13T10:53:16Z 2021-09-13T00:04:54Z MEMBER      

What happened:

Since #5569 I get a lot of mypy errors from _typed_ops.pyi (see below). What's weird is that it is not happening in all cases:

$ mypy                          # ok
$ mypy .                        # errors
$ pre-commit run --all-files    # ok
$ pre-commit run                # errors
$ git commit                    # (via pre-commit hooks) errors

I also tried pre-commit clean with no luck. EDIT: I also tried on a freshly cloned xarray repository.

@max-sixty @Illviljan Any idea on what's happening?

What you expected to happen:

No mypy error in all cases.

Anything else we need to know?:

xarray/core/_typed_ops.pyi:32: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:33: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:34: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:35: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:36: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:37: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:38: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:39: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:40: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:41: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:42: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:43: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:44: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:45: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:46: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:47: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:48: error: Self argument missing for 
a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:49: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:50: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:51: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:52: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:53: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:54: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:55: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:56: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:57: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:60: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:61: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:62: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:63: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:64: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:65: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] 
xarray/core/_typed_ops.pyi:66: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:67: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:77: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:83: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:89: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:95: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:101: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:107: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:113: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:119: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:125: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:131: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:137: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:143: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:149: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:155: error: Self argument missing for a 
non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:161: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:167: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:173: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:179: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:185: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:191: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:197: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:203: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:209: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:215: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:221: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:227: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:230: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:231: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:232: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:233: error: Self argument missing for a non-static method (or an invalid type for self) 
[misc] xarray/core/_typed_ops.pyi:234: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:235: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:236: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:237: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:247: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:253: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:259: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:265: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:271: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:277: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:283: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:289: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:295: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:301: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:307: error: Self argument 
missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:313: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:319: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:325: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:331: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:337: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:343: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:349: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:355: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:361: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:367: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:373: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:379: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:385: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:391: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:397: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:400: error: Self argument missing for a non-static method (or an invalid 
type for self) [misc] xarray/core/_typed_ops.pyi:401: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:402: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:403: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:404: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc] xarray/core/_typed_ops.pyi:405: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc] xarray/core/_typed_ops.pyi:406: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc] xarray/core/_typed_ops.pyi:407: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc]

Environment:

mypy 0.910 python 3.9.6 (also tested with 3.8)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5755/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
933551030 MDU6SXNzdWU5MzM1NTEwMzA= 5553 Flexible indexes: how best to implement the new data model? benbovy 4160723 closed 0     2 2021-06-30T10:38:13Z 2021-08-09T07:56:56Z 2021-08-09T07:56:56Z MEMBER      

Yesterday, during the weekly flexible indexes meeting, @shoyer, @jhamman and I discussed what would be the best approach to implement the new data model described here. In this issue I summarize the implementation of the current data model as well as some suggestions for the new data model, along with their pros and cons (I might still be missing important ones!). Unfortunately, I don't think there's an easy or ideal solution, so @pydata/xarray any feedback would be very welcome!

Current data model implementation

Currently any (pandas) index is wrapped into an IndexVariable object through an intermediate adapter to preserve dtypes and handle explicit indexing. This allows directly reusing the index data as a xarray coordinate variable. For a pandas multi-index, virtual coordinates are created for each level from the IndexVariable object wrapping the index. Although relying on "virtual coordinates" more or less worked so far, it is over-complicated. Moreover, this wouldn't work with the new data model where an index may be built from a set of coordinates with different dimensions.

Proposed alternatives

Option 1: independent (coordinate) variables and indexes

Indexes and coordinates are loosely coupled, i.e., a xarray.Index holds a reference (mapping) to the coordinate variable(s) from which it is built but both manage their own data independently of each other.

Pros:

  • separation of concerns.
  • we no longer need those complicated adapters for reusing the index data as xarray (virtual) variable(s), which may simplify some xarray internals.
  • if we drop an index, that's simple, we just drop it and all its related coordinate variables are left as-is.
  • we could theoretically build a (pandas) index from a chunked coordinate, and then when we drop the index we still have this chunked coordinate left untouched.

Cons:

  • data duplication
  • this would clearly be a regression when using pandas indexes, but maybe less so for other indexes like kd-trees, where adapting those objects for use as coordinate variables wouldn't be easy or even possible.
  • what if we want to build a DataArray or Dataset from one or more existing indexes (pandas or other)? Passing an index, treating it as an array, and then re-building an index from this array is not optimal.
  • keeping an index and its corresponding coordinate variable(s) in a consistent, in-sync state may be tricky, given that those variables may be mutable (although we could prevent this by encapsulating those variables using a very lightweight wrapper inspired by IndexVariable).

Option 2: indexes hold coordinate variables

This is the opposite approach of the current one. Here, a xarray.Index would wrap one or more xarray.Variable objects.

Pros:

  • probably easier to keep an index and its corresponding coordinate variable(s) in-sync.
  • sharing data between an index and its coordinate variables may be easier.

Cons:

  • accessing / iterating through all coordinate variables in a DataArray or Dataset may be less straightforward.
  • when the index is dropped, we might need some logic / API to return the coordinates as new xarray.Variable objects with their own data (or should we simply always drop the corresponding coordinates too? maybe not...).
  • more responsibility / work for developers who want to provide 3rd party xarray indexes.

Option 3: intermediate solution

When an index is set (or unset), it returns a new set of coordinate variables to replace the existing ones.

Pros:

  • it keeps some separation of concerns, while it allows data sharing through adapters and/or ensures that variables are immutable using lightweight wrappers.

Cons:

  • like option 2, more things to take care of for 3rd party xarray index developers.
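
To make option 3 more concrete, here is a minimal pure-Python sketch, assuming a toy index that hands back read-only coordinates when it is set. All names (`ReadOnlyCoord`, `ToyIndex`, `create_coords`, `set_index`) are hypothetical and chosen for this illustration; none of them is an existing xarray API.

```python
from typing import Dict, Tuple


class ReadOnlyCoord:
    """Lightweight wrapper exposing coordinate values without allowing mutation."""

    def __init__(self, values):
        self._values = tuple(values)

    @property
    def values(self) -> tuple:
        return self._values


class ToyIndex:
    """Hypothetical index owning a label -> position table for one coordinate."""

    def __init__(self, name: str, values):
        self.name = name
        self._values = tuple(values)
        self._positions = {v: i for i, v in enumerate(self._values)}

    def create_coords(self) -> Dict[str, ReadOnlyCoord]:
        # The index decides which coordinate objects replace the originals.
        return {self.name: ReadOnlyCoord(self._values)}

    def sel(self, label) -> int:
        return self._positions[label]


def set_index(coords: Dict[str, object], name: str) -> Tuple[ToyIndex, Dict[str, object]]:
    """Build an index and return it with the coordinates that replace the old ones."""
    index = ToyIndex(name, coords[name])
    new_coords = dict(coords)
    new_coords.update(index.create_coords())
    return index, new_coords


coords = {"x": [10, 20, 30], "y": [1, 2]}
index, new_coords = set_index(coords, "x")
print(index.sel(20))             # position of label 20 along "x"
print(new_coords["x"].values)    # read-only replacement for the "x" coordinate
```

In this sketch only the indexed coordinate is replaced; coordinates that are not part of the index pass through unchanged.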
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5553/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
187859705 MDU6SXNzdWUxODc4NTk3MDU= 1092 Dataset groups benbovy 4160723 closed 0     20 2016-11-07T23:28:36Z 2021-07-02T19:56:50Z 2021-07-02T19:56:49Z MEMBER      

EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion


It has probably been suggested already, but similarly to netCDF4 groups it would be nice if we could access Dataset data variables, coordinates and attributes via groups.

Currently xarray allows loading a specific netCDF4 group into a Dataset. Different groups can be loaded as separate Dataset objects, which may be then combined into a single, flat Dataset. Yet, in some cases it makes sense to represent data as a single object while it would be convenient to keep some nested structure. For example, a Dataset representing data on a staggered grid might have scalar_vars and flux_vars groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, it would also help to have a more concise repr.

I think about an implementation of Dataset.groups that would be specific to xarray, i.e., independent of any backend, and which would easily co-exist with the flat Dataset. It shouldn't be required for a backend to support groups (some existing backends simply don't). It is up to each backend to eventually transpose the Dataset.groups logic to its own group logic.

Dataset.groups might return a DatasetGroups object which, much like xarray.core.coordinates.DatasetCoordinates, would (1) have a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another Dataset object (sub-dataset) on __getitem__. Keys of Dataset.groups should be accessible as attributes, e.g., ds.groups['scalar_vars'] == ds.scalar_vars.
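
The three points above could be sketched roughly as follows. This is a toy model under stated assumptions: `ToyDataset` is a hypothetical stand-in that stores variables in a plain dict, and neither class reflects an existing xarray implementation.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, List, Optional


class DatasetGroups(Mapping):
    """Mapping of group names to sub-datasets, created lazily on access."""

    def __init__(self, dataset: "ToyDataset"):
        self._dataset = dataset  # (1) reference to the parent dataset

    def __getitem__(self, key: str) -> "ToyDataset":
        names = self._dataset._group_names[key]  # (2) group name -> variable names
        # (3) build a new sub-dataset on demand
        return ToyDataset({n: self._dataset.variables[n] for n in names})

    def __iter__(self) -> Iterator[str]:
        return iter(self._dataset._group_names)

    def __len__(self) -> int:
        return len(self._dataset._group_names)


class ToyDataset:
    """Hypothetical flat dataset with optional named groups of variables."""

    def __init__(self, variables: Dict[str, list],
                 groups: Optional[Dict[str, List[str]]] = None):
        self.variables = dict(variables)
        self._group_names = dict(groups or {})

    @property
    def groups(self) -> DatasetGroups:
        return DatasetGroups(self)

    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup fails.
        if name in self.__dict__.get("_group_names", {}):
            return self.groups[name]
        raise AttributeError(name)


ds = ToyDataset(
    {"temp": [1, 2], "pressure": [3, 4], "flux_u": [5]},
    groups={"scalar_vars": ["temp", "pressure"], "flux_vars": ["flux_u"]},
)
print(list(ds.groups))
print(ds.scalar_vars.variables)
```

The flat `variables` dict stays the single source of truth, so the grouped view coexists with the flat dataset and no backend needs to know about groups.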

Questions:

  • How to handle hierarchies of > 1 levels (i.e., groups of groups...)?
  • How to ensure that a variable / attribute in one group is not also present in another group?
  • Case of methods called from groups with inplace=True?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
512564243 MDExOlB1bGxSZXF1ZXN0MzMyNTUyNTA3 3448 Add license for the icons used in the html repr benbovy 4160723 closed 0     1 2019-10-25T14:57:20Z 2019-10-25T15:48:52Z 2019-10-25T15:40:46Z MEMBER   0 pydata/xarray/pulls/3448
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3448/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
249584098 MDExOlB1bGxSZXF1ZXN0MTM1Mjk4ODY3 1507 Detailed report for testing.assert_equal and testing.assert_identical benbovy 4160723 closed 0     18 2017-08-11T09:38:23Z 2019-10-25T15:07:39Z 2019-01-18T09:16:31Z MEMBER   0 pydata/xarray/pulls/1507
  • ~~Closes #xxxx~~
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

~~In addition to Dataset repr, the error message also shows the output of Dataset.info() for both datasets.~~

~~This may not be the most elegant solution, but it is helpful when datasets only differ by their attributes attached to coordinates or data variables (not shown in repr). I'm open to any suggestion.~~

The report shows the differences for dimensions, data values (Variable and DataArray), coordinates, data variables and attributes (the latter only for testing.assert_identical).

There are currently not many tests for xarray.testing functions, but I'm willing to add more if needed.

Not sure if it's worth a what's new entry (EDIT: added one).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1507/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
274619743 MDExOlB1bGxSZXF1ZXN0MTUzMTE4MjQ3 1723 Fix unexpected behavior of .set_index() since pandas 0.21.0 benbovy 4160723 closed 0     0 2017-11-16T18:37:20Z 2019-10-25T15:07:18Z 2017-11-17T00:54:51Z MEMBER   0 pydata/xarray/pulls/1723
  • [x] Closes #1722
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master **/*py | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1723/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
287844110 MDExOlB1bGxSZXF1ZXN0MTYyNDI2NzU2 1820 WIP: html repr benbovy 4160723 closed 0     40 2018-01-11T16:33:07Z 2019-10-25T15:06:58Z 2019-10-24T16:48:46Z MEMBER   0 pydata/xarray/pulls/1820
  • [x] Closes #1627
  • [ ] Tests added
  • [ ] Tests passed
  • [ ] Passes git diff upstream/master **/*py | flake8 --diff
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is work in progress, although the basic functionality is there. You can see a preview here: http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2

TODO:

  • [ ] Add support for Multi-indexes
  • [ ] Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work
  • [ ] Add some tests

Nice to have (keep this for later):

  • Clean up CSS code and HTML template (track CSS subgrid support in browsers, this may simplify things a lot here).
  • Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here.
  • Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS.
  • Integration of Dask arrays HTML repr (+ integration of repr for other array backends).
  • Maybe find a way (if possible) to include CSS only once in the notebook (currently it is included each time a xarray object is displayed in an output cell, which is not very nice).
  • Review the rules for collapsing the Coordinates, Data variables and Attributes sections (maybe expose them as global options).
  • Maybe also define some rules to collapse automatically the data section (DataArray and Variable) when the data repr is too long.
  • Maybe add rich representation for Dataset.coords and Dataset.data_vars as well?

Other thoughts (old)

A big challenge here is to provide both robust and flexible styling (CSS):

  • I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good!
  • However, the current CSS code is a bit fragile (I had to add a lot of `!important`). Probably this could be a bit cleaned and optimized (unfortunately my CSS skills are limited).
  • Also, with the jupyterlab's dark theme it looks ugly. We probably need to use jupyterlab CSS variables so that our CSS scheme is compatible with the theme machinery, but at the same time we need to support other front-ends. So we probably need to maintain different stylings (i.e., multiple CSS files, one of them picked-up depending on the front-end), though I don't know if it's easy to automatically detect the front-end (choosing a default style is difficult too).
  • The notebook rendering on Github seems to disable style tags (no style is applied to the output, see https://gist.github.com/benbovy/3009f342fb283bd0288125a1f7883ef2). Output is not readable at all in this case, so it might be useful to allow turning off rich output as an option.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1820/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
264747372 MDU6SXNzdWUyNjQ3NDczNzI= 1627 html repr of xarray object (for the notebook) benbovy 4160723 closed 0     39 2017-10-11T21:49:20Z 2019-10-24T16:56:15Z 2019-10-24T16:48:47Z MEMBER      

Edit: preview for Dataset and DataArray (pure html/css)

Dataset: https://jsfiddle.net/tay08cn9/4/ DataArray: https://jsfiddle.net/43z4v2wt/9/


I started to think a bit more deeply about what a richer, html-based representation of xarray objects could look like, e.g., as we would see it in jupyter notebooks.

Here are some ideas for Dataset: https://jsfiddle.net/9ab4c3tr/35/

Some notes:

  • The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
  • For the sake of consistency, I've stolen some style from the pandas.DataFrame repr as it is shown in jupyterlab.
  • I tried to emphasize the most important parts of the repr, i.e., the lists of dimensions, coordinates and variables.
  • I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript). It already allows some interaction like hover effects and collapsible sections. However, I doubt that more fancy stuff (like, e.g., highlighting on hover a specific dimension simultaneously at several places of the repr) would be possible here without Javascript. I have limited skills in this area, though.

These are still, of course, preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1627/reactions",
    "total_count": 11,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 4,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
234658224 MDU6SXNzdWUyMzQ2NTgyMjQ= 1447 Package naming "conventions" for xarray extensions benbovy 4160723 closed 0     5 2017-06-08T21:14:24Z 2019-06-28T22:58:33Z 2019-06-28T21:58:33Z MEMBER      

I'm wondering what would be a good name for a package that primarily aims at providing an xarray extension (in the form of a DataArray and/or Dataset accessor).

I'm currently thinking about using a prefix like the scikit package family (e.g., scikit-learn, scikit-image).

For example, for a xarray extension for signal processing we would have:

  • package full name: xarray-signal
  • package import name: xrsignal (like sklearn)
  • accessor name: signal

```python
import xarray as xr
import xrsignal

ds = xr.Dataset()
ds.signal.process(...)
```

The main advantage is that the name directly gives an idea of what the package is about. It may also be good for the overall visibility of both xarray and its 3rd-party extensions. The downside is that there are three name variations: one for getting and installing the package, another one for importing the package and yet another one for using the accessor. This may be annoying, especially for new users who are not accustomed to this kind of naming convention.

Conversely, choosing a different, unrelated name like salem or pangaea has the advantage of using the same name everywhere and perhaps providing multiple accessors in the same package, but given that the number of xarray extensions is likely to grow in the near future (see, e.g., the pangeo-data project), it would become difficult to have a clear view of the whole xarray package ecosystem.

Any thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1447/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
180676935 MDU6SXNzdWUxODA2NzY5MzU= 1030 Concatenate multiple variables into one variable with a multi-index (categories) benbovy 4160723 closed 0     3 2016-10-03T15:54:23Z 2019-02-25T07:25:40Z 2019-02-25T07:25:40Z MEMBER      

I often have to deal with datasets in this form (multiple variables of different sizes, each representing different categories, on the same physical dimension but using different names as they have different labels),

<xarray.Dataset>
Dimensions:     (wn_band1: 4, wn_band2: 6, wn_band3: 8)
Coordinates:
  * wn_band1    (wn_band1) float64 200.0 266.7 333.3 400.0
  * wn_band2    (wn_band2) float64 500.0 560.0 620.0 680.0 740.0 800.0
  * wn_band3    (wn_band3) float64 1.5e+03 1.643e+03 1.786e+03 1.929e+03 ...
Data variables:
    data_band3  (wn_band3) float64 0.7515 0.5302 0.6697 0.9621 0.01815 ...
    data_band1  (wn_band1) float64 0.3801 0.6649 0.01884 0.9407
    data_band2  (wn_band2) float64 0.8813 0.4481 0.2353 0.9681 0.1085 0.0835

where it would be more convenient to have the data re-arranged into the following form (concatenate the variables into a single variable with a multi-index with the labels of both the categories and the physical coordinate):

<xarray.Dataset>
Dimensions:   (spectrum: 18)
Coordinates:
  * spectrum  (spectrum) MultiIndex
  - band      (spectrum) int64 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3
  - wn        (spectrum) float64 200.0 266.7 333.3 400.0 500.0 560.0 620.0 ...
Data variables:
    data      (spectrum) float64 0.3801 0.6649 0.01884 0.9407 0.8813 0.4481 ...

The latter would allow using xarray's nice features like ds.groupby('band').mean().

Currently, the best way that I've found to transform the data is something like:

```python
data = np.concatenate([ds.data_band1, ds.data_band2, ds.data_band3])
wn = np.concatenate([ds.wn_band1, ds.wn_band2, ds.wn_band3])
band = np.concatenate([np.repeat(1, 4), np.repeat(2, 6), np.repeat(3, 8)])

midx = pd.MultiIndex.from_arrays([band, wn], names=('band', 'wn'))
ds2 = xr.Dataset({'data': ('spectrum', data)}, coords={'spectrum': midx})
```

Maybe I'm missing a better way to do this? If not, it would be nice to have a convenience method for this, unless this use case is too rare to be worth it. I'm also not sure at all what a good API for such a method would be.
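As a library-free illustration of the reshaping, the sketch below uses plain lists in place of the dataset variables. Only the first two bands are included, because the values of the third band are truncated in the repr above. The names `bands`, `labels` and `band_means` are illustrative and not part of any API.

```python
from collections import defaultdict

# Values copied from the first two bands of the repr above.
bands = {
    1: ([200.0, 266.7, 333.3, 400.0], [0.3801, 0.6649, 0.01884, 0.9407]),
    2: ([500.0, 560.0, 620.0, 680.0, 740.0, 800.0],
        [0.8813, 0.4481, 0.2353, 0.9681, 0.1085, 0.0835]),
}

# Flatten everything onto one "spectrum" axis labelled by (band, wn) pairs.
labels = []
data = []
for band, (wn, values) in bands.items():
    labels.extend((band, w) for w in wn)
    data.extend(values)

# Group by the first label level, similar in spirit to groupby('band').mean().
groups = defaultdict(list)
for (band, _wn), value in zip(labels, data):
    groups[band].append(value)
band_means = {band: sum(vals) / len(vals) for band, vals in groups.items()}
print(band_means)
```

The (band, wn) pairs play the role of the multi-index: once every value carries both labels, grouping by category is a one-step operation.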

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1030/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
349078381 MDExOlB1bGxSZXF1ZXN0MjA3Mjc3NDg2 2357 DOC: move xarray related projects to top-level TOC section benbovy 4160723 closed 0     1 2018-08-09T10:57:47Z 2018-08-11T13:41:24Z 2018-08-10T20:13:08Z MEMBER   0 pydata/xarray/pulls/2357

Make xarray-related projects more discoverable, as suggested on the xarray mailing list.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2357/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
300588788 MDExOlB1bGxSZXF1ZXN0MTcxNjMxNTQ1 1946 DOC: add main sections to toc benbovy 4160723 closed 0     0 2018-02-27T11:13:17Z 2018-02-27T21:16:18Z 2018-02-27T19:04:24Z MEMBER   0 pydata/xarray/pulls/1946

Not a big change, but adds a little more clarity IMO.

I'm open to any suggestion for better section names and/or organization. I also left "What's new" at the top, but I'm not sure if "Getting started" is the right section for it.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1946/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
275033174 MDU6SXNzdWUyNzUwMzMxNzQ= 1727 IPython auto-completion triggers data loading benbovy 4160723 closed 0     11 2017-11-18T00:14:00Z 2017-11-18T07:09:41Z 2017-11-18T07:09:40Z MEMBER      

I create a big netcdf file like this:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: ds = xr.Dataset({'myvar': np.arange(100000000, dtype='float64')})

In [4]: ds.to_netcdf('test.nc')
```

Then, when I open the file in an IPython console and use auto-completion, it triggers loading the data.

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('test.nc')

In [3]: ds.my  # <TAB> autocompletion with any character -> triggers loading
```

I don't have that issue using the python console. Auto-completion for dictionary access in IPython (#1632) works fine too.

Output of xr.show_versions()

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_BE.UTF-8
LOCALE: fr_BE.UTF-8
xarray: 0.10.0rc1-2-gf83361c
pandas: 0.21.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.4
matplotlib: None
cartopy: None
seaborn: None
setuptools: 36.6.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1727/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
274591962 MDU6SXNzdWUyNzQ1OTE5NjI= 1722 Change in behavior of .set_index() from pandas 0.20.3 to 0.21.0 benbovy 4160723 closed 0     1 2017-11-16T17:05:20Z 2017-11-17T00:54:51Z 2017-11-17T00:54:51Z MEMBER      

I use xarray 0.9.6 for both examples below.

With pandas 0.20.3, Dataset.set_index gives me what I expect (i.e., the grid__x data variable becomes a coordinate x):

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.20.3'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    *empty*
```

With pandas 0.21.0, it creates a MultiIndex, which is not what I expect here when setting an index with only one data variable:

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.21.0'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) MultiIndex
  - grid__x  (x) int64 1 2 3
Data variables:
    *empty*
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1722/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
230631480 MDExOlB1bGxSZXF1ZXN0MTIxOTQyNjMx 1422 xarray.core.variable.as_variable part of the public API benbovy 4160723 closed 0     6 2017-05-23T08:44:08Z 2017-06-10T18:33:34Z 2017-06-02T17:55:12Z MEMBER   0 pydata/xarray/pulls/1422
  • [x] Closes #1303
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff (if we ignore messages for .rst files and "imported but not used" messages for xarray.__init__.py)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Make xarray.core.variable.as_variable part of the public API and accessible as a top-level function: xarray.as_variable.

I changed the docstrings to follow the numpydoc format more closely.

I also removed the copy=False keyword argument, as it was apparently unused.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1422/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
134359597 MDU6SXNzdWUxMzQzNTk1OTc= 767 MultiIndex and data selection benbovy 4160723 closed 0     9 2016-02-17T18:24:00Z 2016-09-14T14:28:29Z 2016-09-14T14:28:29Z MEMBER      

[Edited for more clarity]

First of all, I find the MultiIndex very useful and I'm looking forward to seeing the TODOs in #719 implemented in the next releases, especially the first three in the list!

Apart from these issues, I think that some other aspects may be improved, notably regarding data selection. Or maybe I've not correctly understood how to deal with multi-index and data selection...

To illustrate this, I use some fake spectral data with two discontinuous bands of different length / resolution:

```
# Note: np is used below, so numpy is assumed to have been imported as np.
In [1]: import pandas as pd

In [2]: import xarray as xr

In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])

In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])

In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])

In [6]: s = pd.Series(spectrum, index=[band, wavenumber])

In [7]: s.index.names = ('band', 'wavenumber')

In [8]: da = xr.DataArray(s, dims='band_wavenumber')

In [9]: da
Out[9]:
<xarray.DataArray (band_wavenumber: 5)>
array([ 1.70000000e-04, 1.40000000e-04, 1.20000000e-04,
        1.00000000e-04, 8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ...
```

I extract the band 'bar' using sel:

```
In [10]: da_bar = da.sel(band_wavenumber='bar')

In [11]: da_bar
Out[11]:
<xarray.DataArray (band_wavenumber: 3)>
array([ 1.20000000e-04, 1.00000000e-04, 8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('bar', 4100.1) ...
```

It selects the data the way I want, although using the dimension name is confusing in this case. It would be nice if we can also use the MultiIndex names as arguments of the sel method, even though I don't know if it is easy to implement.

Furthermore, da_bar still has the 'band_wavenumber' dimension and the 'band' index-level, but it is not very useful anymore. Ideally, I'd rather obtain a DataArray object with a 'wavenumber' dimension / coordinate and the 'bar' band name dropped from the multi-index, i.e., something that would require automatic index-level removal and/or automatic unstack when selecting data.

Extracting the band 'bar' from the pandas Series object gives something closer to what I need (see below), but using pandas is not an option as my spectral data involves other dimensions (e.g., time, scans, iterations...) not shown here for simplicity.

```
In [12]: s_bar = s.loc['bar']

In [13]: s_bar
Out[13]:
wavenumber
4100.1    0.000120
4100.3    0.000100
4100.5    0.000085
dtype: float64
```

Another problem is that the unstacked DataArray object resulting from the selection has the same dimensions and size as the original, unstacked DataArray object. The only difference is that unselected values are replaced by nan.

```
In [13]: da.unstack('band_wavenumber')
Out[13]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[ nan, nan, 1.20000000e-04, 1.00000000e-04, 8.50000000e-05],
       [ 1.70000000e-04, 1.40000000e-04, nan, nan, nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03

In [14]: da_bar.unstack('band_wavenumber')
Out[14]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[ nan, nan, 1.20000000e-04, 1.00000000e-04, 8.50000000e-05],
       [ nan, nan, nan, nan, nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```
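The two outputs above can be reproduced with a small library-free model. It assumes, as the outputs suggest, that a multi-index stores the unique values of each level plus integer codes pointing into them, and that a selection drops entries but keeps the full set of level values. The functions `unstack` and `select_band` are hypothetical stand-ins, not xarray or pandas functions.

```python
import math

# Unique values per level, and one integer code per entry for each level.
levels = (["bar", "foo"], [4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
codes = ([1, 1, 0, 0, 0], [0, 1, 2, 3, 4])
values = [1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5]


def unstack(levels, codes, values):
    """Spread values over the full (band, wavenumber) grid, NaN elsewhere."""
    rows, cols = levels
    grid = [[math.nan] * len(cols) for _ in rows]
    for r, c, v in zip(codes[0], codes[1], values):
        grid[r][c] = v
    return grid


def select_band(levels, codes, values, band):
    """Keep the entries of one band; the level values are left untouched."""
    code = levels[0].index(band)
    keep = [i for i, r in enumerate(codes[0]) if r == code]
    new_codes = ([codes[0][i] for i in keep], [codes[1][i] for i in keep])
    return levels, new_codes, [values[i] for i in keep]


full_grid = unstack(levels, codes, values)
bar_grid = unstack(*select_band(levels, codes, values, "bar"))
print(full_grid)
print(bar_grid)
```

Because the grid is sized from the level values rather than from the entries that remain, the 'foo' row survives the selection as a row of nan.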

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/767/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
169588316 MDExOlB1bGxSZXF1ZXN0ODAyMjk0OTM= 947 Multi-index levels as coordinates benbovy 4160723 closed 0     17 2016-08-05T11:34:49Z 2016-09-14T03:35:04Z 2016-09-14T03:34:51Z MEMBER   0 pydata/xarray/pulls/947

Implements 2, 4 and 5 in #719.

Demo:

```
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: index = pd.MultiIndex.from_product((list('ab'), range(2)),
   ...:                                    names=('level_1', 'level_2'))

In [5]: da = xr.DataArray(np.random.rand(4, 4), coords={'x': index},
   ...:                   dims=('x', 'y'), name='test')

In [6]: da
Out[6]:
<xarray.DataArray 'test' (x: 4, y: 4)>
array([[ 0.15036153,  0.68974802,  0.40082234,  0.94451318],
       [ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.3313594 ,  0.93857424,  0.73023367,  0.44069622],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3

In [7]: da['level_1']
Out[7]:
<xarray.DataArray 'level_1' (x: 4)>
array(['a', 'a', 'b', 'b'], dtype=object)
Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1

In [8]: da.sel(x='a', level_2=1)
Out[8]:
<xarray.DataArray 'test' (y: 4)>
array([ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ])
Coordinates:
    x        object ('a', 1)
  * y        (y) int64 0 1 2 3

In [9]: da.sel(level_2=1)
Out[9]:
<xarray.DataArray 'test' (level_1: 2, y: 4)>
array([[ 0.26732938,  0.49598123,  0.8679231 ,  0.6149102 ],
       [ 0.81304837,  0.81244159,  0.37274953,  0.86405196]])
Coordinates:
  * level_1  (level_1) object 'a' 'b'
  * y        (y) int64 0 1 2 3
```

Some notes about the implementation:
- I slightly modified Coordinate so that it allows setting different values for the names of the coordinate and its dimension. There is no breaking change.
- I also added a Coordinate.get_level_coords method to get independent, single-index coordinates objects from a MultiIndex coordinate.

Remaining issues:
- Coordinate.get_level_coords calls pandas.MultiIndex.get_level_values for each level and is itself called each time when indexing and for repr. This can be very costly!! It would be nice to return some kind of lazy index object instead of computing the actual level values.
- repr replaces a MultiIndex coordinate by its level coordinates. That can be confusing in some cases (see below). Maybe we can set a different marker than * for level coordinates.

```
In [6]: [name for name in da.coords]
Out[6]: ['x', 'y']

In [7]: da.coords.keys()
Out[7]:
KeysView(Coordinates:
  * level_1  (x) object 'a' 'a' 'b' 'b'
  * level_2  (x) int64 0 1 0 1
  * y        (y) int64 0 1 2 3)
```

- DataArray.level_1 doesn't return another DataArray object:

    In [10]: da.level_1
    Out[10]:
    <xarray.Coordinate 'level_1' (x: 4)>
    array(['a', 'a', 'b', 'b'], dtype=object)

- Maybe we need to test the uniqueness of level names at DataArray or Dataset creation.

Of course still needs proper tests and docs...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/947/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
159768214 MDExOlB1bGxSZXF1ZXN0NzM0NjU0MTA= 879 Multi-index repr benbovy 4160723 closed 0     2 2016-06-11T10:58:13Z 2016-08-31T21:40:59Z 2016-08-31T21:40:59Z MEMBER   0 pydata/xarray/pulls/879

Another item of #719.

An example:

```python
>>> index = pd.MultiIndex.from_product((list('ab'), range(10)))
>>> index.names = ('a_long_level_name', 'level_1')
>>> data = xr.DataArray(range(20), [('x', index)])
>>> data
<xarray.DataArray (x: 20)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
Coordinates:
  * x                    (x) object MultiIndex
  - a_long_level_name    object 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'b' ...
  - level_1              int64 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
```

To be consistent with the displayed coordinates and/or data variables, it displays the actual used level values. Using the pandas.MultiIndex.get_level_values method would be expensive for big indexes, so I re-implemented it in xarray so that we can truncate the computation to the first x values, which is very cheap.
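The idea of truncating the computation can be sketched without pandas. A MultiIndex stores each level as a set of unique values plus integer codes pointing into them, so the first few level values can be decoded from the first few codes alone. The function `first_level_values` is a hypothetical illustration, not the code added in this pull request, and the `levels` / `codes` lists are written by hand to match the example above.

```python
from itertools import islice


def first_level_values(level, codes, n):
    """Decode only the first `n` codes of one level."""
    return [level[code] for code in islice(codes, n)]


# Hand-written layout of MultiIndex.from_product((list('ab'), range(10))).
levels = [["a", "b"], list(range(10))]
codes = [[0] * 10 + [1] * 10, list(range(10)) * 2]

print(first_level_values(levels[0], codes[0], 3))
print(first_level_values(levels[1], codes[1], 3))
```

The cost depends on the number of values requested for display, not on the length of the index.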

It still needs testing.

Maybe it would be nice to align the level values.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/879/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
169368546 MDU6SXNzdWUxNjkzNjg1NDY= 942 Filtering by data variable name benbovy 4160723 closed 0     3 2016-08-04T13:01:20Z 2016-08-04T19:09:07Z 2016-08-04T19:09:07Z MEMBER      

Given #844 and #916, maybe it might be useful to also have a Dataset.filter_by_name method?

I currently deal with datasets that have many data variables with names like:

...
reference__HONO    (rlevel) float64 3.16e-15 1e-14 1e-14 1e-14 ...
reference__NO      (rlevel) float64 2.16e-05 3.57e-06 9.3e-07 ...
reference__HO2NO2  (rlevel) float64 9.58e-20 7.32e-19 4.63e-18 ...
...
retrieved__O3      (level) float64 1.552e-06 5.618e-07 ...
retrieved__N2O     (level) float64 4.714e-11 9.905e-11 ...
retrieved__CO2     (level) float64 0.0002816 0.0003592 ...
...

Using ds.filter_by_name(like='reference__') would be less verbose than, e.g., xr.Dataset({name: ds[name] for name in ds.keys() if 'reference__' in name}), unless there is already a more convenient way that I'm missing?
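A minimal sketch of what such a filter could look like, written as a free function over any mapping of names to variables. `filter_by_name` is the hypothetical name proposed in this issue, not an existing xarray method, and the plain dict below only stands in for a Dataset.

```python
from collections.abc import Mapping


def filter_by_name(variables: Mapping, like: str) -> dict:
    """Keep the entries whose name contains the substring `like`."""
    return {name: value for name, value in variables.items() if like in name}


# Plain dict standing in for a Dataset; the values are placeholders.
variables = {
    "reference__HONO": "rlevel",
    "reference__NO": "rlevel",
    "retrieved__O3": "level",
    "retrieved__CO2": "level",
}

print(list(filter_by_name(variables, like="reference__")))
```

The substring test matches the `'reference__' in name` condition of the comprehension quoted above.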

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/942/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
166299782 MDExOlB1bGxSZXF1ZXN0Nzc5NTM1MjI= 903 fixed multi-index copy test benbovy 4160723 closed 0     1 2016-07-19T10:37:36Z 2016-07-19T14:48:12Z 2016-07-19T14:47:58Z MEMBER   0 pydata/xarray/pulls/903
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/903/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
143264649 MDExOlB1bGxSZXF1ZXN0NjQwNDI5ODk= 802 Multi-index indexing benbovy 4160723 closed 0     22 2016-03-24T14:39:38Z 2016-07-19T10:48:56Z 2016-07-19T01:15:42Z MEMBER   0 pydata/xarray/pulls/802

Follows #767.

This is incomplete (it still needs some tests and documentation updates), but it is working for both Dataset and DataArray objects. I also don't know if it is fully compatible with lazy indexing (Dask).

Using the example from #767:

In [4]: da.sel(band_wavenumber={'band': 'foo'})
Out[4]:
<xarray.DataArray (wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03

As shown in this example, similarly to pandas, it automatically renames the dimension and assigns a new coordinate when the selection doesn't return a pd.MultiIndex (here it returns a pd.Float64Index).

In some cases this behavior may be unwanted (??), so I added a drop_level keyword argument (if False it keeps the multi-index and doesn't change the dimension/coordinate names):

In [5]: da.sel(band_wavenumber={'band': 'foo'}, drop_level=False)
Out[5]:
<xarray.DataArray (band_wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3)

Note that it also works with DataArray.loc, but (for now) in that case it always returns the multi-index:

In [6]: da.loc[{'band_wavenumber': {'band': 'foo'}}]
Out[6]:
<xarray.DataArray (band_wavenumber: 2)>
array([ 0.00017,  0.00014])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ('foo', 4050.3)

This is however inconsistent with Dataset.sel and Dataset.loc that both apply drop_level=True by default, due to their different implementation. Two solutions: (1) make DataArray.loc apply drop_level by default, or (2) use drop_level=False by default everywhere.
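The difference between the two drop_level settings can be sketched with a list of tuples standing in for the multi-index. The function `sel_level` is a hypothetical illustration of the behavior described above, not the implementation in this pull request.

```python
def sel_level(index, values, level, key, drop_level=True):
    """Select entries of a tuple index whose `level` position equals `key`."""
    kept = [(label, v) for label, v in zip(index, values) if label[level] == key]
    new_values = [v for _, v in kept]
    if drop_level:
        # Remove the selected level from every label.
        new_index = [label[:level] + label[level + 1:] for label, _ in kept]
        # A single remaining level is returned as scalars, not 1-tuples.
        if new_index and len(new_index[0]) == 1:
            new_index = [label[0] for label in new_index]
    else:
        new_index = [label for label, _ in kept]
    return new_index, new_values


index = [("foo", 4050.2), ("foo", 4050.3), ("bar", 4100.1),
         ("bar", 4100.3), ("bar", 4100.5)]
values = [1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5]

print(sel_level(index, values, level=0, key="foo"))
print(sel_level(index, values, level=0, key="foo", drop_level=False))
```

With drop_level=True the labels collapse to plain wavenumbers, which corresponds to the renamed 'wavenumber' dimension; with drop_level=False the (band, wavenumber) tuples are kept.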

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/802/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
159920667 MDExOlB1bGxSZXF1ZXN0NzM1NTQ2MTI= 881 Fix variable copy with multi-index benbovy 4160723 closed 0     1 2016-06-13T10:38:46Z 2016-06-16T21:01:11Z 2016-06-16T21:01:07Z MEMBER   0 pydata/xarray/pulls/881

Fixes #769.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/881/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 27.592ms · About: xarray-datasette