issue_comments

523 rows where user = 4160723 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1259228475 https://github.com/pydata/xarray/issues/6293#issuecomment-1259228475 https://api.github.com/repos/pydata/xarray/issues/6293 IC_kwDOAMm_X85LDk07 benbovy 4160723 2022-09-27T09:22:04Z 2023-08-24T11:42:53Z MEMBER

Following thoughts and discussions in various issues (e.g., #6836), I'd like to suggest another section to the ones in the top comment:

Deprecate pandas.MultiIndex special cases in Xarray

  • remove the multi-index “dimension” coordinate (tuple elements)
  • do not automatically promote pandas.MultiIndex objects to dimension + level coordinates, e.g., as in xr.Dataset(coords={"x": pd_midx}); instead, treat them as a single duck array.
  • do not accept pandas.MultiIndex as dim argument in xarray.concat() (#7148)
  • remove obj.to_index() for all xarray objects?
  • (EDIT) remove Dataset.reset_index() and DataArray.reset_index()

They are a source of many problems and much complexity in Xarray internals (many regressions reported since the index refactor were related to those special cases), and I'm not sure that the value they add is really worth the trouble. Also, in the long term the special treatment of PandasMultiIndex vs. other Xarray multi-indexes may cause some confusion.

Some of those features are widely used (e.g., the creation of Dataset / DataArray from pandas multi-indexes is used in many places in unit tests), so we would need convenient alternatives and a smooth transition.

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes: next steps 1148021907
1504975778 https://github.com/pydata/xarray/issues/6836#issuecomment-1504975778 https://api.github.com/repos/pydata/xarray/issues/6836 IC_kwDOAMm_X85ZtBui benbovy 4160723 2023-04-12T09:42:39Z 2023-04-12T09:42:39Z MEMBER

A special case sounds reasonable to me as well, as a temporary fix before looking into whether and how we can refactor groupby so that it works with multiple kinds of built-in and/or custom indexes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby(multi-index level) not working correctly on a multi-indexed DataArray or DataSet 1318992926
1480906129 https://github.com/pydata/xarray/pull/7653#issuecomment-1480906129 https://api.github.com/repos/pydata/xarray/issues/7653 IC_kwDOAMm_X85YRNWR benbovy 4160723 2023-03-23T10:01:35Z 2023-03-23T10:01:35Z MEMBER

For the html repr an option that is easy to implement would be to add max-height and overflow-y: scroll CSS properties here: https://github.com/pydata/xarray/blob/1e361ccb9123fe25acfd9e3364c911c1eec7d9db/xarray/static/css/style.css#L256-L261

I don't think the default browser scrollbar will look very pretty inside the repr, but it might be OK as long as we don't set max-height too small.

A "click to expand" UI would certainly look prettier, but I doubt it would be easy to implement that in pure-CSS. "Expand on hover" is easier but that would be quite annoying UX I think.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  limit lines in html repr of dataset attrs 1633513067
1463633814 https://github.com/pydata/xarray/issues/7563#issuecomment-1463633814 https://api.github.com/repos/pydata/xarray/issues/7563 IC_kwDOAMm_X85XPUeW benbovy 4160723 2023-03-10T10:59:07Z 2023-03-10T10:59:07Z MEMBER

Thanks for the report @lkugler !

Directly assigning a multi-index like mda['position'] = midx is now ambiguous because all levels of the multi-index are now exposed as actual coordinates. We should provide a temporary fix or at least issue a warning. A proper way to assign a pandas multi-index is implemented in #7368. In the meantime, the workaround below should work for your example (it might stop working in the future, though):

```python
mda.coords.update(xr.Dataset(coords={"position": midx}))
```
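
For readers on a later release: the approach from #7368 appears to have shipped as `xr.Coordinates.from_pandas_multiindex` (assumption: xarray 2023.08 or newer). A minimal sketch with made-up data, reusing the names `midx` and `mda` from the discussion above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data only; not taken from the original report.
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
mda = xr.DataArray(np.arange(4), dims="position")

# Build the dimension coordinate and both level coordinates explicitly,
# then attach them in one step (assumes a release that has xr.Coordinates).
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "position")
mda = mda.assign_coords(midx_coords)

# Selection on a single level still works through the multi-index.
subset = mda.sel(one="a")
```

Because the level coordinates are created together with the index, there is no ambiguity about which variables belong to the multi-index.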

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex coordinates do not exist updating v2022.3 to v2022.12 1600983717
1440178393 https://github.com/pydata/xarray/pull/7530#issuecomment-1440178393 https://api.github.com/repos/pydata/xarray/issues/7530 IC_kwDOAMm_X85V12DZ benbovy 4160723 2023-02-22T14:51:32Z 2023-02-22T14:51:32Z MEMBER

I've imported the generated PDF into Inkscape, fixed the font and converted it to paths, added a small margin and exported it as SVG. I attach the file here, @dcherian feel free to add it in this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [skip-ci] Add PDF of Xarray logo 1584791395
1438377578 https://github.com/pydata/xarray/issues/7539#issuecomment-1438377578 https://api.github.com/repos/pydata/xarray/issues/7539 IC_kwDOAMm_X85Vu-Zq benbovy 4160723 2023-02-21T12:13:18Z 2023-02-21T12:13:18Z MEMBER

In general I also find that xr.concat is a powerful feature (incl. auto-alignment and merge options), at the expense of sometimes (often?) being hard to reason about. Would it make sense to have a simpler version? To avoid making the xr.concat signature even more complicated, maybe another top-level function like xr.concat_noalign? Or we could follow one of the suggestions in #7045 to deactivate auto-alignment across Xarray. Or indeed at least make it clearer in the docs that something like drop_indexes or reset_coords should be used first in order to skip auto-alignment for some variables.

> I don't really know what I would prefer to happen with the coordinates. I guess to have created a time coordinate of size {new: 2, time: 4, cols: 2}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?

I guess the easiest option for a concat version with no auto-alignment would be to drop the index when such a case happens. (Note: one problem in your example is that the Xarray data model still does not allow having a multi-dimensional "time" variable with "time" also as one of its dimensions, but this could now be relaxed.)

I've also been wondering whether some kind of NDPandasIndex would make any sense, i.e., an n-d coordinate variable with an internal 1-d (flattened) pandas index and some logic to convert between those n-d vs. 1-d spaces. This is the kind of approach used in xoak for using a kd-tree with coordinates of arbitrary dimensions, where labels in the form of nd-arrays for each coordinate are mapped into the [n_points, n_coords] shape (and inversely for getting the integer indices back as nd-arrays). This works well for point-wise indexing, but I doubt it would be very useful beyond that (e.g., slicing, etc.).
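
To make the n-d vs. 1-d conversion concrete, here is a rough sketch using only NumPy and pandas. It is not an Xarray index and not xoak's implementation: the `lon` array, the `sel_points` helper and the single-coordinate setup are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-d coordinate with unique labels (illustrative data).
lon = np.array([[10.0, 11.0, 12.0],
                [20.0, 21.0, 22.0]])

# Internal 1-d (flattened) pandas index built from the n-d coordinate.
flat_index = pd.Index(lon.ravel())


def sel_points(labels):
    """Map an array of labels (any shape) to n-d integer indices."""
    labels = np.asarray(labels)
    flat_pos = flat_index.get_indexer(labels.ravel())
    if (flat_pos < 0).any():
        raise KeyError("some labels were not found in the index")
    # Convert flat positions back to one index array per dimension,
    # each reshaped like the input labels.
    nd_pos = np.unravel_index(flat_pos, lon.shape)
    return tuple(p.reshape(labels.shape) for p in nd_pos)


rows, cols = sel_points([[21.0, 10.0], [12.0, 22.0]])
```

The returned arrays can be used for point-wise (vectorized) indexing; nothing here gives a meaningful notion of slicing, which matches the reservation expressed above.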

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concat doesn't concatenate dimension coordinates along new dims 1588461863
1431496828 https://github.com/pydata/xarray/issues/7076#issuecomment-1431496828 https://api.github.com/repos/pydata/xarray/issues/7076 IC_kwDOAMm_X85VUuh8 benbovy 4160723 2023-02-15T14:54:27Z 2023-02-15T14:54:27Z MEMBER

@ACHMartin the issue is when you do newds['z'] = stacked.z. In recent versions of Xarray, multi-index levels each have their own (real) coordinate. For consistency and clarity, we soon won't support assigning a multi-index to a single coordinate of a Dataset / DataArray like that.

I think that in other places we still do support it with a deprecation notice, but apparently in your example this is not the case. unstack doesn't work because the multi-index(es) and the coordinates of newds are not consistent.

I don't know exactly what your real problem is, but from now on you should avoid implicitly assigning a multi-index with xr_obj["my_coord"] = ... or xr_obj.assign(my_coord=...). Instead you should re-create the multi-index, e.g., in your minimal example newds = newds.set_index(z=["across", "along"]).
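
A minimal sketch of that suggestion, using a made-up stand-in for `newds` (the variable `v` and the coordinate values are assumptions, not the data from the original report):

```python
import numpy as np
import xarray as xr

# Illustrative stand-in for newds: two plain coordinates along "z".
newds = xr.Dataset(
    {"v": ("z", np.arange(4))},
    coords={
        "across": ("z", [0, 0, 1, 1]),
        "along": ("z", [0, 1, 0, 1]),
    },
)

# Re-create the multi-index explicitly instead of assigning one to "z".
newds = newds.set_index(z=["across", "along"])

# The multi-index and its level coordinates are now consistent,
# so unstacking along "z" works.
unstacked = newds.unstack("z")
```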

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't unstack concatenated DataArrays 1384465119
1427538729 https://github.com/pydata/xarray/issues/7463#issuecomment-1427538729 https://api.github.com/repos/pydata/xarray/issues/7463 IC_kwDOAMm_X85VFoMp benbovy 4160723 2023-02-13T08:31:49Z 2023-02-13T09:26:10Z MEMBER

There are two issues:

  • whether we should continue allowing IndexVariable data to be updated in place via the .data property. IMO we should really deprecate it, especially now that it is possible to have custom, possibly expensive index structures built from one or more coordinates.

  • whether deep=True should deep copy the Xarray index objects. I don't have a strong opinion on this. There is a similar discussion on the pandas side: https://github.com/pandas-dev/pandas/issues/19862. I wonder if we reverted the change here because some high-level operations in Xarray were by default deep copying the indexes? I don't think we would want such behavior unless the user explicitly sets deep=True somewhere?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinates not deep copy 1550792876
1426311006 https://github.com/pydata/xarray/issues/7463#issuecomment-1426311006 https://api.github.com/repos/pydata/xarray/issues/7463 IC_kwDOAMm_X85VA8de benbovy 4160723 2023-02-10T20:31:10Z 2023-02-10T20:38:48Z MEMBER

Yes I think we should, but I might have missed the rationale behind allowing it if this is intentional.

EDIT: perhaps better to issue a warning first to avoid some breaking change. We could also try to fix it (make a deep copy) at the same time as deprecating it, but that might be tricky without again introducing performance regressions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinates not deep copy 1550792876
1426299770 https://github.com/pydata/xarray/issues/7463#issuecomment-1426299770 https://api.github.com/repos/pydata/xarray/issues/7463 IC_kwDOAMm_X85VA5t6 benbovy 4160723 2023-02-10T20:25:12Z 2023-02-10T20:25:12Z MEMBER

I think that the revert in IndexVariable came after the refactoring of copy in Xarray introduced a performance regression (https://github.com/pydata/xarray/pull/7209#issuecomment-1305593478).

I didn't see #1463 (https://github.com/pydata/xarray/issues/1463#issuecomment-340454702), though. It feels weird to me that we can mutate an IndexVariable via its data property, considering that the underlying index is immutable. IIUC xarr2.x.data[0] = 45 replaces the full index with a new one? I'm not sure if it is a good idea to allow this. For a pandas index that's probably OK (it is reasonably cheap to rebuild a new index) but for a custom index that is expensive to build (e.g., kd-tree) I don't think this behavior is desirable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinates not deep copy 1550792876
1422518769 https://github.com/pydata/xarray/issues/2028#issuecomment-1422518769 https://api.github.com/repos/pydata/xarray/issues/2028 IC_kwDOAMm_X85Uyenx benbovy 4160723 2023-02-08T12:29:27Z 2023-02-08T12:41:00Z MEMBER

@gewitterblitz there is a kdtree-based index example in #7041 that works with multi-dimensional coordinates. You could also have a look at https://xoak.readthedocs.io/en/latest/ (it doesn't use Xarray indexes - soon hopefully - so the current API is via Xarray accessors).

EDIT: seeing your previous comment (https://github.com/pydata/xarray/issues/2028#issuecomment-921926536), I'm not sure how you could use slices for label selection using those indexes, as I don't think the wrapped scipy / sklearn kdtree objects support range queries. Other spatial indexes may support it (e.g., there's an example in https://github.com/martinfleis/xvec of selecting points using a shapely.box, although currently it only supports 1-d geometry coordinates).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slice using non-index coordinates 309691307
1421222703 https://github.com/pydata/xarray/issues/2028#issuecomment-1421222703 https://api.github.com/repos/pydata/xarray/issues/2028 IC_kwDOAMm_X85UtiMv benbovy 4160723 2023-02-07T18:01:39Z 2023-02-07T18:01:39Z MEMBER

@aberges-grd If your non-index coordinate supports it (I guess it does?), you could assign a default index to the coordinate with set_xindex and then use slices for selection like any other (dimension) coordinate backed by a pandas index.
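
A short sketch of that workflow, assuming a release that provides `set_xindex` (it is referenced in this thread as already available) and using made-up data; the names `depth` and `v` are illustrative only:

```python
import xarray as xr

# Illustrative data: "depth" is a non-index coordinate along "x".
ds = xr.Dataset(
    {"v": ("x", [10, 20, 30, 40])},
    coords={"depth": ("x", [5.0, 10.0, 15.0, 20.0])},
)

# Give "depth" a default (pandas) index, then select with a label slice.
indexed = ds.set_xindex("depth")
subset = indexed.sel(depth=slice(10.0, 15.0))
```

As with any pandas-backed index, the label slice includes both end points.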

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slice using non-index coordinates 309691307
1384164579 https://github.com/pydata/xarray/issues/7405#issuecomment-1384164579 https://api.github.com/repos/pydata/xarray/issues/7405 IC_kwDOAMm_X85SgKzj benbovy 4160723 2023-01-16T14:42:23Z 2023-01-16T14:42:23Z MEMBER

Yes thanks for the report. Looks like Dataset._coord_names got out of sync somehow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test for variable name in coords True after xr.merge with compat="minimal" 1512708767
1382070832 https://github.com/pydata/xarray/pull/7368#issuecomment-1382070832 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85SYLow benbovy 4160723 2023-01-13T16:13:16Z 2023-01-13T16:13:16Z MEMBER

Thanks for the review @shoyer. I addressed your comments.

Everything seems OK except a rather annoying mypy error that I'm struggling with:

The DataAlignable type variable should now encompass both DataWithCoords and Coordinates, since in this PR we add alignment support for the latter. I somewhat naively tried the options below without success:

  • DataAlignable = TypeVar("DataAlignable", bound=DataWithCoords | Coordinates) -> doesn't work since we cannot mix DataWithCoords and Coordinates when aligning each object (input type = output type)
  • DataAlignable = TypeVar("DataAlignable", bound=DataWithCoords, Coordinates) -> doesn't work with subclasses
  • DataAlignable = TypeVar("DataAlignable", Dataset, DataArray, Coordinates) -> doesn't work with generic types T_Dataset, etc.?
  • I even tried using a Protocol

@headtr1ck @Illviljan any idea?
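
A small, self-contained illustration of the trade-off between the first options listed above. The classes are hypothetical stand-ins, not Xarray's real types, and the comments describe how type checkers generally treat bounded versus constrained type variables (the exact inference for mixed calls can differ between checkers):

```python
from typing import TypeVar, Union


class WithCoords:  # hypothetical stand-in for DataWithCoords
    pass


class Dataset(WithCoords):  # hypothetical subclass
    pass


class Coords:  # hypothetical stand-in for Coordinates
    pass


# Upper bound: a subclass argument keeps its precise type, but nothing
# ties the solved type to a single branch of the union.
T_Bound = TypeVar("T_Bound", bound=Union[WithCoords, Coords])

# Constraints: the variable is solved to exactly one of the listed types,
# so a Dataset argument is widened to WithCoords (subclass precision lost).
T_Constr = TypeVar("T_Constr", WithCoords, Coords)


def align_bound(*objs: T_Bound) -> "tuple[T_Bound, ...]":
    return objs


def align_constr(*objs: T_Constr) -> "tuple[T_Constr, ...]":
    return objs


same = align_bound(Dataset(), Dataset())
pair = align_constr(Coords(), Coords())
```

At runtime both functions behave identically; the difference only shows up in what a static type checker reports for the return type.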

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1372908509 https://github.com/pydata/xarray/pull/7418#issuecomment-1372908509 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R1Ovd benbovy 4160723 2023-01-05T23:08:15Z 2023-01-05T23:08:15Z MEMBER

Again, there are likely more good reasons for merging the Datatree code into Xarray than for not doing it, but IMHO such a decision should be made very carefully. You certainly do know better than me what positive vs. negative impacts it would have here! I'm just speaking generally from my experience of having struggled while doing some heavy refactoring in Xarray recently :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1372888139 https://github.com/pydata/xarray/pull/7418#issuecomment-1372888139 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R1JxL benbovy 4160723 2023-01-05T22:46:05Z 2023-01-05T22:46:05Z MEMBER

I don't have strong opinions for or against including datatree in Xarray. It indeed makes sense if it is using many Xarray internals and if there are many existing or potential applications for it. Additional load (CI) is fine if datatree doesn't bring any extra dependency and won't do so in the near future (which seems to be the case).

> Datatree should become a first-class Xarray object

> Since Datatree sits above DataArray and Dataset, it should not interfere with any of our existing API.

Would it mean that if someone wants to later add any feature "x" or "y" into Xarray, they just need to implement the feature for Dataset (and possibly DataArray) and it will be guaranteed to work with Datatree? (I guess so but I'm not familiar enough with Datatree to know it for sure).

Otherwise, if there is any extra implementation effort required to make feature "x" or "y" work with Datatree, then I'm concerned about the additional burden or obstacle for future contributors and maintainers. Or we could say that it is OK to leave out datatree support and wait for someone to take care of it later, but I don't think it is ideal to have such a non-synchronized state within Xarray itself.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1359003371 https://github.com/pydata/xarray/pull/7368#issuecomment-1359003371 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85RAL7r benbovy 4160723 2022-12-20T08:34:06Z 2022-12-20T08:34:06Z MEMBER

I'm wondering if instead of Coordinates.from_pandas_multiindex() we might want to provide a more generic constructor available as an extension point? For example:

Coordinates.from_index(index_obj: Any, *, factory=None, **kwargs)

factory could be guessed from the type of index_obj. Xarray would support by default the pandas.MultiIndex and pandas.Index types. Like for IO backends, we could provide a CoordinatesFactoryEntrypoint so that it could support other index types.

One downside is that specific (mandatory?) options like dim for a pandas (multi-)index are not directly visible.

Would it be useful or is it overkill?
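
To show what "guessing the factory from the type of index_obj" could look like, here is a purely hypothetical sketch based on `functools.singledispatch`. The function `coords_from_index` is invented for illustration, it returns plain dicts rather than Coordinates objects, and it does not cover the entry-point mechanism mentioned above:

```python
from functools import singledispatch

import pandas as pd


@singledispatch
def coords_from_index(index_obj, **kwargs):
    """Hypothetical extension point: pick a factory from the index type."""
    raise TypeError(f"no coordinates factory for {type(index_obj).__name__}")


@coords_from_index.register
def _(index_obj: pd.Index, *, dim):
    # A plain pandas index yields a single dimension coordinate.
    return {dim: index_obj.to_numpy()}


@coords_from_index.register
def _(index_obj: pd.MultiIndex, *, dim):
    # A multi-index yields the dimension coordinate plus one per level.
    coords = {dim: index_obj.to_numpy()}
    for name in index_obj.names:
        coords[name] = index_obj.get_level_values(name).to_numpy()
    return coords


midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
multi = coords_from_index(midx, dim="x")
single = coords_from_index(pd.Index([1, 2]), dim="y")
```

The sketch also makes the downside visible: `dim` is mandatory for both registered factories, yet it does not appear in the signature of the generic function.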

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1357719218 https://github.com/pydata/xarray/pull/7382#issuecomment-1357719218 https://api.github.com/repos/pydata/xarray/issues/7382 IC_kwDOAMm_X85Q7Say benbovy 4160723 2022-12-19T14:03:56Z 2022-12-19T14:03:56Z MEMBER

I don't know if the optimizations added here will benefit a large set of use cases (it took 6 months before seeing an issue report), but they are worthwhile for at least a few of them. I think this is ready (added some benchmarks).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Some alignment optimizations 1498386428
1353034657 https://github.com/pydata/xarray/pull/7382#issuecomment-1353034657 https://api.github.com/repos/pydata/xarray/issues/7382 IC_kwDOAMm_X85Qpauh benbovy 4160723 2022-12-15T13:05:55Z 2022-12-15T13:05:55Z MEMBER

Quick benchmark using the example in #7376 (it even seems much faster than in version 2022.3.0!)

```python
# version 2022.3.0
%timeit ds.assign(foo=~ds["d3"])
# 22.5 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# main branch
%timeit ds.assign(foo=~ds["d3"])
# 193 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# this PR
%timeit ds.assign(foo=~ds["d3"])
# 1.01 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Some alignment optimizations 1498386428
1352989233 https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QpPox benbovy 4160723 2022-12-15T12:27:37Z 2022-12-15T12:27:37Z MEMBER

> Thanks @benbovy! Are you also aware of the issue with plain assign being slower on MultiIndex (comment above: https://github.com/pydata/xarray/issues/7376#issuecomment-1350446546)? Do you know what could be the issue there by any chance?

I see that in ds.assign(foo=~ds["d3"]), the coordinates of ~ds["d3"] are dropped (#2087), which triggers re-indexing of the multi-index when aligning ds with ~ds["d3"]. This is quite an expensive operation.

It is not clear to me what would be a clean fix (see, e.g., #2180), but we could probably optimize the alignment logic so that when all unindexed dimension sizes match with indexed dimension sizes (like your example) no re-indexing is performed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1352874809 https://github.com/pydata/xarray/pull/7368#issuecomment-1352874809 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85Qozs5 benbovy 4160723 2022-12-15T10:42:59Z 2022-12-15T10:42:59Z MEMBER

OK this is now ready for review (cc @shoyer).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1352818155 https://github.com/pydata/xarray/pull/7368#issuecomment-1352818155 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85Qol3r benbovy 4160723 2022-12-15T09:59:03Z 2022-12-15T09:59:03Z MEMBER

> Maybe there's some way to optimize that? I don't know if we can completely avoid it with the solution implemented in this PR, though. Promoting Coordinates is pretty clean and future proof IMO (assuming that we'll further refactor Coordinates to actually store variables and indexes, i.e., not as a proxy anymore). Is the (minor? temporary?) regression in performance acceptable and can we just leave it like that for now?

Fixed in 193dad3 (with some reasonable special case added in merge_core).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1352310432 https://github.com/pydata/xarray/pull/7368#issuecomment-1352310432 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85Qmp6g benbovy 4160723 2022-12-14T22:33:23Z 2022-12-15T01:08:41Z MEMBER

I did some profiling to find the cause of the decrease in performance reported in the benchmarks (dataset creation). In summary, this is explained by a Coordinates object (built from the coords mapping) that is now included in objects to align when merging data vars and coordinates. Previously all non-DataArray objects in the coords mapping were excluded from alignment (in deep_align). The introduced overhead comes from a call to Coordinates._reindex_callback(), which (I think?) should do no more than shallow copies and/or xarray wrapping stuff. In the benchmark report this is only marked as significant when creating small datasets (1.5-2x slower), and it becomes insignificant for datasets with more data variables.

Maybe there's some way to optimize that? I don't know if we can completely avoid it with the solution implemented in this PR, though. Promoting Coordinates is pretty clean and future proof IMO (assuming that we'll further refactor Coordinates to actually store variables and indexes, i.e., not as a proxy anymore). Is the (minor? temporary?) regression in performance acceptable and can we just leave it like that for now?

More details about the new workflow implemented in this PR when creating a new Dataset:

  • if Dataset's coords argument is a "simple" mapping, it is first internally converted into a Coordinates object, with the creation of default indexes for dimension coordinates
      • if one or more DataArray objects are given in coords, their coordinates (variables + indexes) are extracted and merged with the other input coordinates
      • see the implementation in xarray.core.coordinates.create_coords_with_default_indexes
  • otherwise, just reuse the Coordinates object passed as coords
  • coordinates are then merged with data variables
      • the Coordinates object is aligned with every other "alignable" object found in data_vars
      • coordinate indexes (if any) are passed explicitly to align so they are used in priority
      • explicitly using a Coordinates object skips the creation of default indexes during merging (in collect_variables_and_indexes())

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1352318926 https://github.com/pydata/xarray/issues/7376#issuecomment-1352318926 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85Qmr_O benbovy 4160723 2022-12-14T22:43:11Z 2022-12-14T22:47:37Z MEMBER

> Are you aware of any workarounds for this issue with the current code (assuming I would like to preserve MultiIndex)?

Unfortunately I don't know about any workaround that would preserve the MultiIndex. Depending on how you use the multi-index, you could instead set two single indexes for "i1" and "i2" respectively (it is supported now, use set_xindex()). I think that groupby will work well in that case. If you really need a multi-index, you could still build it afterwards from the groupby result.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1350738301 https://github.com/pydata/xarray/issues/7376#issuecomment-1350738301 https://api.github.com/repos/pydata/xarray/issues/7376 IC_kwDOAMm_X85QgqF9 benbovy 4160723 2022-12-14T09:40:57Z 2022-12-14T09:40:57Z MEMBER

Thanks for the report @ravwojdyla.

Since #5692, multi-index levels each have their own coordinate variable, so copying takes a bit more time as we need to create more variables. Not sure what's happening with _maybe_cast_to_cftimeindex, though.

The real issue here, however, is the same as in #6836. In your example, .groupby("i1") creates 400 000 groups whereas it should create only 4 groups.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  groupby+map performance regression on MultiIndex dataset 1495605827
1349321538 https://github.com/pydata/xarray/pull/7368#issuecomment-1349321538 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QbQNC benbovy 4160723 2022-12-13T18:03:17Z 2022-12-13T18:03:17Z MEMBER

I think this is ready for review!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1347327518 https://github.com/pydata/xarray/pull/7368#issuecomment-1347327518 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QTpYe benbovy 4160723 2022-12-12T21:05:56Z 2022-12-12T21:05:56Z MEMBER

In order to skip creating default indexes when passing a Coordinates object, I first tried a small refactor but in the end I found that the cleanest way to do it was to support alignment for Coordinates. I think it makes sense now that Coordinates is part of Xarray's public API as a "stand-alone" container like Dataset and DataArray.

The "no default index with Coordinates" behavior should be consistent Xarray-wise, i.e., for DataArray / Dataset constructors and also assign_coords, update, etc.

Sorry this PR is getting big, but hopefully this is almost ready (still a few tests to fix or to add).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1346344694 https://github.com/pydata/xarray/pull/7368#issuecomment-1346344694 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QP5b2 benbovy 4160723 2022-12-12T11:55:10Z 2022-12-12T11:55:10Z MEMBER

> My suggestion would be:
>
>   • coords passed as a dict: create default indexes
>   • coords passed as IndexedCoordinates: do not create defaults

So if we already have some coordinate data as a dict but don't want any default index, we would need to do this:

```python
ds = xr.Dataset(coords=xr.Coordinates(my_coord_dict))
```

instead of this:

```python
ds = xr.Dataset(coords=my_coord_dict)
```
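
Note for later releases (assumption: xarray 2023.08 or newer, where the API seems to have settled slightly differently from what is discussed here): `xr.Coordinates(mapping)` on its own still creates default indexes, and passing an empty `indexes` mapping is what skips them. A minimal sketch with made-up data:

```python
import xarray as xr

my_coord_dict = {"x": [1, 2, 3]}  # illustrative data

# Plain dict: a default pandas index is created for dimension coordinate "x".
ds_default = xr.Dataset(coords=my_coord_dict)

# Coordinates built with an empty `indexes` mapping: no default index.
ds_noindex = xr.Dataset(coords=xr.Coordinates(my_coord_dict, indexes={}))
```

In both cases "x" remains a coordinate of the dataset; only the presence of an index differs.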

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1346091151 https://github.com/pydata/xarray/pull/7368#issuecomment-1346091151 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QO7iP benbovy 4160723 2022-12-12T08:36:09Z 2022-12-12T08:36:09Z MEMBER

Thanks @shoyer, I've been thinking about similar short/long term plans although so far I haven't figured out how to implement your point 3. I'll give it another try.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1345314909 https://github.com/pydata/xarray/pull/7368#issuecomment-1345314909 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QL-Bd benbovy 4160723 2022-12-10T16:59:44Z 2022-12-10T16:59:44Z MEMBER

> Long term, do you think it would make sense to merge together Indexes, Coordinates and IndexedCoordinates? They are sort of all containers for the same thing.

Yes I think so.

I'm actually trying to merge IndexedCoordinates with Coordinates but I'm stuck: the latter is abstract and I don't really see how I could refactor it together with DatasetCoordinates and DataArrayCoordinates. Do you have any idea on how best to proceed?

Ideally, I'd see Coordinates be exposed in Xarray's main namespace with at least the two following constructors:

```python
class Coordinates:
    def __init__(
        self,
        coords: Mapping[Any, Any] | None = None,
        indexes: Mapping[Any, Index] | None = None,
    ):
        # Similar to Dataset.__init__ but without the need
        # to merge coords and data vars...
        # Probably ok to allow more flexibility / less safety here?
        ...

    @classmethod
    def from_pandas_multiindex(cls, index: pd.MultiIndex, dim: str):
        ...
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1344046801 https://github.com/pydata/xarray/pull/7368#issuecomment-1344046801 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QHIbR benbovy 4160723 2022-12-09T09:13:24Z 2022-12-09T09:16:35Z MEMBER

I added IndexedCoordinates.merge_coords so that it is easier to combine different coordinates to pass to a new Dataset / DataArray, e.g.,

```python
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))

coords = xr.IndexedCoordinates.from_pandas_multiindex(midx, "x")
coords = coords.merge_coords({"y": [0, 1, 2]})
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
#   * y        (y) int64 0 1 2

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
#   * y        (y) int64 0 1 2
# Data variables:
#     empty
```

IndexedCoordinates.merge_coords is very much like Coordinates.merge except that it returns a new Coordinates object instead of a Dataset.

Or should we just use merge? It would require that:

  • Coordinates.merge accepts Mapping[Any, Any] for its other argument. Only changing the type hint is enough here since the implementation already accepts any input passed to Dataset.
  • When a Dataset is passed as coords argument to a new Dataset or DataArray, both variables and indexes should be extracted. This is already the case for Dataset, but I think it only works for PandasIndex and PandasMultiIndex (default indexes & backwards compatibility).
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1344004727 https://github.com/pydata/xarray/pull/7368#issuecomment-1344004727 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QG-J3 benbovy 4160723 2022-12-09T08:32:28Z 2022-12-09T09:14:17Z MEMBER

IndexedCoordinates and Indexes have a lot of overlap. At some point we might consider merging the two classes, like @shoyer suggests in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938. The main difference is that one is a mapping of coordinates and the other is a mapping of indexes. IndexedCoordinates is mostly reusing Indexes and Dataset under the hood, it is only a facade.

As an alternative to an IndexedCoordinates subclass, I was wondering if we could reuse the Coordinates base class? There are some benefits to providing a subclass:

  • besides specific constructors like .from_pandas_multiindex() it has a generic __init__ for advanced use cases. Not sure it is a good idea to add this constructor to the base class?
  • unlike Coordinates, IndexedCoordinates is immutable.

What if the Indexes class was a facade based on IndexedCoordinates instead of the other way around? It would probably make more sense but it would also be a bigger refactor. I've chosen the easy way :).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1335509983 https://github.com/pydata/xarray/pull/7347#issuecomment-1335509983 https://api.github.com/repos/pydata/xarray/issues/7347 IC_kwDOAMm_X85PmkPf benbovy 4160723 2022-12-02T16:33:59Z 2022-12-02T16:33:59Z MEMBER

Great! (I was worried that it would mess up #7345).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix assign_coords resetting all dimension coords to default index 1472483025
1334986216 https://github.com/pydata/xarray/pull/7347#issuecomment-1334986216 https://api.github.com/repos/pydata/xarray/issues/7347 IC_kwDOAMm_X85PkkXo benbovy 4160723 2022-12-02T09:35:42Z 2022-12-02T09:35:42Z MEMBER

@dcherian we can merge this after #7345 to make things easier for the release?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix assign_coords resetting all dimension coords to default index 1472483025
1326262197 https://github.com/pydata/xarray/issues/7045#issuecomment-1326262197 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85PDSe1 benbovy 4160723 2022-11-24T10:35:02Z 2022-11-24T10:35:02Z MEMBER

I find the analogy with relational databases quite meaningful!

Rectangular grids have likely been the primary use case in Xarray for a long time, but I wonder to what extent that is still the case nowadays. Probably a good question to ask in the next user survey?

Interestingly, the 2021 user survey results (*) show that "interoperability with pandas" is not a critical feature while "label-based indexing, interpolation, groupby, reindexing, etc." is most important, although the description of the latter is rather broad. It would be interesting to compute the correlation between these two variables. The results also show that "more flexible indexing (selection, alignment)" is very useful or critical for 2/3 of the participants.

Not sure how to interpret those results within the context of this discussion, though.

(*) The 2022 user survey results don't show significant differences in general.

suppose one could in principle have an array with coordinates such that none of the coordinates aligned with any particular axis, but it seems improbable.

Not that improbable for unstructured meshes, curvilinear grids, staggered grids, etc. Xarray is often chosen to handle them too (e.g., uxarray, xgcm).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308
1324753837 https://github.com/pydata/xarray/issues/7297#issuecomment-1324753837 https://api.github.com/repos/pydata/xarray/issues/7297 IC_kwDOAMm_X85O9iOt benbovy 4160723 2022-11-23T09:17:33Z 2022-11-23T09:17:33Z MEMBER

But does this still work properly with broadcasting? For example, let's say there is another data variable b (midx) and an operation is done like ds_stacked['c'] = ds_stacked.a + ds_stacked.b. Then it should be that c (midx) and a (x) should be "repeated" to midx.x

I think it would keep things much simpler if we consider "x" and "midx" as two separate dimensions in the stacked Dataset, i.e., ds_stacked['c'] would result in a 2-d array (x, midx). There's no such thing as a "midx.x" dimension in Xarray.
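
A small sketch of how name-based broadcasting determines the result dimensions when "x" and "midx" are treated as separate dimensions. `broadcast_dims` and `broadcast_shape` are hypothetical helpers, not Xarray functions; they only mimic the rule that the result carries the union of the operands' dimension names, in order of first appearance.

```python
def broadcast_dims(*operand_dims):
    """Union of dimension names, in order of first appearance."""
    result = []
    for dims in operand_dims:
        for dim in dims:
            if dim not in result:
                result.append(dim)
    return tuple(result)


def broadcast_shape(sizes, dims):
    """Shape of an array with the given dims, looked up in a name -> size map."""
    return tuple(sizes[dim] for dim in dims)


sizes = {"x": 2, "midx": 4}
c_dims = broadcast_dims(("x",), ("midx",))  # a (x) + b (midx)
c_shape = broadcast_shape(sizes, c_dims)
```

Under that rule, a (x) + b (midx) gives a 2-d result over (x, midx), with no notion of one dimension being nested in the other.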

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  stack().unstack() not the same as original for datavars dependent on single coordinate of multi_index 1454832041
1323849354 https://github.com/pydata/xarray/issues/7297#issuecomment-1323849354 https://api.github.com/repos/pydata/xarray/issues/7297 IC_kwDOAMm_X85O6FaK benbovy 4160723 2022-11-22T15:24:53Z 2022-11-22T15:36:46Z MEMBER

The last example in your comment is probably the most meaningful one:

```
<xarray.Dataset>
Dimensions:  (x: 2, midx: 4)
Coordinates:
  * midx     (midx) object MultiIndex
  * x        (midx) int32 1 1 2 2
  * y        (midx) int32 3 4 3 4
Data variables:
    a        (x) int32 6 7
```

To avoid name conflicts, we could just discard the original dimension coordinates x and y. As in the example above, "x" becomes a dimension without coordinate. In that example, when unstacking we would retrieve the "x" dimension coordinate like in the original dataset.

(note: I think it is now possible to have a dimension "x" and a coordinate "x" with different dimensions, but I haven't checked).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  stack().unstack() not the same as original for datavars dependent on single coordinate of multi_index 1454832041
1323478134 https://github.com/pydata/xarray/issues/7297#issuecomment-1323478134 https://api.github.com/repos/pydata/xarray/issues/7297 IC_kwDOAMm_X85O4qx2 benbovy 4160723 2022-11-22T10:50:01Z 2022-11-22T10:50:01Z MEMBER

Interesting! I don't think that when adding stack / unstack we were thinking that variables with only a subset of the stacked dimensions would be a common use case.

I guess it would be possible to add some option to stack only the variables that have all the dimensions to be stacked, and leave the other variables unchanged? However, one problem with keeping the original dimension coordinates is that we would have name conflicts between the single index coordinates and the multi-index coordinates.

In your expected example, the "x" coordinate is part of the multi-index but it doesn't have the same dimension "midx"? I find it rather confusing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  stack().unstack() not the same as original for datavars dependent on single coordinate of multi_index 1454832041
1316230358 https://github.com/pydata/xarray/issues/7278#issuecomment-1316230358 https://api.github.com/repos/pydata/xarray/issues/7278 IC_kwDOAMm_X85OdBTW benbovy 4160723 2022-11-16T02:57:48Z 2022-11-16T02:57:48Z MEMBER

👍

Use it at your own risk 😉

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remap_label_indexers removed without deprecation update? 1444752393
1313866757 https://github.com/pydata/xarray/issues/7250#issuecomment-1313866757 https://api.github.com/repos/pydata/xarray/issues/7250 IC_kwDOAMm_X85OUAQF benbovy 4160723 2022-11-14T14:45:39Z 2022-11-14T14:45:39Z MEMBER

That's a bug in this method: https://github.com/pydata/xarray/blob/6f9e33e94944f247a5c5c5962a865ff98a654b30/xarray/core/indexing.py#L1528-L1532

Xarray array wrappers for pandas indexes keep track of the original dtype and should restore it when converted into numpy arrays. Something like this should work for the same method:

```python
def __array__(self, dtype: DTypeLike = None) -> np.ndarray:
    if dtype is None:
        dtype = self.dtype
    if self.level is not None:
        return np.asarray(
            self.array.get_level_values(self.level).values, dtype=dtype
        )
    else:
        return super().__array__(dtype)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  stack casts int32 dtype coordinate to int64 1433998942
1313748084 https://github.com/pydata/xarray/issues/6836#issuecomment-1313748084 https://api.github.com/repos/pydata/xarray/issues/6836 IC_kwDOAMm_X85OTjR0 benbovy 4160723 2022-11-14T13:55:02Z 2022-11-14T13:55:02Z MEMBER

we can fix that in safe_cast_to_index()

...we cannot fix that in safe_cast_to_index() (or we can add a parameter to specify the desired result).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby(multi-index level) not working correctly on a multi-indexed DataArray or DataSet 1318992926
1313741685 https://github.com/pydata/xarray/issues/7282#issuecomment-1313741685 https://api.github.com/repos/pydata/xarray/issues/7282 IC_kwDOAMm_X85OTht1 benbovy 4160723 2022-11-14T13:51:21Z 2022-11-14T13:51:21Z MEMBER

Thanks @jjpr-mit and @mschrimpf for the report. See https://github.com/pydata/xarray/issues/6836#issuecomment-1313739883.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby and mean on a MultiIndex level raises ValueError 1445905299
1313739883 https://github.com/pydata/xarray/issues/6836#issuecomment-1313739883 https://api.github.com/repos/pydata/xarray/issues/6836 IC_kwDOAMm_X85OThRr benbovy 4160723 2022-11-14T13:49:47Z 2022-11-14T13:49:47Z MEMBER

From #7282 it looks like we need to convert the multi-index level to a single index when casting the group to an index. And from #7105 we can fix that in safe_cast_to_index() (sometimes the full multi-index is expected) so we probably need a special case in groupby.
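
To illustrate why the group needs the values of a single level rather than the full multi-index, here is a library-free sketch. The multi-index is modelled as a list of tuples, and `level_values` and `group_positions` are hypothetical helpers; Xarray's groupby is considerably more involved.

```python
from collections import defaultdict

# A multi-index with levels ("one", "two"), modelled as a list of tuples.
level_names = ("one", "two")
midx = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]


def level_values(index, names, level):
    """Extract the values of one level as a flat ("single index") list."""
    pos = names.index(level)
    return [entry[pos] for entry in index]


def group_positions(labels):
    """Map each unique label to the integer positions where it occurs."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return dict(groups)


# Grouping by the full tuples yields one group per element ...
by_full_index = group_positions(midx)
# ... whereas grouping by the "one" level yields the expected two groups.
by_level = group_positions(level_values(midx, level_names, "one"))
```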

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby(multi-index level) not working correctly on a multi-indexed DataArray or DataSet 1318992926
1311942192 https://github.com/pydata/xarray/issues/7278#issuecomment-1311942192 https://api.github.com/repos/pydata/xarray/issues/7278 IC_kwDOAMm_X85OMqYw benbovy 4160723 2022-11-11T16:52:54Z 2022-11-11T16:52:54Z MEMBER

You may look at the logic implemented in the map_index_queries() function in xarray.core.indexing. This function is still not public API, but it calls .sel() for each index object, which should be more stable (although experimental).

Eventually we'll probably make merge_sel_results() public too. It might be useful for third-party indexes.
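
The general pattern can be sketched without Xarray: group the label queries by the index that owns each coordinate, call `.sel()` once per index, then merge the per-index results. Everything below (`ToyIndex`, `toy_map_index_queries`) is hypothetical and only mirrors the idea described above; it is not the actual implementation in xarray.core.indexing.

```python
class ToyIndex:
    """Hypothetical 1-d index: maps labels of one coordinate to positions."""

    def __init__(self, dim, labels):
        self.dim = dim
        self._positions = {label: i for i, label in enumerate(labels)}

    def sel(self, queries):
        """Translate {coord_name: label} into {dim: integer position}."""
        (label,) = queries.values()  # this toy index handles one coordinate
        return {self.dim: self._positions[label]}


def toy_map_index_queries(indexes, queries):
    """Convert label-based queries into positional indexers, index by index."""
    # 1. Group the queries by the index object that owns each coordinate.
    grouped = {}
    for name, label in queries.items():
        if name not in indexes:
            raise KeyError(f"no index found for coordinate {name!r}")
        index = indexes[name]
        grouped.setdefault(id(index), (index, {}))[1][name] = label

    # 2. Call .sel() once per index and merge the results.
    merged = {}
    for index, index_queries in grouped.values():
        for dim, indexer in index.sel(index_queries).items():
            if dim in merged:
                raise ValueError(f"conflicting indexers for dimension {dim!r}")
            merged[dim] = indexer
    return merged


indexes = {"x": ToyIndex("x", [10, 20, 30]), "y": ToyIndex("y", ["a", "b"])}
positions = toy_map_index_queries(indexes, {"x": 20, "y": "b"})
```

A third-party index would only need to provide its own `sel()`; the grouping and merging steps stay generic.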

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remap_label_indexers removed without deprecation update? 1444752393
1305780610 https://github.com/pydata/xarray/issues/6308#issuecomment-1305780610 https://api.github.com/repos/pydata/xarray/issues/6308 IC_kwDOAMm_X85N1KGC benbovy 4160723 2022-11-07T15:28:35Z 2022-11-07T15:28:35Z MEMBER

The kind of data wrapped in an Xarray Dataset (e.g., a Numpy array, a Dask array or any other array #5648) is already something useful that xr.doctor or xr.describe may tell!

From my experience of introducing Xarray to new users, they often completely ignore what is under the hood until something or someone makes them aware, likely after they experience some weird behavior or performance issue that is hard to figure out by themselves. Xarray objects are flexible container wrappers connected to a wide range of other Python libraries, such that it is hard to give a short introduction that covers all the important aspects (lazy / non-lazy, chunked / non-chunked, etc.). For example, it may be possible that someone who has never heard of Dask nor Zarr follows an Xarray tutorial that starts by opening a chunked dataset from a zarr store. In this case the rich repr of the Xarray Dataset doesn't even help.

Rather than a performance report or a profiling tool, the proposal here (still very elusive) is to provide a helper function that returns some information and explanation in plain English (why not with some hyperlinks, pretty printing, etc.) that would help users make sense of an Xarray object and its wrapped data/metadata. Some kind of interactive documentation very specific to the actual Xarray object. Some kind of smart tool that would partially "replace" custom (though very basic) user support.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  xr.doctor(): diagnostics on a Dataset / DataArray ? 1151751524
1305593478 https://github.com/pydata/xarray/pull/7209#issuecomment-1305593478 https://api.github.com/repos/pydata/xarray/issues/7209 IC_kwDOAMm_X85N0caG benbovy 4160723 2022-11-07T13:09:05Z 2022-11-07T13:09:05Z MEMBER

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

to_index_variable may be called in some core functions of Xarray internals (e.g., in as_variable()) so it might be tricky to benchmark its effect Xarray-wise. Perhaps it would be good to track it down in the original issue #7181?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Optimize some copying 1421441672
1297046405 https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NT1uF benbovy 4160723 2022-10-31T12:54:50Z 2022-10-31T12:54:50Z MEMBER

Thanks for the suggestion @shoyer, in general I like it very much! "Coordinates possibly backed by one or more indexes" feels much more natural than "indexes and their corresponding coordinates". Even though indexes have been promoted as 1st class citizens in the data model, their right place should still be in the background compared to coordinates. So having a Coordinates object that encapsulates the indexes makes a lot of sense to me.

My main concern is about the timing, as such a broad refactor might postpone some work in progress on the public API and the documentation. Ideally this shouldn't discourage users from experimenting with custom indexes and building an ecosystem around them as soon as possible.

There might be a fast path towards your suggestion, at least regarding the public facing API (your points 1 and 4):

  • Keep "private" the constructor of Indexes and keep it immutable.
  • Add a new IndexedCoordinates(Coordinates) class. Unlike DatasetCoordinates and DataArrayCoordinates, it would have a public constructor and/or alternative class methods (e.g., .from_pandas_multi_index() suggested by @dcherian)
  • In general, passing any Coordinates object to coords would assign both the coordinates and the indexes.

This would leave us the possibility of achieving a broader (mostly internal) refactor of Indexes and Coordinates objects later without the risk of introducing too many breaking changes.

Alternatively, we could just wait for that refactor to finish before implementing explicit assignment of coordinates and indexes. We already have .set_xindex() and .drop_indexes() that are relevant and we could wait before deprecating xr.Dataset(coords={"x": pandas_midx}). Not sure when such a big refactor will happen, though; the wait could be long.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1294783661 https://github.com/pydata/xarray/pull/7214#issuecomment-1294783661 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NLNSt benbovy 4160723 2022-10-28T09:49:02Z 2022-10-28T09:49:02Z MEMBER

not necessarily do consistency checks (beyond verifying that the coordinate variables exist).

I'd just want to add that, from my experience with debugging multi-index issues, it is hard even for advanced users to see what's going wrong when coordinates and indexes are not consistent.
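
As an illustration of the kind of consistency check meant here, the sketch below compares the values stored in each coordinate with the labels held by its index and reports mismatches. `check_consistency` is a hypothetical helper, with plain lists standing in for coordinate variables and indexes.

```python
def check_consistency(coords, indexes):
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    for name, index_labels in indexes.items():
        if name not in coords:
            problems.append(f"index {name!r} has no coordinate variable")
        elif list(coords[name]) != list(index_labels):
            problems.append(f"coordinate {name!r} does not match its index")
    return problems


coords = {"x": [1, 2, 3], "y": [0, 1]}
ok = check_consistency(coords, {"x": [1, 2, 3]})
broken = check_consistency(coords, {"x": [3, 2, 1], "z": [0]})
```

Reporting the offending names explicitly is the point: a silent mismatch is exactly what is hard to diagnose afterwards.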

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1294771427 https://github.com/pydata/xarray/pull/7214#issuecomment-1294771427 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NLKTj benbovy 4160723 2022-10-28T09:38:22Z 2022-10-28T09:38:22Z MEMBER

Maybe a more generic Indexes class method that could be reused by 3rd-party indexes too? E.g., via some kind of hook or entrypoint...

An Indexes accessor? Or this is going too far? 🙂

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1293946521 https://github.com/pydata/xarray/pull/7214#issuecomment-1293946521 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NIA6Z benbovy 4160723 2022-10-27T19:04:19Z 2022-10-27T19:52:21Z MEMBER

Explicitly providing indexes is an advanced user feature.

Agreed. However, xr.Dataset(coords={"x": pandas_midx}) is something that presumably a lot of users rely on (it is used extensively in Xarray's tests) and that we should really deprecate IMO. If we don't provide a convenient alternative, I expect many of those users will complain.

it's easier to explicitly manipulate indexes in the form of a dict

While generally I also prefer handling plain dict objects over custom dict-like objects, here I don't see much reason to manipulate Xarray index objects independently of their coordinate variables. Indexes allows keeping them tied together, and it is already returned by .xindexes.

EDIT -- For more context: initially an Indexes object was almost equivalent to a Frozen(obj._indexes). In #5692 I tried hard and struggled to keep dealing with separate dicts of indexes and indexed variables, but in the end it made things much easier to encapsulate the variables in Indexes, which is also used internally in different places.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1293902008 https://github.com/pydata/xarray/pull/7214#issuecomment-1293902008 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NH2C4 benbovy 4160723 2022-10-27T18:21:02Z 2022-10-27T18:21:02Z MEMBER

How about Indexes.from_pandas_multi_index() classmethod?

Yes that would make sense. However, it would be adding another pandas.MultiIndex special case while we'd like to remove them in Xarray. Maybe a more generic Indexes class method that could be reused by 3rd-party indexes too? E.g., via some kind of hook or entrypoint... The tricky thing is that arguments would probably differ much from one index type to another.

  1. does indexes get merged with existing ._indexes?

Indexes are not merged together but the new / replaced coordinate variables must be compatible with the other variables of the dataset. Dataset.assign_indexes(indexes) is actually implemented like this:

```python
def assign_indexes(self, indexes: Indexes[Index]):
    ds_indexes = Dataset(indexes=indexes)
    return (
        self
        # prepare drop-in index / coordinate replacement
        .drop_vars(indexes, errors="ignore")
        # ensure the new indexes / coordinates are compatible with the Dataset
        .merge(
            ds_indexes,
            compat="minimal",  # probably not the right option?
            join="override",  # fastest option? (no real effect because of `drop_vars`)
            combine_attrs="no_conflicts",
        )
    )
```

  1. Can we extract enough information from Index to have xr.merge(Indexes) -> Indexes work?

That is actually a good idea for https://github.com/pydata/xarray/pull/7214#issuecomment-1292089179! Not sure I would reuse xr.merge() for this as it would make the API messy, but why not an xr.merge_indexes() top-level function or an Indexes.merge() method?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1293860075 https://github.com/pydata/xarray/pull/7221#issuecomment-1293860075 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85NHrzr benbovy 4160723 2022-10-27T17:40:52Z 2022-10-27T17:40:52Z MEMBER

Thanks @hmaarrfk!

I haven't fully understood why we had that code though?

Me neither. I don't remember ever seeing this assertion error raised while refactoring things. Any idea @shoyer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1293624950 https://github.com/pydata/xarray/pull/7222#issuecomment-1293624950 https://api.github.com/repos/pydata/xarray/issues/7222 IC_kwDOAMm_X85NGyZ2 benbovy 4160723 2022-10-27T14:37:10Z 2022-10-27T14:37:10Z MEMBER

Thanks @hmaarrfk!

I think the rapid return, helps by about 40% is still pretty good.

Yes definitely. I think we just forgot to add it.

However, I will argue that Aligner should really not be a class.

The reason for using a class is mainly better code readability, and also so that it is easier to refactor later. The alignment logic is really complex, with lots of intermediate objects that are created and/or used at various stages. Probably using functions with some custom containers would have achieved the same goal, to be fair. This part of Xarray internals still deserves to be improved, but that would be a lot of work, especially for such a critical piece of code in Xarray.
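
For readers unfamiliar with the fast-path idea, here is a deliberately tiny, function-based sketch. `toy_align` is hypothetical and works on plain `{dim: labels}` dicts with an inner join; it is not how Aligner is implemented. It only shows why the early return matters: without it, the trivial case would still go through the full join.

```python
def toy_align(objects):
    """Inner-join alignment of {dim: labels} mappings, with a trivial fast path."""
    if len(objects) == 1:
        # Fast path: a single object needs no alignment. The early return is
        # what actually skips the more expensive steps below.
        return [dict(objects[0])]

    # Slow path, step 1: for each dimension, keep the labels shared by all
    # objects that have this dimension (inner join).
    common = {}
    for obj in objects:
        for dim, labels in obj.items():
            if dim in common:
                common[dim] = [label for label in common[dim] if label in labels]
            else:
                common[dim] = list(labels)

    # Slow path, step 2: rebuild every object from the joined labels.
    return [{dim: common[dim] for dim in obj} for obj in objects]


single = toy_align([{"x": [1, 2, 3]}])
pair = toy_align([{"x": [1, 2, 3]}, {"x": [2, 3, 4], "y": [0, 1]}])
```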

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Actually make the fast code path return early for Aligner.align 1423321834
1293531607 https://github.com/pydata/xarray/pull/7214#issuecomment-1293531607 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NGbnX benbovy 4160723 2022-10-27T13:31:24Z 2022-10-27T13:42:44Z MEMBER

I also added an .assign_indexes() method that may be quite convenient. Like for the constructors, it only accepts an Indexes instance.

```python
ds = xr.Dataset(coords={"x": [4, 5, 6, 7]})
ds2 = xr.Dataset(coords={"x": [1, 2, 3, 4]})

ds.assign_indexes(ds2.xindexes)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) int64 1 2 3 4
# Data variables:
#     empty

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
indexes = wrap_pandas_multiindex(midx, "x")

ds.assign_indexes(indexes)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     empty
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1293545325 https://github.com/pydata/xarray/pull/7214#issuecomment-1293545325 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NGe9t benbovy 4160723 2022-10-27T13:41:50Z 2022-10-27T13:41:50Z MEMBER

@pydata/xarray I'd be very happy if you could share your thoughts about the examples shown in the last three comments. If you think the API looks good like that, then I will work on adding some tests and on the documentation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1292089179 https://github.com/pydata/xarray/pull/7214#issuecomment-1292089179 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NA7db benbovy 4160723 2022-10-26T13:54:22Z 2022-10-26T13:54:22Z MEMBER

Passing multiple indexes:

```python
midx1 = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
midx2 = pd.MultiIndex.from_product([["c", "d"], [3, 4]], names=("three", "four"))

indexes1 = wrap_pandas_multiindex(midx1, "x")
indexes2 = wrap_pandas_multiindex(midx2, "y")

# dict() accepts at most one positional mapping, so merge by unpacking
indexes = Indexes(
    indexes={**indexes1, **indexes2},
    variables={**indexes1.variables, **indexes2.variables},
)

ds = xr.Dataset(indexes=indexes)
# <xarray.Dataset>
# Dimensions:  (x: 4, y: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
#   * y        (y) object MultiIndex
#   * three    (y) object 'c' 'c' 'd' 'd'
#   * four     (y) int64 3 4 3 4
# Data variables:
#     empty
```

That's not looking super nice, but probably we can add some convenience function or Indexes method.
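
One possible shape for such a convenience function, sketched with plain mappings: combine several index collections into a new one and fail loudly when a coordinate name appears twice. `merge_index_collections` is a hypothetical name, and the string values merely stand in for index objects.

```python
from types import MappingProxyType


def merge_index_collections(*collections):
    """Combine mappings of {coord_name: index} into one read-only mapping."""
    merged = {}
    for collection in collections:
        for name, index in collection.items():
            if name in merged:
                raise ValueError(f"coordinate {name!r} is indexed more than once")
            merged[name] = index
    # Return a read-only view so the combined collection cannot be mutated.
    return MappingProxyType(merged)


indexes1 = {"x": "midx1", "one": "midx1", "two": "midx1"}
indexes2 = {"y": "midx2", "three": "midx2", "four": "midx2"}
combined = merge_index_collections(indexes1, indexes2)
```

Returning a new read-only object rather than updating one of the inputs keeps the collections immutable.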

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1291911349 https://github.com/pydata/xarray/pull/7214#issuecomment-1291911349 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NAQC1 benbovy 4160723 2022-10-26T11:47:57Z 2022-10-26T12:14:23Z MEMBER

I implemented option 3. We can still change or revert it later if it's not the best one.

A few examples:

```python
import pandas as pd
import xarray as xr
from xarray.indexes import wrap_pandas_multiindex

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
```

It is now possible to pass a pandas multi-index to a Dataset like this:

```python
# this returns an Indexes object (indexes + coordinates)
indexes = wrap_pandas_multiindex(midx, "x")

ds = xr.Dataset(indexes=indexes)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     empty
```

IMO the above should be preferred over passing it as a coordinate (should we deprecate it now?):

```python
ds_deprecated = xr.Dataset(coords={"x": midx})

ds_deprecated.identical(ds)
# True

# eventually this would behave like this:
ds_midx_as_array = xr.Dataset(coords={"x": midx})
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object ('a', 1) ('a', 2) ('b', 1) ('b', 2)
# Data variables:
#     empty
```

We can pass indexes around from one Xarray object to another, e.g.,

```python
da = xr.DataArray([1, 2, 3, 4], dims="x", indexes=ds.xindexes)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
```

Skip creating pandas indexes for dimension coordinates:

```python
ds_noindex = xr.Dataset(coords={"x": [0, 1, 2]}, indexes={})
# <xarray.Dataset>
# Dimensions:  (x: 3)
# Coordinates:
#     x        (x) int64 0 1 2
# Data variables:
#     empty

ds_noindex.xindexes
# Indexes:
#     empty
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1291638319 https://github.com/pydata/xarray/pull/7214#issuecomment-1291638319 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85M_NYv benbovy 4160723 2022-10-26T07:52:35Z 2022-10-26T07:52:35Z MEMBER

For passing multiple indexes at once we could probably expand the Indexes API, e.g., with an .update() method.

Maybe with something else than .update() (let's keep Indexes an immutable collection?)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1291059643 https://github.com/pydata/xarray/pull/7214#issuecomment-1291059643 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85M9AG7 benbovy 4160723 2022-10-25T19:50:57Z 2022-10-25T19:50:57Z MEMBER

Hmm I'm wondering what would be best between the options below regarding the types for the indexes argument:

  1. Indexes[Index] | Sequence[Indexes[Index]] | None
  2. Indexes[Index] | None
  3. Mapping[Any, Index] | None
  4. Any other suggestion?

Option 1 is nice for passing multiple indexes, e.g.,

```python
pd_midx1 = pd.MultiIndex.from_arrays(..., names=("one", "two"))
pd_midx2 = pd.MultiIndex.from_arrays(..., names=("three", "four"))

indexes1 = PandasMultiIndex.from_pandas_index(pd_midx1, "x")
indexes2 = PandasMultiIndex.from_pandas_index(pd_midx2, "y")

ds = xr.Dataset(indexes=[indexes1, indexes2])
```

With option 1 it feels odd passing an empty list in order to avoid creating default indexes: ds = xr.Dataset(indexes=[]). Not really better in this regard with option 2: ds = xr.Dataset(indexes=Indexes()). Option 3 is better IMO: ds = xr.Dataset(indexes={}).

Option 3 actually works in all cases since Indexes[Index] is a sub-type of Mapping[Any, Index]. However, it is not clear from this generic type that any non-empty mapping must be an instance of Indexes (because the latter also contains the coordinate variables).
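To make the three cases under option 3 concrete, here is a minimal pure-Python sketch of how the constructor could interpret the argument. The `Indexes` class below is a simplified stand-in and `resolve_indexes_arg` is a hypothetical helper; neither is Xarray's actual implementation, and the placeholder strings only stand for default pandas indexes.

```python
from collections.abc import Mapping


class Indexes(dict):
    """Simplified, hypothetical stand-in for Xarray's Indexes collection."""


def resolve_indexes_arg(indexes, dim_coord_names):
    """Interpret the `indexes` constructor argument under option 3.

    - None: build a default index for each dimension coordinate
    - empty mapping: build no index at all
    - non-empty mapping: must be an Indexes instance
    """
    if indexes is None:
        # Placeholder strings stand for the default pandas indexes.
        return {name: f"default index for {name!r}" for name in dim_coord_names}
    if not isinstance(indexes, Mapping):
        raise TypeError("indexes must be a mapping or None")
    if len(indexes) == 0:
        return {}
    if not isinstance(indexes, Indexes):
        # A plain dict cannot carry the coordinate variables, so reject it.
        raise TypeError("a non-empty indexes mapping must be an Indexes instance")
    return dict(indexes)
```

The runtime check in the last branch is what the generic `Mapping[Any, Index]` type cannot express on its own.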

I'm leaning towards option 3. For passing multiple indexes at once we could probably expand the Indexes API, e.g., with an .update() method.

What do people think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1290454937 https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85M6seZ benbovy 4160723 2022-10-25T12:19:52Z 2022-10-25T12:19:52Z MEMBER

I'm thinking of only accepting one or more instances of Indexes as indexes argument in the Dataset and DataArray constructors. The only exception is when fastpath=True a mapping can be given directly.

  • It is much easier to handle: just check that keys returned by Indexes.variables do not conflict with the coordinate names in the coords argument
  • It is slightly safer: it requires the user to explicitly create an Indexes object, thus with less chance to accidentally provide coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the Indexes class itself)
  • It is more convenient: an Xarray Index may provide a factory method that returns an instance of Indexes that we just need to pass as indexes
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1285038821 https://github.com/pydata/xarray/pull/7185#issuecomment-1285038821 https://api.github.com/repos/pydata/xarray/issues/7185 IC_kwDOAMm_X85MmCLl benbovy 4160723 2022-10-20T06:59:04Z 2022-10-20T06:59:04Z MEMBER

🚀

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  indexes section in the HTML repr 1413425793
1283994902 https://github.com/pydata/xarray/pull/7185#issuecomment-1283994902 https://api.github.com/repos/pydata/xarray/issues/7185 IC_kwDOAMm_X85MiDUW benbovy 4160723 2022-10-19T13:13:39Z 2022-10-19T13:13:39Z MEMBER

LGTM, that's awesome! It will be super handy for quick debugging and experimenting with custom indexes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  indexes section in the HTML repr 1413425793
1283897249 https://github.com/pydata/xarray/pull/7183#issuecomment-1283897249 https://api.github.com/repos/pydata/xarray/issues/7183 IC_kwDOAMm_X85Mhreh benbovy 4160723 2022-10-19T11:59:08Z 2022-10-19T11:59:08Z MEMBER

Looks all good to me!

Do you want to add a what's new entry here or add it in #7185 with a link to this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  use `_repr_inline_` for indexes that define it 1412926287
1283103957 https://github.com/pydata/xarray/pull/7185#issuecomment-1283103957 https://api.github.com/repos/pydata/xarray/issues/7185 IC_kwDOAMm_X85MepzV benbovy 4160723 2022-10-18T22:57:16Z 2022-10-18T22:57:16Z MEMBER

Thanks @keewis for opening this PR.

I added some commits (hope you don't mind) to fix the CSS. I also grouped the items in the indexes section by unique index with index coordinates separated by line return, so it looks like the coordinate section while the multi-coordinate indexes are clearly visible.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  indexes section in the HTML repr 1413425793
1283038653 https://github.com/pydata/xarray/pull/7182#issuecomment-1283038653 https://api.github.com/repos/pydata/xarray/issues/7182 IC_kwDOAMm_X85MeZ29 benbovy 4160723 2022-10-18T21:40:49Z 2022-10-18T21:40:49Z MEMBER

I wonder if it is possible to create a generic MultiIndex?

Hmm that could be possible but I think there are just too many possible edge cases for something generic like that.

In your specific example

ds.set_xindex(["a", "b"], MultiIndex([("a", PandasIndex), ("b", PandasIndex), (["a", "b"], BallTreeIndex)]))

we could probably use the BallTreeIndex for point-wise indexing (i.e., with ds.sel(a=xr.DataArray(...), b=xr.DataArray(...))) and use the two PandasIndex instances for other kinds of selection (e.g., with slices, scalars, etc.) so there's no conflict, but I doubt this would be what we want in other cases.

I guess your suggestion is a way around the constraint in the Xarray data model that a coordinate cannot have multiple indexes? I'm afraid there's no easy solution that is generic enough. Maybe some cache to avoid rebuilding the indexes? I.e., .set_xindex() doesn't drop the pre-existing index(es) but rather disable them so that it is possible to re-enable them later with another .set_xindex() call (.xindexes only returns the "active" indexes but there may be other "inactive" indexes attached to a dataset).
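A rough sketch of that cache idea, in plain Python. `IndexRegistry` and its methods are purely hypothetical names for illustration and do not exist in Xarray; the sketch also assumes one index per coordinate and ignores multi-coordinate indexes.

```python
class IndexRegistry:
    """Toy registry: one active index per coordinate, others kept inactive."""

    def __init__(self):
        self._active = {}    # coordinate name -> active index object
        self._inactive = {}  # coordinate name -> list of disabled index objects

    def set_index(self, coord, index):
        """Activate `index` for `coord`, disabling (not dropping) the previous one."""
        previous = self._active.get(coord)
        if previous is not None and previous is not index:
            self._inactive.setdefault(coord, []).append(previous)
        # Re-enabling a cached index removes it from the inactive list.
        cached = self._inactive.get(coord, [])
        if any(i is index for i in cached):
            self._inactive[coord] = [i for i in cached if i is not index]
        self._active[coord] = index

    @property
    def active(self):
        """Only the active indexes, like what .xindexes would return."""
        return dict(self._active)

    def inactive_for(self, coord):
        """Disabled indexes still attached to `coord`."""
        return list(self._inactive.get(coord, []))
```

Switching back to a previously built index then costs nothing, at the price of keeping extra objects alive.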

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add MultiPandasIndex helper class 1412901282
1282295471 https://github.com/pydata/xarray/pull/7183#issuecomment-1282295471 https://api.github.com/repos/pydata/xarray/issues/7183 IC_kwDOAMm_X85Mbkav benbovy 4160723 2022-10-18T12:19:56Z 2022-10-18T12:19:56Z MEMBER

Yeah I think we could let the whole line after the 1st column (coordinate names) be customized by the index.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  use `_repr_inline_` for indexes that define it 1412926287
1282151989 https://github.com/pydata/xarray/pull/7183#issuecomment-1282151989 https://api.github.com/repos/pydata/xarray/issues/7183 IC_kwDOAMm_X85MbBY1 benbovy 4160723 2022-10-18T10:11:46Z 2022-10-18T10:11:46Z MEMBER

Great @keewis!

One question: should we let `_repr_inline_` display the class name or should we reserve a column for this and use `_repr_inline_` for other things? I.e., like variables have a dtype column and another column for values preview or other inline info.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  use `_repr_inline_` for indexes that define it 1412926287
1282016895 https://github.com/pydata/xarray/issues/7162#issuecomment-1282016895 https://api.github.com/repos/pydata/xarray/issues/7162 IC_kwDOAMm_X85MagZ_ benbovy 4160723 2022-10-18T08:35:29Z 2022-10-18T08:49:47Z MEMBER

Indexes.copy_indexes might also require some update that includes the memo argument. But not sure if that will solve the issue here.

That's a possible cause. Alignment may fail early because .xindexes returns different mappings of coordinates vs. index objects. It's worth checking if after copying the dataset, copy.xindexes returns the same CRSIndex object for its "x", "y" and "spatial_ref" coordinates.

EDIT: checking copy.xindexes.group_by_index() is more convenient.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  copy of custom index does not align with original 1409811164
1282024919 https://github.com/pydata/xarray/issues/7162#issuecomment-1282024919 https://api.github.com/repos/pydata/xarray/issues/7162 IC_kwDOAMm_X85MaiXX benbovy 4160723 2022-10-18T08:41:08Z 2022-10-18T08:41:08Z MEMBER

The refactored alignment logic could be improved (cf. #7002). The error raised in the method below is not very helpful.

https://github.com/pydata/xarray/blob/ab726c536464fbf4d8878041f950d2b0ae09b862/xarray/core/alignment.py#L294-L333

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  copy of custom index does not align with original 1409811164
1277301954 https://github.com/pydata/xarray/issues/6807#issuecomment-1277301954 https://api.github.com/repos/pydata/xarray/issues/6807 IC_kwDOAMm_X85MIhTC benbovy 4160723 2022-10-13T09:22:27Z 2022-10-13T09:22:27Z MEMBER

Not really a generic and parallel execution back-end, but Open-EO looks like an interesting use case too (it is a framework for managing remote execution of processing tasks on multiple big Earth observation cloud back-ends via a common API). I've suggested the idea of reusing the Xarray API here: https://github.com/Open-EO/openeo-python-client/issues/334.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Alternative parallel execution frameworks in xarray 1308715638
1276685925 https://github.com/pydata/xarray/pull/7150#issuecomment-1276685925 https://api.github.com/repos/pydata/xarray/issues/7150 IC_kwDOAMm_X85MGK5l benbovy 4160723 2022-10-12T20:17:09Z 2022-10-12T20:17:09Z MEMBER

Thank you @lukasbindreiter! Merging.

I notice that this is your first contribution to Xarray, welcome!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update open_dataset backend to ensure compatibility with new explicit index model 1403144601
1276433539 https://github.com/pydata/xarray/pull/6795#issuecomment-1276433539 https://api.github.com/repos/pydata/xarray/issues/6795 IC_kwDOAMm_X85MFNSD benbovy 4160723 2022-10-12T16:19:34Z 2022-10-12T16:19:34Z MEMBER

Looks good to me @keewis. Thanks for your work on the indexes repr!

Yes I think we can skip displaying default indexes for now... The question is which indexes are considered as default, i.e., all PandasIndex and PandasMultiIndex instances (like in this PR) or just the single pandas indexes automatically created for the dimension coordinates? We can decide this later, though, it's not a problem adding more indexes in the text repr later (we'll probably need it when dropping the multi-index dimension coordinate with tuple elements). For the html repr it's easier: we could display all indexes and collapse the section by default.

but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment?

Yes that's a good point. Let's keep "dimensions without coordinates".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  display the indexes in the string reprs 1306887842
1272966573 https://github.com/pydata/xarray/issues/7139#issuecomment-1272966573 https://api.github.com/repos/pydata/xarray/issues/7139 IC_kwDOAMm_X85L3-2t benbovy 4160723 2022-10-10T08:35:22Z 2022-10-10T08:35:22Z MEMBER

Looks like the backend logic needs some updates to make it compatible with the new xarray data model with explicit indexes (i.e., possible indexed coordinates with name != dimension like for multi-index levels now), e.g., here:

https://github.com/pydata/xarray/blob/8eea8bb67bad0b5ac367c082125dd2b2519d4f52/xarray/backends/api.py#L234-L241

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex 1400949778
1272944063 https://github.com/pydata/xarray/issues/7148#issuecomment-1272944063 https://api.github.com/repos/pydata/xarray/issues/7148 IC_kwDOAMm_X85L35W_ benbovy 4160723 2022-10-10T08:16:37Z 2022-10-10T08:16:37Z MEMBER

Looks like passing a pandas.MultiIndex object as dim argument to concat was forgotten during the explicit indexes refactor. While this can be fixed (could be tricky), we should deprecate it: it is convenient but probably too neat now that multi-index levels have their own, "real" coordinates (see https://github.com/pydata/xarray/issues/6293#issuecomment-1259228475). It is preferable to explicitly chain concat with assign_coords (and set_index) like the last line in your example.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concatenate using Multiindex cannot be unstacked anymore 1402168223
1271555410 https://github.com/pydata/xarray/issues/7139#issuecomment-1271555410 https://api.github.com/repos/pydata/xarray/issues/7139 IC_kwDOAMm_X85LymVS benbovy 4160723 2022-10-07T12:55:17Z 2022-10-07T12:55:17Z MEMBER

Hi @lukasbindreiter, could you add the whole error traceback please?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex 1400949778
1271519573 https://github.com/pydata/xarray/pull/7105#issuecomment-1271519573 https://api.github.com/repos/pydata/xarray/issues/7105 IC_kwDOAMm_X85LydlV benbovy 4160723 2022-10-07T12:20:49Z 2022-10-07T12:20:49Z MEMBER

Tests should be ok now, although this is not a super clean workaround. IndexVariable still needs some more refactoring anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix to_index(): return multiindex level as single index 1390999159
1267580535 https://github.com/pydata/xarray/issues/7121#issuecomment-1267580535 https://api.github.com/repos/pydata/xarray/issues/7121 IC_kwDOAMm_X85Ljb53 benbovy 4160723 2022-10-04T21:08:20Z 2022-10-04T21:08:20Z MEMBER

Hi @veenstrajelmer,

In principle with the recent explicit indexes refactor there is no need anymore to have this restriction. Although we still need to relax this constraint (see #6293 point 2), hopefully this shouldn't be hard work now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add rename_variables argument to xr.open_dataset() to workaround vars with same names as dims 1395962467
1266073388 https://github.com/pydata/xarray/issues/7108#issuecomment-1266073388 https://api.github.com/repos/pydata/xarray/issues/7108 IC_kwDOAMm_X85Ldr8s benbovy 4160723 2022-10-03T21:28:43Z 2022-10-03T21:28:43Z MEMBER

I suppose re-projecting it on a 0-360 would be the only way around this specific issue.

A custom Xarray index would help, e.g., PeriodicBoundaryIndex (#7031) or a GeographicIndex leveraging libraries like S2Geometry or H3.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .sel return errors when using floats for no apparent reason 1391699976
1266068474 https://github.com/pydata/xarray/pull/7105#issuecomment-1266068474 https://api.github.com/repos/pydata/xarray/issues/7105 IC_kwDOAMm_X85Ldqv6 benbovy 4160723 2022-10-03T21:22:42Z 2022-10-03T21:22:42Z MEMBER

Yes I agree it would be nice if we can roll back this breaking change. However, it really conflicts with .xindexes that returns the same index instance for each of its corresponding coordinate. This roll back seems to mostly break things where we need to be smart while handling multi-index coordinates passed to DataArray / Dataset constructors. This might be tricky to solve. It would probably be easier to do it after #6392.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix to_index(): return multiindex level as single index 1390999159
1265252754 https://github.com/pydata/xarray/issues/2028#issuecomment-1265252754 https://api.github.com/repos/pydata/xarray/issues/2028 IC_kwDOAMm_X85LajmS benbovy 4160723 2022-10-03T10:38:57Z 2022-10-03T16:45:35Z MEMBER

With the last release v2022.09.0, this is now possible via .set_xindex():

```python
a = a.set_xindex("currency")

a.sel(currency="EUR")

<xarray.DataArray (country: 2)>
array([20, 30])
Coordinates:
  * country   (country) <U7 'Germany' 'France'
  * currency  (country) <U3 'EUR' 'EUR'
```

Closed in #6971 (although set_xindex still needs to be documented in the User Guide).

{
    "total_count": 9,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 5,
    "confused": 0,
    "heart": 3,
    "rocket": 1,
    "eyes": 0
}
  slice using non-index coordinates 309691307
1265012286 https://github.com/pydata/xarray/issues/7108#issuecomment-1265012286 https://api.github.com/repos/pydata/xarray/issues/7108 IC_kwDOAMm_X85LZo4- benbovy 4160723 2022-10-03T06:57:17Z 2022-10-03T06:57:17Z MEMBER

TBH, I had to do some research before figuring out what was going on :).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .sel return errors when using floats for no apparent reason 1391699976
1263548977 https://github.com/pydata/xarray/issues/7108#issuecomment-1263548977 https://api.github.com/repos/pydata/xarray/issues/7108 IC_kwDOAMm_X85LUDox benbovy 4160723 2022-09-30T13:03:26Z 2022-09-30T13:03:26Z MEMBER

It looks like the error is because of the non-monotonic coordinate labels for the "lon" coordinate in nc_bug rather than a float precision issue. The "lon" coordinate seems monotonic for nc_ok so it works.

When a slice is given as indexer, Xarray internally calls pandas.Index.slice_indexer(), which requires that the index must be ordered and unique (docs). Unfortunately, Pandas does not mention it while it raises a KeyError. Should we first check the index in Xarray and raise a nicer error message if it is not unique / ordered?
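Such a pre-check could look roughly like the sketch below. It is written in plain Python on a list of labels rather than on a pandas index, `check_sliceable` is a hypothetical helper name, and the rule is deliberately stricter than what pandas enforces.

```python
def check_sliceable(labels, name="index"):
    """Raise a descriptive error if `labels` cannot be safely sliced by label."""
    labels = list(labels)
    if len(set(labels)) != len(labels):
        raise ValueError(f"cannot slice {name!r}: labels are not unique")
    pairs = list(zip(labels, labels[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    if not (increasing or decreasing):
        raise ValueError(
            f"cannot slice {name!r}: labels are not monotonic "
            "(sort the coordinate first)"
        )


# Longitudes wrapping around 360 are unique but not monotonic.
try:
    check_sliceable([350.0, 355.0, 0.0, 5.0], name="lon")
except ValueError as err:
    message = str(err)
```

The error then names the coordinate and the actual problem instead of surfacing a bare KeyError.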

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .sel return errors when using floats for no apparent reason 1391699976
1262007838 https://github.com/pydata/xarray/issues/7075#issuecomment-1262007838 https://api.github.com/repos/pydata/xarray/issues/7075 IC_kwDOAMm_X85LOLYe benbovy 4160723 2022-09-29T09:20:59Z 2022-09-29T09:20:59Z MEMBER

What happens if you create Dataset objects fully in memory instead of loading data from files? Is there a significant slowdown when you increase the size of the Dataset dimensions?

Could you measure the time it takes at a more fine-grained level? I.e., loading files vs. extracting a slice vs. converting to a dataframe. This would help better identify the possible source of slowdown.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Convert xarray dataset to pandas dataframe is much slower in newest xarray version 1384226112
1261998233 https://github.com/pydata/xarray/issues/7104#issuecomment-1261998233 https://api.github.com/repos/pydata/xarray/issues/7104 IC_kwDOAMm_X85LOJCZ benbovy 4160723 2022-09-29T09:12:54Z 2022-09-29T09:12:54Z MEMBER

Maybe we should check pandas.MultiIndex.is_unique in Dataset.unstack()

Better to check this in PandasMultiIndex.unstack() actually.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duplicate values on unstack 1390228572
1261996160 https://github.com/pydata/xarray/issues/7104#issuecomment-1261996160 https://api.github.com/repos/pydata/xarray/issues/7104 IC_kwDOAMm_X85LOIiA benbovy 4160723 2022-09-29T09:11:05Z 2022-09-29T09:11:05Z MEMBER

Thanks for the report @znichollscr.

Maybe we should check pandas.MultiIndex.is_unique in Dataset.unstack() like in Dataset.from_dataframe()?

```python
df = ds.drop_vars("lat").to_dataframe()

xr.Dataset.from_dataframe(df)

# ValueError: cannot convert a DataFrame with a non-unique MultiIndex into xarray
```
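For unstack, the equivalent check could be as simple as the sketch below. It works on a plain list of level tuples instead of a real pandas.MultiIndex, and both function names are hypothetical.

```python
from collections import Counter


def find_duplicate_entries(entries):
    """Return the multi-index entries (tuples of level values) that occur more than once."""
    counts = Counter(entries)
    return [entry for entry, n in counts.items() if n > 1]


def check_unstackable(entries):
    """Raise instead of silently unstacking a non-unique multi-index."""
    duplicates = find_duplicate_entries(entries)
    if duplicates:
        raise ValueError(
            f"cannot unstack a non-unique multi-index; duplicated entries: {duplicates}"
        )
```

Listing the duplicated entries in the message should make it easier to locate the offending rows.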

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duplicate values on unstack 1390228572
1261356747 https://github.com/pydata/xarray/issues/7069#issuecomment-1261356747 https://api.github.com/repos/pydata/xarray/issues/7069 IC_kwDOAMm_X85LLsbL benbovy 4160723 2022-09-28T19:12:50Z 2022-09-28T19:12:50Z MEMBER

I think we can go ahead with the release. The remaining regressions seem to affect only a limited number of use cases; they could wait for the following release if we are not waiting too long between the two.

I'd also wait for an announcement about indexes. It has already been announced at the previous release, and it'd probably be better to communicate about it (maybe via a blog post?) after improving the docs and experimenting a bit more with custom indexes...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  release? 1382753751
1261049239 https://github.com/pydata/xarray/issues/7097#issuecomment-1261049239 https://api.github.com/repos/pydata/xarray/issues/7097 IC_kwDOAMm_X85LKhWX benbovy 4160723 2022-09-28T15:03:36Z 2022-09-28T15:03:36Z MEMBER

Hi @znichollscr, thanks for the report. Indeed it looks like _coord_names are not updated properly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broken state when using assign_coords with multiindex 1389148779
1261015002 https://github.com/pydata/xarray/issues/7099#issuecomment-1261015002 https://api.github.com/repos/pydata/xarray/issues/7099 IC_kwDOAMm_X85LKY_a benbovy 4160723 2022-09-28T14:39:10Z 2022-09-28T14:39:10Z MEMBER

Or use Indexer objects to group labels + options? This is slightly different than what you suggest:

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


class Dataset:
    def sel(
        self,
        indexers: Mapping[Any, Any] | Indexer | Iterable[Indexer],
        **indexers_kwargs: Any,
    ):
        ...


class Indexer:
    def __init__(self, labels=None, options=None, **label_kwargs):
        ...
```

Let's assume a Dataset with lat / lon coordinates both sharing the same geographic index + another time dimension coordinate, then we could write:

```python
indexers = [
    Indexer(lon=[2, 15], lat=[45, 48], options={"foo": "bar"}),
    Indexer(time="2022-01-01"),
]

ds.sel(indexers)
```

This could also be used to avoid code duplication when using common selection options for different indexes.
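A slightly more fleshed-out sketch of the idea, still pure Python and entirely hypothetical (`normalize_indexers` is an invented helper, and nothing here exists in Xarray), shows how the different input forms could be brought to a common shape before dispatching to the indexes:

```python
from typing import Any, Mapping, Optional


class Indexer:
    """Hypothetical container grouping labels with selection options."""

    def __init__(
        self,
        labels: Optional[Mapping[Any, Any]] = None,
        options: Optional[Mapping[str, Any]] = None,
        **label_kwargs: Any,
    ):
        self.labels = {**(labels or {}), **label_kwargs}
        self.options = dict(options or {})


def normalize_indexers(indexers=None, **indexers_kwargs):
    """Return a list of Indexer objects from any of the accepted input forms."""
    if indexers is None:
        result = []
    elif isinstance(indexers, Indexer):
        result = [indexers]
    elif isinstance(indexers, Mapping):
        # Plain mapping of labels, i.e., the current sel() behavior (no options).
        result = [Indexer(labels=indexers)]
    else:
        result = list(indexers)
    if indexers_kwargs:
        result.append(Indexer(labels=indexers_kwargs))
    return result
```

Each resulting Indexer could then be matched against one index, which keeps options scoped per index rather than per coordinate.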

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass arbitrary options to sel() 1389295853
1260892017 https://github.com/pydata/xarray/issues/7099#issuecomment-1260892017 https://api.github.com/repos/pydata/xarray/issues/7099 IC_kwDOAMm_X85LJ69x benbovy 4160723 2022-09-28T13:11:01Z 2022-09-28T13:11:01Z MEMBER

Or we could simply decide that .sel() should not accept arbitrary options and handle special cases, e.g., via accessors.

It would actually make sense to have something like .my_accessor.sel_k_neighbors(). Not so great to have a separate method just for an optimization option, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass arbitrary options to sel() 1389295853
1260618693 https://github.com/pydata/xarray/issues/6392#issuecomment-1260618693 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85LI4PF benbovy 4160723 2022-09-28T09:13:00Z 2022-09-28T12:52:01Z MEMBER

How would we handle creating xarray objects from pandas objects where they have a multiindex?

For pandas.Series / pandas.DataFrame objects, DataArray.from_series() / Dataset.from_dataframe() already expand multi-index levels as dimensions.

For a pandas.MultiIndex, we could do like below but it is a bit tedious:

```python
import pandas as pd
import xarray as xr
from xarray.indexes import PandasMultiIndex

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = PandasMultiIndex(pd_idx, "x")

indexes = {"x": idx, "foo": idx, "bar": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

For more convenience, we could add a class method to PandasMultiIndex, e.g.,

```python
# this calls PandasMultiIndex.__init__() and PandasMultiIndex.create_variables() internally
indexes, coords = PandasMultiIndex.from_pandas_index(pd_idx, "x")

ds = xr.Dataset(coords=coords, indexes=indexes)
```

Instead of indexes, coords raw dictionaries, we could return an instance of the Indexes class (also returned by Dataset.xindexes), which encapsulates the coordinate variables:

```python
xmidx = PandasMultiIndex.from_pandas_index(pd_idx, "x")

ds = xr.Dataset(coords=xmidx.variables, indexes=xmidx)
```

For even more convenience, I think it might be reasonable to support special handling of Indexes instances given in Dataset / DataArray constructors and in .update(), i.e.,

```python
# both cases below will implicitly add the coordinates found in xmidx
# (if there's no conflict with other coordinates)
ds = xr.Dataset(indexes=xmidx)

ds2 = xr.Dataset()
ds2.update(xmidx)
```

The same approach could be used for pandas.IntervalIndex (as discussed in #4579).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1260859023 https://github.com/pydata/xarray/issues/7099#issuecomment-1260859023 https://api.github.com/repos/pydata/xarray/issues/7099 IC_kwDOAMm_X85LJy6P benbovy 4160723 2022-09-28T12:50:25Z 2022-09-28T12:50:25Z MEMBER

Another difficulty regarding multi-coordinate indexes: ideally options should be set per index, not per coordinate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass arbitrary options to sel() 1389295853
1260806288 https://github.com/pydata/xarray/issues/4090#issuecomment-1260806288 https://api.github.com/repos/pydata/xarray/issues/4090 IC_kwDOAMm_X85LJmCQ benbovy 4160723 2022-09-28T12:06:03Z 2022-09-28T12:06:03Z MEMBER

@JimmyGao0204 this is not supported by Xarray itself but the xoak package has been developed for that purpose.

I'm going to close this issue as Xarray now provides everything needed for selecting data using 2D lat/lon coordinates (i.e., advanced indexing, flexible indexes), and it is likely that this specific case will be further maintained in a 3rd party library like xoak. Feel free to comment / re-open if you think this should be built-in Xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error with indexing 2D lat/lon coordinates 623804131
1260794423 https://github.com/pydata/xarray/issues/475#issuecomment-1260794423 https://api.github.com/repos/pydata/xarray/issues/475 IC_kwDOAMm_X85LJjI3 benbovy 4160723 2022-09-28T11:55:04Z 2022-09-28T11:55:04Z MEMBER

There hasn't been much activity here for quite some time.

Meanwhile, there has been the development of the xoak package that supports point-wise indexing of Xarray objects with various indexes (either generic like scipy.spatial.cKDTree or more specific like pys2index's S2PointIndex for lat/lon point data). xoak leverages Xarray's advanced indexing capabilities and supports selection using both coordinates and indexers with an arbitrary number of dimensions.

With the forthcoming Xarray release, it will be possible to create and assign custom indexes to DataArray / Dataset objects. The plan for xoak is then to just provide some custom indexes so that we can perform point-wise selection directly with Dataset.sel() instead of Dataset.xoak.sel().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API design for pointwise indexing 95114700
1260551056 https://github.com/pydata/xarray/issues/6573#issuecomment-1260551056 https://api.github.com/repos/pydata/xarray/issues/6573 IC_kwDOAMm_X85LInuQ benbovy 4160723 2022-09-28T08:17:09Z 2022-09-28T08:17:09Z MEMBER

I also like the idea of alignment with some tolerance. There is an open PR #4489, which needs to be reworked in the context of the explicit index refactor.

Alternatively to a new kwarg we could add an index build option, e.g., ds.set_xindex("x", index_cls=PandasIndex, align_tolerance=1e-6), but then it is not obvious how to handle different tolerance values given for the indexes to compare. Maybe this could depend on the given join method? E.g., pick the smallest tolerance for join=inner, the largest for join=outer, the tolerance of the left index for join=left, etc.
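The join-dependent rule could be sketched like this. `resolve_align_tolerance` is a hypothetical helper, and the `join="right"` branch is my own extension of the "etc." above.

```python
def resolve_align_tolerance(left_tol, right_tol, join):
    """Pick one tolerance for comparing two indexes, depending on the join method."""
    if join == "inner":
        # Strictest choice: labels must match within the smallest tolerance.
        return min(left_tol, right_tol)
    if join == "outer":
        # Most permissive choice.
        return max(left_tol, right_tol)
    if join == "left":
        return left_tol
    if join == "right":
        return right_tol
    raise ValueError(f"no tolerance rule defined for join={join!r}")
```

For more than two objects the same rule would need to be applied across all indexes being compared, which is where it becomes less obvious for "left" and "right".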

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  32- vs 64-bit coordinates coordinates in where() 1226272301
1260497579 https://github.com/pydata/xarray/issues/5874#issuecomment-1260497579 https://api.github.com/repos/pydata/xarray/issues/5874 IC_kwDOAMm_X85LIaqr benbovy 4160723 2022-09-28T07:26:55Z 2022-09-28T07:26:55Z MEMBER

Closed in #6971.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Need a way to speciefy the names of coordinates from the indices which droped by DataArray.reset_index. 1029088776
1259615513 https://github.com/pydata/xarray/issues/4579#issuecomment-1259615513 https://api.github.com/repos/pydata/xarray/issues/4579 IC_kwDOAMm_X85LFDUZ benbovy 4160723 2022-09-27T14:45:19Z 2022-09-27T14:46:41Z MEMBER

Perhaps Xarray has been too clever so far regarding how it handles pandas objects passed directly as coordinate data? pandas.MultiIndex objects are handled in a specific way too, which is often hard to deal with.

Expanding on @max-sixty's suggestion, we could:

  • treat all coordinate data as duck arrays, i.e., in the example above handle da1 just like da2 (no more special cases for pandas objects)
  • provide an xarray.indexes.PandasIntervalIndex wrapper, which would inherit from xarray.indexes.PandasIndex with a few additional options and features, e.g., like the ones @dcherian suggests in https://github.com/pydata/xarray/discussions/6783#discussioncomment-3149033
  • build an interval index from an existing coordinate using, e.g., da.set_xindex("x", PandasIntervalIndex, closed="right")
  • figure out how to assign both a coordinate and an index from an existing pandas.IntervalIndex object in a convenient but more explicit way
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Invisible differences between arrays using IntervalIndex 741806260
1259441952 https://github.com/pydata/xarray/issues/5646#issuecomment-1259441952 https://api.github.com/repos/pydata/xarray/issues/5646 IC_kwDOAMm_X85LEY8g benbovy 4160723 2022-09-27T12:34:20Z 2022-09-27T12:34:20Z MEMBER

This is fixed in v2022.6.0:

```python
xr.testing.assert_allclose(b, c)
AssertionError: Left and right DataArray objects are not close
Coordinates only on the left object:
  * x        (z) int64 0
  * y        (z) int64 0
Coordinates only on the right object:
  * not-y    (z) int64 0
  * not-x    (z) int64 0

print(b == c, "\n")
ValueError: cannot re-index or align objects with conflicting indexes found for the following coordinates: 'z' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Level names in multi-level index are ignored 955617411
1259415933 https://github.com/pydata/xarray/issues/2280#issuecomment-1259415933 https://api.github.com/repos/pydata/xarray/issues/2280 IC_kwDOAMm_X85LESl9 benbovy 4160723 2022-09-27T12:12:05Z 2022-09-27T12:12:05Z MEMBER

This is fixed in v2022.6.0. Xarray's PandasMultiIndex wrapper keeps track of the level coordinate dtypes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  string coords are converted to object dtype when using MultiIndex / stacking 340316108
1259415318 https://github.com/pydata/xarray/issues/907#issuecomment-1259415318 https://api.github.com/repos/pydata/xarray/issues/907 IC_kwDOAMm_X85LEScW benbovy 4160723 2022-09-27T12:11:35Z 2022-09-27T12:11:35Z MEMBER

This is fixed in v2022.6.0. Xarray's PandasMultiIndex wrapper keeps track of the level coordinate dtypes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() treats string coords as objects 166441031
1259349072 https://github.com/pydata/xarray/pull/6971#issuecomment-1259349072 https://api.github.com/repos/pydata/xarray/issues/6971 IC_kwDOAMm_X85LECRQ benbovy 4160723 2022-09-27T11:14:07Z 2022-09-27T11:14:07Z MEMBER

In the last commit I added the xarray.indexes namespace from which we can import Index, PandasIndex and PandasMultiIndex.

Thanks everyone for the feedback and review!

I think this is ready to merge, if we agree to address the coord_names typing issue in another PR?

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add set_xindex and drop_indexes methods 1357296406


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);