home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 1295283938

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938 https://api.github.com/repos/pydata/xarray/issues/7214 1295283938 IC_kwDOAMm_X85NNHbi 1217238 2022-10-28T17:49:10Z 2022-10-28T17:49:10Z MEMBER

Explicitly providing indexes is an advanced user feature.

Agreed. However, xr.Dataset(coords={"x": pandas_midx}) is something that presumably a lot of users rely on (it is used extensively in Xarray's tests) and that we should really deprecate IMO. If we don't provide a convenient alternative, I expect many of those users will complain.

I agree -- we should support this for backwards compatibility (even if we deprecate it).

it's easier to explicitly manipulate indexes in the form of a dict

While generally I also prefer handling plain dict objects over custom dict-like objects, here I don't see much reasons of manipulating Xarray index objects independently of their coordinate variables. Indexes allows keeping them tied together, and it is already returned by .xindexes.

EDIT -- For more context: initially an Indexes object was almost equivalent to a Frozen(obj._indexes). In #5692 I tried hard and struggled to keep dealing with separate dicts of indexes and indexed variables, but in the end it made things much easier to encapsulate the variables in Indexes, which is also used internally in different places.

OK, this totally makes sense.

I don't love that it is possible to express invalid states in Xarray's data model. This motivated the creation of assert_internal_invariants and currently mostly is a concern for Xarray's own developers, but when we exposes the indexes argument, it will be easier for users to make the same sort of errors.

I wonder if we should consider the broader refactor of merging the Indexes and Coordinates objects, and expose the constructor as a public API. For clarity, I'll call it CoordinatesAndIndexes for now, but it could likely reuse the public name of Coordinates.

This would have a number of benefits:

  1. It's impossible to provide inconsistent coords and indexes, because there is no separate indexes argument.
  2. Likewise, it is impossible to create inconsistent coordinates and indexes on an existing Xarray object.
  3. All the logic for verifying consistent coords and indexes can go in one place, shared between Dataset/DataArray. (Yes, it would be annoying to refactor Dataset to merge in variables from CoordinatesAndIndexes rather than the current separate Dataset._variables)
  4. The public API also becomes clearer: if users want default indexes, they can pass a dict of variables into coords. If they want to copy indexes from another object, they can pass in a CoordinatesAndIndexes object (either from another Xarray object or constructed directly).
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1422543378
Powered by Datasette · Queries took 0.81ms · About: xarray-datasette