issue_comments

12 rows where author_association = "MEMBER", issue = 819062172 and user = 4160723 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
801236829 https://github.com/pydata/xarray/pull/4979#issuecomment-801236829 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDgwMTIzNjgyOQ== benbovy 4160723 2021-03-17T16:42:40Z 2021-03-17T16:42:40Z MEMBER

> shall we merge?

Yes! I wanted to wait for the bi-weekly dev meeting but I've just missed it (PST/PDT -> :facepalm: -> sorry).

Let's merge this and continue the discussion in follow-up issues/PRs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
796599469 https://github.com/pydata/xarray/pull/4979#issuecomment-796599469 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5NjU5OTQ2OQ== benbovy 4160723 2021-03-11T09:30:43Z 2021-03-11T09:30:43Z MEMBER

Thanks for your reviews @shoyer and @rabernat! I've updated the notes according to your comments.

From my point of view this is ready, but we can leave this PR open for another few days in case anyone else wants to add some comments (@pydata/xarray).

Next week I'll start with the implementation (I'll take on the open PR in https://github.com/pydata/xarray/projects/1 and will do only internal refactoring making sure that all tests are passing).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
792653797 https://github.com/pydata/xarray/pull/4979#issuecomment-792653797 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5MjY1Mzc5Nw== benbovy 4160723 2021-03-08T10:24:56Z 2021-03-08T10:35:01Z MEMBER

Thanks everyone for your comments so far!! They have been really helpful in improving the notes!

This is now ready for another round of review. I've tried to include all the points raised in the discussion above. I also marked all the conversations as resolved even though they're still open for discussion! (It's just a way to "reset" them for more clarity.) I'll move the notes into a design_notes folder just before merging this PR.

With the last commits, I think that the notes now cover most of the aspects regarding the use of indexes in Xarray. The goal with these notes is not to settle every detail of the refactoring (decisions can be made while iterating on the implementation), but rather to describe the big picture and outline the main opportunities and challenges. Referring to the notes will help throughout the implementation. Hopefully they will also allow more Xarray users and devs to share their points of view, to make sure we're not missing anything important here.

One thing that is not in the notes: to what extent is it acceptable for this refactoring to introduce breaking changes? I think that it will be hard to avoid any breaking change. That said, as the index refactoring would rather bring internal data structures to light, I don't expect many things to break (at least, not the things that 90% of Xarray users often rely on). The hardest part will probably be to ensure a smooth transition while updating the API that is too specific to pandas.MultiIndex into something that is more index-agnostic...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
791255540 https://github.com/pydata/xarray/pull/4979#issuecomment-791255540 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5MTI1NTU0MA== benbovy 4160723 2021-03-05T08:32:05Z 2021-03-05T08:32:05Z MEMBER

> For reference for how rioxarray does things: https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html

That's good to know, thanks! Just as it may create specific indexes for time coordinates, I could imagine Xarray's decode_cf(decode_coords=True) eventually returning some kind of CRSIndex from any variable referred to in a grid_mapping attribute.
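
As a rough illustration of the first step such decoding would need, here is a minimal sketch in plain Python that finds the variables referred to by grid_mapping attributes. The function name `find_grid_mappings` and the dict-of-attributes layout are assumptions made for this example, not Xarray API, and only the simple form of grid_mapping (a single variable name) is handled.

```python
def find_grid_mappings(variables):
    """Map each grid mapping variable name to the data variables that refer to it.

    `variables` is a hypothetical {variable name: attributes dict} mapping.
    """
    mappings = {}
    for name, attrs in variables.items():
        target = attrs.get("grid_mapping")
        # Only keep references that point at a variable that actually exists.
        if target is not None and target in variables:
            mappings.setdefault(target, []).append(name)
    return mappings


variables = {
    "temperature": {"grid_mapping": "spatial_ref", "units": "K"},
    "precipitation": {"grid_mapping": "spatial_ref"},
    "spatial_ref": {"grid_mapping_name": "latitude_longitude"},
    "time": {"units": "days since 2000-01-01"},
}
print(find_grid_mappings(variables))
```

Each key of the result would be a candidate for building one CRS-aware index shared by the listed data variables.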

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
790961332 https://github.com/pydata/xarray/pull/4979#issuecomment-790961332 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5MDk2MTMzMg== benbovy 4160723 2021-03-04T21:36:38Z 2021-03-04T21:36:38Z MEMBER

Ah yes, making it more easily reusable would indeed be welcome. I guess that such lazy arrays will already be needed for the creation of coordinates from the levels of an existing pandas.MultiIndex.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
790716187 https://github.com/pydata/xarray/pull/4979#issuecomment-790716187 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5MDcxNjE4Nw== benbovy 4160723 2021-03-04T15:49:51Z 2021-03-04T15:51:53Z MEMBER

> Are the CRSIndex and XgcmIndex examples really independent of any coordinate in the DataArray/Dataset? Looks like in #2996 a CRSIndex could be bound to x and y coordinates and an XgcmIndex could be bound to x, y, x_c, y_c, face, etc. coordinates?

@rabernat @jbusecke (xgcm) @snowman2 @fmaussion @djhoese (crs) it would be interesting to have your thoughts here.

What would be the pros and cons of:

  • Refactoring xgcm.Grid into an XgcmGridIndex? The index would typically be assigned to the Dataset coordinates that are also specified in the coords argument of xgcm.Grid, and all other arguments (except the Dataset itself) would become index options. xgcm.Grid methods would then be accessible via Dataset accessor(s) (or eventually just replaced by xarray's corresponding methods).

  • Refactoring a crs attribute (either a "public" Dataset/DataArray attribute or hidden behind an accessor) into some CRSIndex that would typically be assigned to x/y or lat/lon coordinates?

A major advantage of using a custom index is that there's no need to encapsulate a Dataset/DataArray into a higher-level structure (e.g., xgcm.Grid), and there would be more control over how it is propagated from one xarray object to another compared to an attribute or a "stateful" accessor (e.g., crs). Another advantage is that Xarray selection and/or alignment can be customized. But that can also be a downside: unless we allow multiple indexes per coordinate, such XgcmGridIndex and CRSIndex would then have the responsibility of handling selection and alignment for all their corresponding coordinates. That may not be a big deal, though: XgcmGridIndex and CRSIndex could simply encapsulate pandas.Index instances for all (or a subset) of their coordinates.

Are there any other challenges and/or opportunities? (Sorry, this has probably already been discussed elsewhere. There are too many places to look :-) ).
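
To make the encapsulation idea more concrete, here is a minimal sketch in plain Python. `ToyLabelIndex` is a stand-in for a per-coordinate pandas.Index and `ToyCRSIndex` is a hypothetical class; neither is Xarray or pandas API, and the CRS is kept as a plain string for simplicity.

```python
from bisect import bisect_left


class ToyLabelIndex:
    """Stand-in for a per-coordinate pandas.Index: sorted, unique labels."""

    def __init__(self, labels):
        self.labels = tuple(labels)

    def get_loc(self, label):
        """Return the integer position of an exact label match."""
        pos = bisect_left(self.labels, label)
        if pos == len(self.labels) or self.labels[pos] != label:
            raise KeyError(label)
        return pos


class ToyCRSIndex:
    """Hypothetical index bound to several coordinates at once (e.g., x and y)."""

    def __init__(self, coords, crs):
        # One encapsulated index per coordinate, plus the CRS metadata.
        self.indexes = {name: ToyLabelIndex(labels) for name, labels in coords.items()}
        self.crs = crs

    def sel(self, **labels):
        """Translate label-based indexers into positional ones, per coordinate."""
        return {name: self.indexes[name].get_loc(label) for name, label in labels.items()}

    def equals(self, other):
        """Equality involves the CRS as well as the labels."""
        return (
            self.crs == other.crs
            and self.indexes.keys() == other.indexes.keys()
            and all(self.indexes[k].labels == other.indexes[k].labels for k in self.indexes)
        )


index = ToyCRSIndex({"x": [0.0, 10.0, 20.0], "y": [5.0, 15.0]}, crs="EPSG:4326")
print(index.sel(x=10.0, y=15.0))
```

In this sketch the single index owns selection for both x and y by delegating to the encapsulated per-coordinate indexes, which is the responsibility described above.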

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
789910371 https://github.com/pydata/xarray/pull/4979#issuecomment-789910371 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4OTkxMDM3MQ== benbovy 4160723 2021-03-03T17:29:36Z 2021-03-03T17:29:36Z MEMBER

> There are also high-level methods that could use indexes in non-trivial ways.

Thanks @dcherian for listing those methods here, that's something worth keeping in mind! I think that for now it would be reasonable to restrict those methods to the indexes that are currently available in Xarray instead of trying to extend the API of Xarray index wrappers in order to support those special cases. I guess it's ok for "default" or "common" xarray indexes to provide extra functionality that could not be implemented in 3rd-party indexes, just as it would be ok for 3rd-party indexes to provide non-standard, extra functionality that would be reused for methods implemented in DataArray/Dataset accessors.

> Maybe this is a good definition for a PropertyIndex

Are the CRSIndex and XgcmIndex examples really independent of any coordinate in the DataArray/Dataset? Looks like in #2996 a CRSIndex could be bound to x and y coordinates and an XgcmIndex could be bound to x, y, x_c, y_c, face, etc. coordinates?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
789737571 https://github.com/pydata/xarray/pull/4979#issuecomment-789737571 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4OTczNzU3MQ== benbovy 4160723 2021-03-03T14:05:19Z 2021-03-03T14:17:19Z MEMBER

Alignment hasn't been discussed yet here, but it should! Some quick thoughts:

  • support for alignment should probably be optional for an Xarray index wrapper.
  • like pandas.Index, the index wrapper classes that support it should implement .equals(), .union() and/or .intersection()
  • support might be partial if that makes sense (outer, inner, left, right, exact...).
  • index equality might involve more than just the labels, like the CRSIndex proposed in #2996
  • some indexes might implement inexact alignment, like in #4489 or a KDTree index that selects nearest-neighbors within a given tolerance
  • alignment may be "multi-dimensional", i.e., the KDTree example above vs. dimensions aligned independently of each other
  • we need to decide what to do when one dimension has more than one index that supports alignment
      • we should probably raise unless the user explicitly specifies which index to use for the alignment
  • we need to decide what to do when one dimension has one or more index(es) but none support alignment
      • either we raise or we fall back (silently) to alignment based on dimension size
  • for inexact alignment, the tolerance threshold might be given when building the index and/or when performing the alignment
  • are there cases where we want a specific index to perform alignment and another index to perform selection? It would be tricky to support that unless we allow multiple indexes per coordinate. "Meta" indexes (https://github.com/pydata/xarray/pull/4979#discussion_r585203065) would help but then I'm worried about the possible explosion of index wrapper classes.
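
A few of the points above can be sketched in plain Python. The classes `ToyIndex` and `SelectionOnlyIndex` and the function `joined_index` are hypothetical names used only for this illustration; label tuples stand in for pandas.Index, and the sketch arbitrarily picks "raise" for indexes without alignment support.

```python
class ToyIndex:
    """Hypothetical index wrapper with (optional) alignment support."""

    def __init__(self, labels):
        self.labels = tuple(sorted(set(labels)))

    def equals(self, other):
        return self.labels == other.labels

    def union(self, other):
        return ToyIndex(set(self.labels) | set(other.labels))

    def intersection(self, other):
        return ToyIndex(set(self.labels) & set(other.labels))


class SelectionOnlyIndex:
    """Hypothetical index that supports selection but not alignment."""

    def __init__(self, labels):
        self.labels = tuple(labels)


def joined_index(left, right, join="inner"):
    """Return the index that both operands should be reindexed to."""
    for index in (left, right):
        if not all(hasattr(index, name) for name in ("equals", "union", "intersection")):
            # Assumption: raise rather than fall back silently to dimension size.
            raise TypeError(f"{type(index).__name__} does not support alignment")
    if join == "exact":
        if not left.equals(right):
            raise ValueError("indexes are not equal")
        return left
    if join == "inner":
        return left.intersection(right)
    if join == "outer":
        return left.union(right)
    if join == "left":
        return left
    if join == "right":
        return right
    raise ValueError(f"unsupported join: {join!r}")


a = ToyIndex([1, 2, 3])
b = ToyIndex([2, 3, 4])
print(joined_index(a, b, "inner").labels)
print(joined_index(a, b, "outer").labels)
```

An index with partial support could implement only some of the three methods, in which case the check above would need to depend on the requested join.
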
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
789549939 https://github.com/pydata/xarray/pull/4979#issuecomment-789549939 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4OTU0OTkzOQ== benbovy 4160723 2021-03-03T08:54:33Z 2021-03-03T08:54:33Z MEMBER

> One use-case motivated question: the flexible indexes refactoring has also been pointed to as the resolution to #2233, where multidimensional coordinates have the same name as one of their dimensions. I wasn't quite able to tell through the narrative here if that has been addressed along the way yet or not ("A. only 1D coordinates with a name matching their dimension name" for implicit index creation does seem to get close though). So, would it be worth directly addressing #2233 here, or should that wait?

I think #2233 will be addressed by the index refactoring here. I don't see any issue with multidimensional coordinates having the same name as one of their dimensions once indexes are decoupled from dimensions/coordinates. I might still be missing something, though.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
789547340 https://github.com/pydata/xarray/pull/4979#issuecomment-789547340 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4OTU0NzM0MA== benbovy 4160723 2021-03-03T08:50:11Z 2021-03-03T08:50:11Z MEMBER

For the implementation: hooks would probably work. Other options might be decorator functions or context managers?

Or similarly to _repr_inline_:

```python
class MyDuckArray:
    ...

    def _sel_(self, indexer):
        """Prepare the label-based indexer to conform to this coordinate array."""
        ...
        return new_indexer

    ...
```
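
For illustration only, here is a minimal sketch of how such a hook might be looked up and called before label-based selection. `MetreArray`, `prepare_indexer` and the `(value, unit)` indexer format are assumptions made for this example; they are not part of Xarray or of any duck array library.

```python
class MetreArray:
    """Hypothetical duck array whose values are stored in metres."""

    def __init__(self, values):
        self.values = list(values)

    def _sel_(self, indexer):
        """Convert a (value, unit) indexer into a bare value in metres."""
        value, unit = indexer
        factors = {"m": 1.0, "km": 1000.0}
        return value * factors[unit]


def prepare_indexer(coord, indexer):
    """Call the coordinate's `_sel_` hook if it defines one, else pass through."""
    hook = getattr(coord, "_sel_", None)
    return hook(indexer) if hook is not None else indexer


coord = MetreArray([0.0, 500.0, 1000.0])
label = prepare_indexer(coord, (1, "km"))
print(coord.values.index(label))
```

Coordinates without the hook are left untouched, so existing array types would keep working unchanged under this design.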

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
788870926 https://github.com/pydata/xarray/pull/4979#issuecomment-788870926 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4ODg3MDkyNg== benbovy 4160723 2021-03-02T12:23:41Z 2021-03-02T12:23:41Z MEMBER

> One thing I'm missing is duckarray support, though. Not sure if this is realistic, but I'm hoping to reduce the maintenance burden on duckarray support libraries (such as pint-xarray) as much as possible: subclassing every new index class (or having the index provider explicitly add duckarray support) seems a bit too much work.

I haven't looked much at pint-xarray yet, so I'm not sure I understand. Why would you need to subclass every new index class?

If you are referring to the issue that you describe in your comment https://github.com/pydata/xarray/issues/525#issuecomment-514805244, the refactoring should decouple indexes from the coordinates, leaving the latter "just" as if they were regular variables (thus with duckarray support). What is currently possible with non-index coordinates should be possible with all coordinates. Actually, I'm not sure that we'll need to keep IndexVariable after the refactoring.

Or maybe you're referring to unit-aware indexing (what @shoyer mentioned in https://github.com/pydata/xarray/issues/525#issuecomment-514880353)? In this case I'm not sure how we could do that without having specific index classes for that purpose. Maybe some pre/post indexing hooks in Xarray that could be used, e.g., to convert indexer units into the coordinate units?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
788702853 https://github.com/pydata/xarray/pull/4979#issuecomment-788702853 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc4ODcwMjg1Mw== benbovy 4160723 2021-03-02T08:01:19Z 2021-03-02T08:01:19Z MEMBER

Thanks for your comments @keewis and @shoyer!

I think it's better for now to keep this discussion and these notes in the Xarray repository, for more visibility. We could still move them elsewhere later if this PR becomes too cluttered, as there are potentially many aspects we could discuss.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);