html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/6975#issuecomment-1247157234,https://api.github.com/repos/pydata/xarray/issues/6975,1247157234,IC_kwDOAMm_X85KVhvy,35968931,2022-09-14T18:35:34Z,2022-09-14T18:37:48Z,MEMBER,"> We should clarify that the aim of Index objects is to make all operations performed in the (discrete or continuous) space defined by the coordinate labels more efficient. That space is distinct from the discrete space defined by array element locations.
I think this should be one of the first things said. It defines what all the following discussion of Indexes does and does not affect.
> I've tried to explain it in the ""Index base class"" section and the sections below, but maybe it should be emphasized more?
Yeah, I think you do actually have that one covered; I just included it as another example of a naive question that everyone will have and that is worth heading off very explicitly.
> When I load the ""air"" tutorial data and it shows a Float64Index and DateTime64Index, where did they come from?
>
> I guess you mean it is shown through `ds.indexes`?
>
> ds.xindexes (vs. ds.indexes) still needs to be added in the docs (in a later PR?), which hopefully will address your concern here.
I meant: when did these indexes get built automatically? (Presumably on coordinate assignment.)
> Maybe Index also deserves its own entry there, where we could explain what indexes are, how they are different from variables (coordinates), how they are used or accessed in Xarray, etc.
1000% yes, we need a page that explains what `Index` objects are, what they do, how they work, and how they are handled automatically by default. This is prerequisite knowledge (which apparently I don't have :sweat_smile:) before trying to build your own custom index.
> Overall, I think that the whole ""Xarray Internals"" section could be more streamlined than a bunch of loosely-coupled document pages.
Probably, but having a loosely coupled page for each aspect of the internals would be a good initial aim.
> I agree that we need more examples, but I also think that too many examples may tend to make things more confusing.
That's why I like the ""Explanation"" vs ""How-to"" vs ""Tutorials"" distinction: use minimal code in the ""Explanation"" section (this PR), but put multiple, more complex examples under ""How to create a functionally-derived index"", ""How to create a lazy index"", etc.
> Is it possible to do that with Sphinx / RST?
No idea, but that does look cool!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1358841264
https://github.com/pydata/xarray/pull/6975#issuecomment-1246487152,https://api.github.com/repos/pydata/xarray/issues/6975,1246487152,IC_kwDOAMm_X85KS-Jw,4160723,2022-09-14T09:25:24Z,2022-09-14T09:25:24Z,MEMBER,"Thanks @dcherian and @TomNicholas for your feedback!
@dcherian I will reply to your inline comments when I integrate your suggestions into this PR.
@TomNicholas I reply to your comments below.
> Bear in mind I don't think I've ever contributed a PR to xarray that touched indexes.py or indexing.py
That's exactly why your feedback is valuable!
> When is my CustomIndex object consulted? Is it potentially for all basic operations (concat, join, align, indexing, etc?)
I agree this could be documented in more detail, and in a consistent way, in the `Index` API docstrings. Some methods like `equals`, `join` and `reindex_like` may be called in *a lot* of places: basically everything that relies on object alignment.
> Why does there not need to be an index (in .indexes) if I do indexing with e.g. .isel but have no coordinates?
We should clarify that the aim of `Index` objects is to make all operations performed in the (discrete or continuous) space defined by the coordinate labels more efficient. That space is distinct from the discrete space defined by array element locations. None of the operations performed in the latter space requires an index.
Some `Index` API methods like `Index.isel` suggest otherwise, but those methods exist rather for convenience, i.e., to save users from having to rebuild an index from scratch when it could easily be derived from the existing one.
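To make the distinction concrete, here is a minimal sketch using plain Python lists instead of xarray objects. `data`, `x_labels` and `label_to_position` are made-up names, and the dict only stands in for what an index does:

```python
# Values along a single dimension, say 'x', and their coordinate labels.
data = [1.5, 2.5, 3.5, 4.5]
x_labels = [10.0, 20.0, 30.0, 40.0]

# Position space: select by integer location. No index is involved.
by_position = data[2]

# Label space: a label must first be translated into a position.
# Building a lookup table once plays the role of an index...
label_to_position = {label: i for i, label in enumerate(x_labels)}

# ...so that each label-based selection is a cheap dictionary lookup.
by_label = data[label_to_position[30.0]]
```

Without the lookup table, every label-based selection would have to scan `x_labels`. The position-based selection never looks at the labels at all.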
> How is an xarray.PandasIndex different from a pd.Index?
I've tried to explain it in the ""Index base class"" section and the sections below, but maybe it should be emphasized more?
> When I load the ""air"" tutorial data and it shows a Float64Index and DateTime64Index, where did they come from?
I guess you mean it is shown through `ds.indexes`?
`ds.xindexes` (vs. `ds.indexes`) still needs to be added in the docs (in a later PR?), which hopefully will address your concern here.
> Finally it's not great that to explain Index objects we have to assume the user knows what xarray.Variable is, but Variable is still not really public API, and certainly isn't documented as comprehensively as DataArray and Dataset are.
I agree, although `Variable` is already documented in the ""Xarray internals"" section. Maybe `Index` also deserves its own entry there, where we could explain what indexes are, how they are different from variables (coordinates), how they are used or accessed in Xarray, etc.
Overall, I think that the whole ""Xarray Internals"" section could be more streamlined than a bunch of loosely-coupled document pages.
> I also think we need multiple simple examples.
I agree that we need more examples, but I also think that too many examples may tend to make things more confusing.
One thing that I like very much in https://fastapi.tiangolo.com/ is how a small example is picked for each tutorial and then shown in every subsection, with the relevant code highlighted. Is it possible to do that with Sphinx / RST?
It's hard to show all features through one succinct example, though. As @dcherian says in https://github.com/pydata/xarray/pull/6975#discussion_r967495773, we could invite people to look into the `PandasIndex` and `PandasMultiIndex` code for more details. My hope is that more real examples (multi-coordinate, multi-dimensional) will be available in the future.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1358841264
https://github.com/pydata/xarray/pull/6975#issuecomment-1246192701,https://api.github.com/repos/pydata/xarray/issues/6975,1246192701,IC_kwDOAMm_X85KR2Q9,35968931,2022-09-14T03:41:06Z,2022-09-14T03:43:31Z,MEMBER,"> **pydata/xarray** your feedback would be very much appreciated! I've been into this for quite some time, so there may be things that seem obvious to me but that you can still find very confusing or non-intuitive. It would then deserve some extra or better explanation.
I find the way in which the Index objects are involved in a method call (e.g. `sel`) still pretty opaque. (Bear in mind I don't think I've ever contributed a PR to xarray that touched `indexes.py` or `indexing.py` :sweat_smile:)
- When is my `CustomIndex` object consulted? Is it potentially for all basic operations (concat, join, align, indexing, etc?) If I passed some kind of `NotImplementedIndex` which _only_ defined `.from_variables`, what functionality would be left in xarray?
- Why does there not need to be an index (in `.indexes`) if I do indexing with e.g. `.isel` but have no coordinates?
- If I index along two dimensions simultaneously (`.isel(x=1, y=2)`) does that correspond to two separate Index consultations?
- How is an `xarray.PandasIndex` different from a `pd.Index`?
- When I load the ""air"" tutorial data and it shows a `Float64Index` and `DateTime64Index`, where did they come from?
- Finally it's not great that to explain `Index` objects we have to assume the user knows what `xarray.Variable` is, but `Variable` is still not really public API, and certainly isn't documented as comprehensively as `DataArray` and `Dataset` are.
Perhaps these questions are too specific to xarray's internals, but I do think we should give some kind of mental model of the role that Index objects play. (This could be a white lie, similar to how our page on data structures says that `Dataset` objects contain `DataArray` objects when technically they don't.)
I also think we need multiple simple examples. How about
- `HelloWorldIndex`, that just prints `""I'm calling HelloWorldIndex.isel !""` etc.
- `PeriodicBoundaryIndex` (a partial/simpler implementation of #7031),
- A simple functionally-derived index, that consults a dynamically-called exponential function or something.
- Some example of a custom multi-index, perhaps some kind of 2D lat-lon thing? Or how about something that represents 2D image distortion, so that using it creates a fisheye effect?
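Here is a rough idea of what the functionally-derived example could boil down to. It is a standalone sketch that deliberately ignores xarray's actual `Index` API, and `ExponentialLookup`, `label` and `get_position` are hypothetical names. Labels are computed from positions on demand instead of being stored, and a label lookup just inverts the function:

```python
import math


class ExponentialLookup:
    '''Hypothetical functionally-derived index (not xarray's Index API).

    Labels are never stored: label(i) = scale * exp(rate * i) for
    i in range(size), and a label lookup inverts that function.
    '''

    def __init__(self, scale, rate, size):
        self.scale = scale
        self.rate = rate
        self.size = size

    def label(self, i):
        # Compute the coordinate label of position i on demand.
        return self.scale * math.exp(self.rate * i)

    def get_position(self, label):
        # Invert label = scale * exp(rate * i), then round to the nearest
        # integer position (nearest in log space, not in label space).
        i = round(math.log(label / self.scale) / self.rate)
        if not 0 <= i < self.size:
            raise KeyError(label)
        return i
```

For example, take `scale=1.0`, `rate=math.log(2)` and `size=10`. The labels are then 1, 2, 4, ..., 512, and `get_position(8.0)` returns 3 without ever materializing the label array.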
I find it helps to think of documentation using [this 4-part system](https://documentation.divio.com/). This PR should cover ""Explanation"" pretty well, but we should still aim for other content to better cover ""Tutorial"", ""How-to Guides"", and ""Reference"". ""Tutorial"" could be a notebook walking through creating a simple index (e.g. `PeriodicIndex`) from scratch, explaining and fixing errors as they arise (like @dcherian did for `apply_ufunc`). ""How-to Guides"" could cover other, more advanced custom index examples. I guess ""Reference"" here is just having really clear docstrings on the possible methods of `Index` somewhere (we can't really do that with an ABC though, can we?).
That all said, this is already a great start!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1358841264
https://github.com/pydata/xarray/pull/6975#issuecomment-1242492777,https://api.github.com/repos/pydata/xarray/issues/6975,1242492777,IC_kwDOAMm_X85KDu9p,2448579,2022-09-09T21:24:42Z,2022-09-09T21:24:42Z,MEMBER,"> an inefficient ""numpy"" index with basic lookup
yes! I used this recently to describe what an index does. I think most people are familiar with the argmin way","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1358841264
https://github.com/pydata/xarray/pull/6975#issuecomment-1239526376,https://api.github.com/repos/pydata/xarray/issues/6975,1239526376,IC_kwDOAMm_X85J4avo,4160723,2022-09-07T15:15:43Z,2022-09-07T15:15:43Z,MEMBER,"> I'm open to any suggestion on how to better illustrate this with clear and succinct examples.
Maybe an inefficient ""numpy"" index with basic lookup (like in https://github.com/pydata/xarray/pull/3925#issuecomment-609471635) would be a good example?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1358841264