issue_comments: 945592260

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/5647#issuecomment-945592260 https://api.github.com/repos/pydata/xarray/issues/5647 945592260 IC_kwDOAMm_X844XJfE 4160723 2021-10-18T09:43:18Z 2021-10-18T10:47:26Z MEMBER

Ok, I'm now hitting another obstacle while working on reindex.

One general approach for both alignment and re-indexing that is "pretty straightforward" to implement with the new Xarray index data model is: (1) find matching indexes based on their corresponding coordinate/dimension names and index type, and (2) call idx.join(other) and/or idx.reindex_like(other), where other is another index object of the same type as idx. This is what I've done so far in #5692.
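To make steps (1) and (2) concrete, here is a minimal sketch. It is not the code from #5692: `ToyIndex` and `find_matching_indexes` are hypothetical stand-ins, and only the method names `join` and `reindex_like` come from the description above.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an Xarray index wrapper; not the real xarray API.
@dataclass(frozen=True)
class ToyIndex:
    labels: tuple

    def join(self, other, how="inner"):
        # Combine the labels of two indexes of the same type.
        if how == "inner":
            return ToyIndex(tuple(x for x in self.labels if x in other.labels))
        if how == "outer":
            extra = tuple(x for x in other.labels if x not in self.labels)
            return ToyIndex(self.labels + extra)
        raise ValueError(f"unsupported join: {how!r}")

    def reindex_like(self, other):
        # Position in self of each label of other, or -1 where it is missing.
        pos = {label: i for i, label in enumerate(self.labels)}
        return [pos.get(label, -1) for label in other.labels]


def find_matching_indexes(indexes_a, indexes_b):
    """Step (1): pair indexes whose coordinate names and index type both match.

    Each argument maps a tuple of coordinate names to an index object.
    """
    matches = {}
    for key, idx in indexes_a.items():
        other = indexes_b.get(key)
        if other is not None and type(other) is type(idx):
            matches[key] = (idx, other)
    return matches


a = {("x",): ToyIndex((1, 2, 3))}
b = {("x",): ToyIndex((2, 3, 4))}

matches = find_matching_indexes(a, b)

# Step (2): delegate to the index objects themselves.
joined = {key: idx.join(other, how="inner") for key, (idx, other) in matches.items()}
indexers = {key: idx.reindex_like(other) for key, (idx, other) in matches.items()}
```

The point of the design is that the generic code never inspects the labels. It only decides which index pairs belong together and leaves the rest to the index type.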

Relaxing any of the constraints in (1) would be much more complicated to implement. We would need to do some sort of mapping from dimension labels to all involved (multi-/meta-)indexes, then check for conflicts in dimension indexers returned from multiple indexes, possibly handle/remove multi-index coordinates (or convert back to non-indexed coordinates), etc.
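As a rough illustration of the conflict check mentioned above, assume each index returns a mapping `{dim: positions}`. The helper name `merge_dim_indexers` and that return shape are assumptions for this sketch, not existing Xarray API.

```python
def merge_dim_indexers(per_index_results):
    """Merge per-index dimension indexers, raising on conflicts.

    per_index_results maps an index name to {dim: list of positions}
    (a hypothetical shape chosen for this sketch).
    """
    merged = {}
    source = {}
    for index_name, dim_indexers in per_index_results.items():
        for dim, indexer in dim_indexers.items():
            if dim in merged and merged[dim] != indexer:
                raise ValueError(
                    f"conflicting indexers for dimension {dim!r} "
                    f"from indexes {source[dim]!r} and {index_name!r}"
                )
            # Remember the first indexer seen for this dimension and its origin.
            merged.setdefault(dim, indexer)
            source.setdefault(dim, index_name)
    return merged


# Two indexes touching different dimensions merge cleanly.
ok = merge_dim_indexers({"idx_a": {"x": [0, 1]}, "idx_b": {"y": [2]}})
```

This covers only the conflict detection. Handling or removing multi-index coordinates afterwards is a separate problem that the sketch does not address.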

One problem with Dataset.reindex and DataArray.reindex is that we can pass any {dim: labels} as indexers. How should we adapt the API for flexible indexes? How do we ensure backwards compatibility? The cases currently possible are:

  1. Both dim and labels correspond (or may be cast) to single pandas indexes: there's no issue in this case with the constraints stated in (1) above.

  2. dim has a multi-index, labels is a pd.MultiIndex and level names exactly match: no real issue either in this case, but shouldn't we discourage this implicit behavior in the mid/long term and instead encourage using reindex_like for such more advanced use cases?

  3. dim has a multi-index and labels is (cast to) a single pandas index (or the other way around): this is currently possible in Xarray, but it seems silly? After re-indexing, all data along dim is filled with fill_value... Would it be fine to raise an error instead? Would it really break any use case?

  4. dim has a multi-index, labels is a pd.MultiIndex and multi-index level names don't match: same problem as in case 3.

Cases 3 and 4 are a big obstacle for me right now. I really don't know how we can still support those special cases without deeply rethinking the problem. If they could be considered bugs, then the new implementation would already raise a nice error message :-).
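For reference, the four cases can be told apart with plain pandas checks. This is only a sketch: `classify_reindex_case` and `check_reindex_labels` are hypothetical helpers, and the error raised for cases 3 and 4 is the proposed behavior, not what Xarray does today.

```python
import pandas as pd


def classify_reindex_case(current, labels):
    """Return the case number (1 to 4) from the list above.

    current is the pandas index currently attached to dim; labels is
    whatever the user passed for that dim.
    """
    # Note: pd.Index() turns a list of tuples into a pd.MultiIndex.
    labels = labels if isinstance(labels, pd.Index) else pd.Index(labels)
    cur_multi = isinstance(current, pd.MultiIndex)
    new_multi = isinstance(labels, pd.MultiIndex)
    if not cur_multi and not new_multi:
        return 1
    if cur_multi != new_multi:
        return 3
    return 2 if list(current.names) == list(labels.names) else 4


def check_reindex_labels(dim, current, labels):
    # Assumption: cases 3 and 4 are treated as errors, as suggested above.
    case = classify_reindex_case(current, labels)
    if case in (3, 4):
        raise ValueError(
            f"cannot reindex dimension {dim!r}: labels are not compatible "
            f"with the existing index (case {case})"
        )
    return case


mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["one", "two"])
case_single = classify_reindex_case(pd.Index([1, 2, 3]), [2, 3])
case_mismatch = classify_reindex_case(mi, mi.set_names(["u", "v"]))
```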

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  955936490