home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

44 rows where author_association = "MEMBER" and issue = 966983801 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 9

  • benbovy 24
  • shoyer 6
  • keewis 6
  • dcherian 3
  • rabernat 1
  • jhamman 1
  • max-sixty 1
  • andersy005 1
  • Illviljan 1

issue 1

  • Explicit indexes · 44 ✖

author_association 1

  • MEMBER · 44 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1071581658 https://github.com/pydata/xarray/pull/5692#issuecomment-1071581658 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_3wna max-sixty 5635139 2022-03-17T21:57:11Z 2022-03-17T21:57:11Z MEMBER

Huge success! Thank you @benbovy !

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1071148047 https://github.com/pydata/xarray/pull/5692#issuecomment-1071148047 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_2GwP jhamman 2443309 2022-03-17T17:52:24Z 2022-03-17T17:52:24Z MEMBER

Huge! Amazing effort here @benbovy!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1071104882 https://github.com/pydata/xarray/pull/5692#issuecomment-1071104882 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_18Ny shoyer 1217238 2022-03-17T17:12:07Z 2022-03-17T17:12:07Z MEMBER

OK, in it goes! Big thanks to @benbovy for seeing this through :)

{
    "total_count": 24,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 13,
    "confused": 0,
    "heart": 1,
    "rocket": 10,
    "eyes": 0
}
  Explicit indexes 966983801
1070332684 https://github.com/pydata/xarray/pull/5692#issuecomment-1070332684 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_y_sM dcherian 2448579 2022-03-17T05:10:50Z 2022-03-17T05:10:50Z MEMBER

Yes!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1069677202 https://github.com/pydata/xarray/pull/5692#issuecomment-1069677202 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_wfqS keewis 14808389 2022-03-16T22:00:53Z 2022-03-16T22:00:53Z MEMBER

sounds good to me!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1069344000 https://github.com/pydata/xarray/pull/5692#issuecomment-1069344000 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_vOUA shoyer 1217238 2022-03-16T16:47:45Z 2022-03-16T16:47:45Z MEMBER

OK, I think we’re good to go here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1065066894 https://github.com/pydata/xarray/pull/5692#issuecomment-1065066894 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_e6GO keewis 14808389 2022-03-11T12:24:32Z 2022-03-11T12:24:32Z MEMBER

the doc build actually fails because nbconvert relied on the implicit dependency of nbformat on ipython_genutils, but that got removed in the most recent release of nbformat. This is fixed in #6350.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1065045526 https://github.com/pydata/xarray/pull/5692#issuecomment-1065045526 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_e04W benbovy 4160723 2022-03-11T11:55:33Z 2022-03-11T11:55:33Z MEMBER

Ok, thanks @shoyer and @keewis for your comments!

The two failing CI checks are unrelated (see https://github.com/spatialaudio/nbsphinx/issues/639 for the doc build).

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Explicit indexes 966983801
1059662347 https://github.com/pydata/xarray/pull/5692#issuecomment-1059662347 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_KSoL shoyer 1217238 2022-03-05T03:05:36Z 2022-03-05T03:05:36Z MEMBER

I would like to merge this PR very soon so it can get testing before the next release. If anyone has any remaining concerns, please speak up!

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1048550766 https://github.com/pydata/xarray/pull/5692#issuecomment-1048550766 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-f51u benbovy 4160723 2022-02-23T08:36:53Z 2022-02-23T08:36:53Z MEMBER

Something I was worried about was the number of open PRs that would need to be updated after this one gets merged eventually. I quickly went through the most recent PRs and it seems to be ok, although there may be some PRs that I overlooked.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1047244743 https://github.com/pydata/xarray/pull/5692#issuecomment-1047244743 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-a6_H benbovy 4160723 2022-02-21T21:35:26Z 2022-02-21T21:35:26Z MEMBER

Thanks for the review @Illviljan! I addressed most of your comments.

Regarding the use of Literal, join options and {"raise", "ignore"} option values are used in several places, directly or indirectly related to this PR. I would suggest to take care of this in a follow-up PR.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1044173017 https://github.com/pydata/xarray/pull/5692#issuecomment-1044173017 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-PNDZ benbovy 4160723 2022-02-18T08:59:19Z 2022-02-18T08:59:19Z MEMBER

The benchmark triggers an error if it is at least 1.5x slower than main. If I remember correctly I believe unstacking.UnstackingSparse.time_unstack_to_sparse_2d has been around 1.45 in tests that have passed. So that one might be a little slower now but I wouldn't worry about that too much.

Ah now I see that it is likely because this branch makes a copy of the pandas index (+ a second one that is not necessary and that I'll fix). Although making a shallow copy of a pandas index is rather cheap, it significantly affects the benchmark results here because of the small size of the index. For larger index sizes it becomes irrelevant:

This branch:

  • size = 100

```python

%timeit da_eye_2d.unstack(sparse=True) 680 µs ± 3.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ```

  • size = 100 000

```python

%timeit da_eye_2d.unstack(sparse=True) 49.2 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ```

Xarray stable 0.21.1 (with same versions of numpy, pandas and sparse):

  • size = 100

```python

%timeit da_eye_2d.unstack(sparse=True) 424 µs ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ```

  • size = 100 000

```python

%timeit da_eye_2d.unstack(sparse=True) 49.4 ms ± 797 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ```

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1043927304 https://github.com/pydata/xarray/pull/5692#issuecomment-1043927304 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-OREI Illviljan 14371165 2022-02-18T05:33:19Z 2022-02-18T05:33:19Z MEMBER

I’m not sure what’s happening with the failing benchmark test unstacking.UnstackingSparse.time_unstack_to_sparse_2d, it seems flaky (or I missed something when trying to profile this benchmark + setup).

The benchmark triggers an error if it is at least 1.5x slower than main. If I remember correctly I believe unstacking.UnstackingSparse.time_unstack_to_sparse_2d has been around 1.45 in tests that have passed. So that one might be a little slower now but I wouldn't worry about that too much.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1042909091 https://github.com/pydata/xarray/pull/5692#issuecomment-1042909091 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-KYej benbovy 4160723 2022-02-17T12:40:31Z 2022-02-17T12:41:06Z MEMBER

Ok, this is ready for review!

I’m not sure what’s happening with the failing benchmark test unstacking.UnstackingSparse.time_unstack_to_sparse_2d, it seems flaky (or I missed something when trying to profile this benchmark + setup).

I haven’t updated the release notes yet as I’m not sure how best to proceed. I guess separate items should be added for each substantial change, but maybe not for all the fixes?

There’s still a number of Dataset methods (and DataArray counterparts) that may not be (fully) compatible with custom indexes (especially multi-coordinate / multi-dimensional indexes). After a quick check:

drop_sel, drop_isel, drop_dims, transpose, interpolate_na, ffill, bfill, reduce, map, apply, quantile, rank, integrate, cumulative_integrate, filter_by_attrs, idxmin, idxmax, argmin, argmax...

I haven’t looked more into those methods as this PR is already way too big and the current tests are passing. This is something that we could do progressively I think. Some of those methods may be pretty straightforward to refactor (e.g., just pass through custom indexes instead of re-computing default ones, when this is possible). It may be trickier for other methods, though.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
  Explicit indexes 966983801
1041843391 https://github.com/pydata/xarray/pull/5692#issuecomment-1041843391 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-GUS_ keewis 14808389 2022-02-16T16:23:56Z 2022-02-16T16:23:56Z MEMBER

For arr.roll, the index is dropped if roll_coords=True and CustomIndex.roll() is not implemented.

right, I can confirm that, too

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1041835577 https://github.com/pydata/xarray/pull/5692#issuecomment-1041835577 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-GSY5 benbovy 4160723 2022-02-16T16:16:30Z 2022-02-16T16:16:30Z MEMBER

let's try it:

arr.roll(x=5): index is not dropped, the result for sel(indexers) is different from before the roll (I guess that makes sense? pandas indexes behave the same way...) copy: raises NotImplementedError arr.stack(z=("x", "y")): index is dropped arr.isel(x=slice(5, 10), y=slice(10, 20)): index is dropped

For arr.roll, the index is dropped if roll_coords=True and CustomIndex.roll() is not implemented. Otherwise, the index is kept as-is since the coordinates are not affected.

For copy, we should probably use copy.copy() by default instead of raising an error.

I don't have ideas on how to test the other operations right now.

Yeah this is something that we can do progressively as we're experimenting with custom indexes. But this first try is already great.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1041820244 https://github.com/pydata/xarray/pull/5692#issuecomment-1041820244 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-GOpU benbovy 4160723 2022-02-16T16:01:56Z 2022-02-16T16:01:56Z MEMBER

In df31fa8 I have re-enabled the default indexes invariant check, which is now optional (enabled by default).

The changes in behavior in this PR (mostly related to rename and reset_index) can thus be tracked looking at the check_default_indexes=False option set for assert_* functions in the tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1041781329 https://github.com/pydata/xarray/pull/5692#issuecomment-1041781329 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-GFJR keewis 14808389 2022-02-16T15:46:18Z 2022-02-16T15:56:59Z MEMBER

I wonder how it goes when applying other operations for which the custom index has no implementation.

let's try it: - arr.roll(x=5): index is not dropped, the result for sel(indexers) is different from before the roll (I guess that makes sense? pandas indexes behave the same way...) - copy: raises NotImplementedError - arr.stack(z=("x", "y")): index is dropped - arr.isel(x=slice(5, 10), y=slice(10, 20)): index is dropped

I don't have ideas on how to test the other operations right now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1041655034 https://github.com/pydata/xarray/pull/5692#issuecomment-1041655034 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84-FmT6 benbovy 4160723 2022-02-16T15:25:40Z 2022-02-16T15:25:40Z MEMBER

@keewis great! I actually haven't tried it yet, so it's good to know. I wonder how it goes when applying other operations for which the custom index has no implementation. It should either drop the index, propagate the index or raise an error depending on the cases.

Formatting aside, the most important issue I had was passing the custom index to DataArray, which at the moment requires the use of fastpath=True (or am I missing something?)

Yes, there's no public API yet for setting (custom) indexes explicitly. This can be done in a follow-up PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1040002844 https://github.com/pydata/xarray/pull/5692#issuecomment-1040002844 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849_S8c benbovy 4160723 2022-02-15T08:39:39Z 2022-02-15T10:28:51Z MEMBER

ALL GREEN! 😱

Yeah it's good to see all those green checks :)

if you find any tests where the behavior changed so much that it is tricky to fix them I'd be fine with xfailing / skipping them for now.

Some broken tests in test_units actually helped finding issues that are not directly related to those tests!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1039753848 https://github.com/pydata/xarray/pull/5692#issuecomment-1039753848 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849-WJ4 dcherian 2448579 2022-02-15T01:28:41Z 2022-02-15T01:28:41Z MEMBER

ALL GREEN! 😱

{
    "total_count": 4,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 4,
    "eyes": 0
}
  Explicit indexes 966983801
1039079452 https://github.com/pydata/xarray/pull/5692#issuecomment-1039079452 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X8497xgc keewis 14808389 2022-02-14T13:20:18Z 2022-02-14T13:20:18Z MEMBER

thanks, @benbovy, those changes look good to me for now. After this is merged I'll have to update the tests anyways.

In the meantime, if you find any tests where the behavior changed so much that it is tricky to fix them I'd be fine with xfailing / skipping them for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1038881658 https://github.com/pydata/xarray/pull/5692#issuecomment-1038881658 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X8497BN6 benbovy 4160723 2022-02-14T09:56:36Z 2022-02-14T10:16:17Z MEMBER

@keewis FYI in the two last commits (11bf070 and 35a7464) I've made some changes to fix failing tests in test_units.py, which are related to the implicit creation of indexes and coordinates from a multi-index given in Dataset or DataArray constructors.

That may not be a proper fix. The problem is quite tricky though, since multi-indexes may be passed - directly or wrapped in DataArrays (as data or via their coordinates) - either to data_vars or coords (it may even be passed to both at the same time). It gets even more complicated now that multi-indexes in DataArrays have multiple coordinates. That's a lot of implicit behavior that IMHO we should depreciate as soon as possible in favor of a more explicit way to pass (multi-)indexes to the Dataset and DataArray constructors.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1034408563 https://github.com/pydata/xarray/pull/5692#issuecomment-1034408563 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849p9Jz andersy005 13301940 2022-02-10T01:58:14Z 2022-02-10T01:58:35Z MEMBER

We'll also need to keep an eye on the asv benchmarks to ensure that there's no major regression in performance (I haven't checked yet).

I enabled the asv-benchmarks workflow via the run-benchmark label. Feel free to turn this off (by removing the label) if the PR isn't ready :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1034340215 https://github.com/pydata/xarray/pull/5692#issuecomment-1034340215 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849psd3 benbovy 4160723 2022-02-10T00:12:35Z 2022-02-10T00:12:35Z MEMBER

Thanks @rabernat (and others), I really appreciate your encouragement!

I think I start to see the light at the end of the tunnel here :). I have now refactored all the main parts of Xarray's internals that are using indexes and I managed to bring the failing tests down to a few. Now I'm going to try clearing all CI failures before working on the remaining TODO items. Although this is still a draft PR, it is now mostly ready for review I think. I'll mark it as ready when all CI is green.

Some of the re-implementations might still be improved, like align and concat. It's hard to know how best to do so before experimenting with custom indexes, though. We can still further iterate on this later...

IMO the best way to help will be to heavily test this branch (or the main branch once it is merged) against some real-world cases, as this PR will certainly introduce regressions that are not caught by the tests. A few release candidates might help in gathering feedback from the broader Xarray community. We'll also need to keep an eye on the asv benchmarks to ensure that there's no major regression in performance (I haven't checked yet).

This PR also intentionally introduces some minor changes in behavior (see my previous comments). I think those changes are welcome as they fix things that we can rather consider as bugs or limitations. That said, it is important to carefully check that they won't break Xarray users' (good or bad) habits.

And of course, once it is ready I'm eager to help anyone experiment with this feature (custom indexes)! I have a few ideas on convenient ways to implement custom selection / alignment behavior while mostly reusing the current Xarray indexing capabilities based on pandas indexes.

{
    "total_count": 6,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1033779138 https://github.com/pydata/xarray/pull/5692#issuecomment-1033779138 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849njfC rabernat 1197350 2022-02-09T13:47:43Z 2022-02-09T13:47:43Z MEMBER

Just chiming in to say 💪 ! We see the work you are putting in @benbovy. I'm so excited to be using this feature. Is there a way I can help?

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
957960538 https://github.com/pydata/xarray/pull/5692#issuecomment-957960538 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X845GVFa dcherian 2448579 2021-11-02T17:15:58Z 2021-11-02T17:15:58Z MEMBER

@benbovy Thank you! I think everyone appreciates your hard work here, and also how frustrating this kind of work is :)

{
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
957363773 https://github.com/pydata/xarray/pull/5692#issuecomment-957363773 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X845EDY9 benbovy 4160723 2021-11-02T11:45:57Z 2021-11-02T11:45:57Z MEMBER

I really appreciate your efforts here! This is a giant refactoring PR. It's very important work but I know it can be hard to land.

Thanks @shoyer! On a positive note (starting a new week after a long week-end helps), lots of important tasks are done now, there has been good progress.

I wonder if there are things we could do to make this easier to land -- maybe we could defer some of these fixes to a later PR, if we wait on disabling implicit indexes?

We'll need to make all tests pass, but yes I'll try to see if we can provide some temporary (dirty) fixes that at least work under the current (legacy) data model (dimension coordinates with pandas indexes).

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
954003380 https://github.com/pydata/xarray/pull/5692#issuecomment-954003380 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X8443O-0 shoyer 1217238 2021-10-28T16:23:08Z 2021-10-28T16:23:08Z MEMBER

I don't know when it will stop. I'm feeling exhausted right now :(.

I really appreciate your effortss here! This is a giant refactoring PR. It's very important work but I know it can be hard to land.

I wonder if there are things we could do to make this easier to land -- maybe we could defer some of these fixes to a later PR, if we wait on disabling implicit indexes?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
953989525 https://github.com/pydata/xarray/pull/5692#issuecomment-953989525 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X8443LmV benbovy 4160723 2021-10-28T16:06:20Z 2021-10-28T16:06:20Z MEMBER

I thought I was almost done with the internal refactoring. However, the more I fix broken tests -> the more I discover new parts to refactor (added concat, computation, copy, expand_dims and groupby to the list recently...) -> which when done breaks other (sometimes unrelated) tests -> new things to refactor, etc. etc.

I don't know when it will stop. I'm feeling exhausted right now :(.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
925987285 https://github.com/pydata/xarray/pull/5692#issuecomment-925987285 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X843MXHV benbovy 4160723 2021-09-23T16:52:10Z 2021-09-23T16:52:10Z MEMBER

Yep I've seen those issues.

There's a lot of things going on in #3438. I wonder if the goal of chaining rename_dims, set_index, stack, etc. in the example shown in that issue wasn't ultimately to circumvent the limitation of dimension coordinates, which is not necessary anymore with explicit indexes. More particularly, neither rename_dims nor stack should cause drop-in replacement of other variables as a side effect in this PR.

Regarding #4108, I think this is not necessary anymore with explicit indexes. Since indexes are allowed for non-dimension coordinates too, we can (and it's perhaps more consistent) to preserve any existing index before/after renaming.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
925966420 https://github.com/pydata/xarray/pull/5692#issuecomment-925966420 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X843MSBU keewis 14808389 2021-09-23T16:23:05Z 2021-09-23T16:24:58Z MEMBER

FYI, this was forbidden in #3645

The discussion for that is in #3438. In short, in addition to renaming, rename* did replace other variables, which I think we should only allow when explicitly requested (e.g. by requiring replace=True), regardless of what it does with indexes. It also contained a bug related to the indexes, which I think @dcherian was trying to fix in #4108.

In any case, it would really be good to come up with a consistent API in #4825, and I agree that the index refactor seems like a good time for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
925965795 https://github.com/pydata/xarray/pull/5692#issuecomment-925965795 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X843MR3j benbovy 4160723 2021-09-23T16:22:15Z 2021-09-23T16:22:15Z MEMBER

To properly test the changes in behavior detailed above, we need to also check for the indexes when comparing two xarray objects. I opened #5812 to discuss about that issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
925718593 https://github.com/pydata/xarray/pull/5692#issuecomment-925718593 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X843LVhB benbovy 4160723 2021-09-23T11:16:18Z 2021-09-23T11:16:18Z MEMBER

In the last commits I refactored stack / unstack.

One change is that a multi-index is not always created with stack. It is created only if each of the dimensions to stack together have one and only one coordinate with a pandas index (this could be a non-dimension coordinate).

This could maybe address #5202, since we could simply drop the indexes before stacking the dimensions in order to avoid the creation of a multi-index. I don't think it's a big breaking change either unless there are users who rely on default multi-indexes with range (0, 1, 2...) levels. Looking at #5202, however, those default multi-indexes seem more problematic than something really useful, but I might be wrong here. Also, range-based indexes can still be created explicitly before stacking the dimensions if needed.

Another consequence is that stack is not always reversible, since unstack still requires a pandas multi-index (one and only one multi-index per dimension to unstack).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
925701666 https://github.com/pydata/xarray/pull/5692#issuecomment-925701666 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X843LRYi benbovy 4160723 2021-09-23T10:49:44Z 2021-09-23T10:49:44Z MEMBER

With the new renaming behavior that doesn't add or drop any index, It seems pretty safe to me to allow renaming a dimension even when the new name corresponds to an existing variable (but not an existing dimension). FYI, this was forbidden in #3645. @keewis any thoughts on this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
919836550 https://github.com/pydata/xarray/pull/5692#issuecomment-919836550 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84205eG benbovy 4160723 2021-09-15T09:04:02Z 2021-09-15T09:04:02Z MEMBER

Also from discussion with @shoyer:

It would make sense (in a follow-up PR) to depreciate the current set_index API (i.e., set new indexes per dimension) in favor of setting only one index with:

  • one or more variable names
  • optionally the index type (by default a pandas index or multi-index is created)
  • optionally one or more index build options to pass to the custom index (pandas indexes have no build option).

This departs from pandas.DataFrame.set_index API but it's more adapted to xarray's new data model with explicit indexes.

Currently Xarray also allows creating (multi-)indexes from existing (multi-)indexes, either with .set_index({...}, append=True), .reset_index(["not", "all", "levels"]) and .reorder_levels(). This API is specific to pandas (multi-)indexes so maybe it would make sense to depreciate it? One advantage over simply drop the index and re-create it from scratch is a possible gain in performance, which probably makes sense in pandas but we're not sure if it's worth the extra complexity here. Unless this kind incremental index build would be useful for other, custom indexes too?

(we should probably discuss this further in a new issue)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
919822643 https://github.com/pydata/xarray/pull/5692#issuecomment-919822643 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84202Ez benbovy 4160723 2021-09-15T08:45:00Z 2021-09-15T08:45:00Z MEMBER

This PR introduces some minor changes in behavior (no API change), mostly related to somewhat tricky workarounds to the limitations of the “index/dimension” coordinates concept that we no longer need with explicit indexes.

I'll detail them below. A couple of those changes are bug fixes so we can safely make them now. For the other changes I'm not sure how best to proceed. Keeping the current behavior may in some cases require additional implementation effort that I'm not sure it's worth doing if no one relies on this weird behavior. Any thoughts @pydata/xarray?


  1. .rename_*

1.1 Indexes are now preserved when renaming dimensions or coordinates

```python ds = xr.Dataset({"x": ("x", [0, 1, 2])}) renamed = ds.rename_dims({"x": "x_new"})

Before

"x" in renamed.indexes # False

Now

"x" in renamed.indexes # True ```


  1. set_index

2.1 Coordinate(s) dtype is preserved when setting new (multi-)indexes

```python da = xr.DataArray( [0, 1, 3, 4], dims="x", coords={"x": [0, 1, 0, 1], "y": ("x", ["a", "b", "a", "b"])} ) print(da)

<xarray.DataArray (x: 4)>

array([0, 1, 3, 4])

Coordinates:

* x (x) int64 0 1 0 1

y (x) <U1 'a' 'b' 'a' 'b'

python indexed = da.set_index(xy=["x", "y"])

Before

indexed.y.dtype # dtype('O')

Now

indexed.y.dtype # dtype('<U1') python indexed = da.set_index(x="y")

Before

indexed.y.dtype # dtype('O')

Now

indexed.y.dtype # dtype('<U1') ```

2.2 Setting a new single index for a new dimension raises an error

New dimension names allowed in set_index is useful to create new multi-indexes from scratch and avoid dimension name conflicts for its levels. However, new dimension names is also allowed for single indexes and resulted in this case to weird dimension renaming, now it raises an error.

```python

Before (bug)

da.set_index(y="x")

<xarray.DataArray (x: 4)>

array([0, 1, 2, 3])

Coordinates:

* y (y) int64 0 1 0 1

Dimensions without coordinates: x

Now

da.set_index(y="x")

ValueError: try setting an index for dimension 'y'

with variable 'x' that has dimensions ('x',)

```


  1. reset_index

3.1 Coordinate (level) names are preserved when (partially) resetting a multi-index

```python indexed = da.set_index(xy=["x", "y"]) indexed

<xarray.DataArray (xy: 4)>

array([0, 1, 2, 3])

Coordinates:

* xy (xy) object MultiIndex

* x (xy) int64 0 1 0 1

* y (xy) <U1 'a' 'a' 'b' 'b'

```

```python

Before

indexed.reset_index("x")

<xarray.DataArray (xy: 4)>

array([0, 1, 2, 3])

Coordinates:

* xy (xy) object 'a' 'a' 'b' 'b'

x (xy) int64 0 1 0 1

Now

indexed.reset_index("x")

<xarray.DataArray (xy: 4)>

array([0, 1, 2, 3])

Coordinates:

xy (xy) object MultiIndex

x (xy) int64 0 1 0 1

* y (xy) <U1 'a' 'a' 'b' 'b'

```

(note: the xy coordinate in the latter result has MultiIndex but is actually not part anymore of any xarray index in the returned DataArray. We'll need to remove the MultiIndex inline repr in favor of the tuple (level values) coordinate, perhaps when we'll add an Indexes section to the xarray objects repr. Eventually we could also get rid of this tuple coordinate and keep the coordinates of the multi-index levels only)

3.2 Single index coordinates that are reset (but kept) are not renamed

```python

Before

da.reset_index("x")

<xarray.DataArray (x: 4)>

array([0, 1, 2, 3])

Coordinates:

y (x) <U1 'a' 'a' 'b' 'b'

x_ (x) int64 0 1 0 1

Dimensions without coordinates: x

Now

da.reset_index("x")

<xarray.DataArray (x: 4)>

array([0, 1, 2, 3])

Coordinates:

x (x) int64 0 1 0 1

y (x) <U1 'a' 'a' 'b' 'b'

```

3.3 Fix bug when trying to reset a non-dimension (and non-level) coordinate

```python

Before

da.reset_index("y")

<xarray.DataArray (x: 4)>

array([0, 1, 2, 3])

Coordinates:

* x (x) int64 0 1 0 1

y_ (y) object 'a' 'a' 'b' 'b'

Now

da.reset_index("y")

ValueError: ('y',) are not coordinates with an index

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
914207743 https://github.com/pydata/xarray/pull/5692#issuecomment-914207743 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842fbP_ benbovy 4160723 2021-09-07T11:01:40Z 2021-09-07T15:53:17Z MEMBER

I'm currently adding more type annotations to the Index (sub)classes. I'm not 100% sure what to do for the query method, though. Without type annotations, I was thinking about those signatures:

```python class Index: def query(self, labels, **kwargs): raise NotImplementedError()

class PandasIndex(Index): def query(self, labels, method=None, tolerance=None): ...

class CustomIndex(Index): def query(self, labels, option1="value", option2=5): ... ```

However, this is unsafe and mypy complains (https://github.com/python/mypy/issues/7424).

Alternative options that work with type annotations:

  1. add # type: ignore[override] in Index subclasses
  2. add **kwargs in subclasses too (those are ignored)
  3. use a fixed number of arguments, i.e., Index.query(labels, query_options: Optional[Dict[str, Any]] = None)

Maybe other alternatives?

I don't like 1 and 2, it does not look very nice. 3 makes default query option values less clear, unless we define them elsewhere as a convention (e.g., as a class property)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
914419518 https://github.com/pydata/xarray/pull/5692#issuecomment-914419518 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842gO8- shoyer 1217238 2021-09-07T15:44:46Z 2021-09-07T15:45:03Z MEMBER

My suggestion:

Don't include keyword arguments inside the base class: python class Index: def query(self, labels): raise NotImplementedError()

Still include custom keyword arguments on subclass query() methods.

Use # type: ignore inside xarray, when we call query()

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
909263472 https://github.com/pydata/xarray/pull/5692#issuecomment-909263472 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842MkJw benbovy 4160723 2021-08-31T13:57:44Z 2021-08-31T13:57:56Z MEMBER

I did run pre-commit clean but the errors are still there in my local repository. It's weird that all pre-commit hooks pass on CI.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
909212646 https://github.com/pydata/xarray/pull/5692#issuecomment-909212646 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842MXvm benbovy 4160723 2021-08-31T12:58:36Z 2021-08-31T12:58:36Z MEMBER

@max-sixty I'm trying to merge the main branch here but I'm struggling with a lot of mypy errors like

xarray/core/_typed_ops.pyi:179: error: Self argument missing for a non-static method (or an invalid type for self) [misc]

Do you know how to fix this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
907281385 https://github.com/pydata/xarray/pull/5692#issuecomment-907281385 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842FAPp shoyer 1217238 2021-08-27T15:21:16Z 2021-08-27T15:21:16Z MEMBER

+1 for a class!

On Fri, Aug 27, 2021 at 2:33 AM Benoit Bovy @.***> wrote:

With the latest commits, the return type of Index.query() starts to look ugly:

Tuple[Mapping[str, Any], Optional[IndexWithVars]] where IndexWithVars = Tuple["Index", Optional[IndexVars]]

We need to return:

  • positional indexers in the form {"dim_1": [...], "dim_2": [...]}
  • optionally a new Index object
  • optionally new index variable objects
  • perhaps we should also explicitly return the dimensions to rename (e.g., when only one level of the multi-index remains) instead of inferring it later from the returned index variable(s)?

I think anIndexQueryResults class would be better to hold all this information in a cleaner way. What do you think @shoyer https://github.com/shoyer?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/5692#issuecomment-907066176, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJFVXV7E2SLUNQMEXVJQTT65LVHANCNFSM5B63RXLQ .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
907066176 https://github.com/pydata/xarray/pull/5692#issuecomment-907066176 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X842ELtA benbovy 4160723 2021-08-27T09:32:57Z 2021-08-27T09:32:57Z MEMBER

With the latest commits, the return type of Index.query() starts to look ugly:

Tuple[Mapping[str, Any], Optional[IndexWithVars]] where IndexWithVars = Tuple["Index", Optional[IndexVars]]

We need to return:

  • positional indexers in the form {"dim_1": [...], "dim_2": [...]}
  • optionally a new Index object
  • optionally new index variable objects
  • perhaps we should also explicitly return the dimensions to rename (e.g., when only one level of the multi-index remains) instead of inferring it later from the returned index variable(s)?

I think anIndexQueryResults class would be better to hold all this information in a cleaner way. What do you think @shoyer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
903899345 https://github.com/pydata/xarray/pull/5692#issuecomment-903899345 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X8414GjR benbovy 4160723 2021-08-23T15:56:04Z 2021-08-23T15:56:04Z MEMBER

At this point, I'm wondering whether we should depreciate a couple of things now or later:

  • Passing a multi-index directly as a coordinate in Dataset / DataArray constructors

Currently, level coordinates are implicitly created from the multi-index. Suggested behavior: treat the index as an array-like (single coordinate) by default and encourage passing it more explicitly, e.g., something like

python midx = pd.MultiIndex.from_arrays([[...], [...]], names=("foo", "bar")) coords, index = xr.PandasMultiIndex.from_pandas_index(midx, dim=“x”) ds = xr.Dataset(coords=coords, indexes={("foo", "bar"): index})

  • IndexVariable.level_names and IndexVariable.get_level_variable

  • Maybe #5732 too?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 343.102ms · About: xarray-datasette