issue_comments


10 rows where author_association = "MEMBER" and issue = 646716560 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
652032780 https://github.com/pydata/xarray/issues/4186#issuecomment-652032780 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MjAzMjc4MA== shoyer 1217238 2020-06-30T20:44:00Z 2020-06-30T20:44:00Z MEMBER

> My concern was when another person works on this and didn't get the context that idx might be different from dataframe.index and new bugs could potentially be introduced

> Let me see if I can rewrite the helper functions to avoid passing around a DataFrame

This was a good suggestion. Done in https://github.com/pydata/xarray/pull/4184/commits/96b544b5a59894359a35680151af71c0226f0505

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
652018527 https://github.com/pydata/xarray/issues/4186#issuecomment-652018527 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MjAxODUyNw== shoyer 1217238 2020-06-30T20:13:44Z 2020-06-30T20:13:44Z MEMBER

> My concern was when another person works on this and didn't get the context that idx might be different from dataframe.index and new bugs could potentially be introduced

Let me see if I can rewrite the helper functions to avoid passing around a DataFrame

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651905098 https://github.com/pydata/xarray/issues/4186#issuecomment-651905098 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTkwNTA5OA== shoyer 1217238 2020-06-30T16:29:10Z 2020-06-30T16:44:02Z MEMBER

@Li9htmare I'm not sure I follow your example. #4184 does remove the use of DataFrame.set_index(), but it also removes any subsequent use of dataframe.index -- it always uses the separately processed index.

Is there something specific that you are worried about going wrong with your latest example? For what it's worth, here's what to_xarray() does with the current version of #4184:

```
In [4]: df.to_xarray()
Out[4]:
<xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 0 2
    C2       (lev1, lev2) int64 1 3

In [5]: df.to_xarray().indexes
Out[5]:
lev1: CategoricalIndex(['b', 'a'], categories=['b', 'a'], ordered=True, name='lev1', dtype='category')
lev2: Index(['foo'], dtype='object', name='lev2')
```

I think this is doing the right thing already?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651467248 https://github.com/pydata/xarray/issues/4186#issuecomment-651467248 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQ2NzI0OA== shoyer 1217238 2020-06-30T01:41:36Z 2020-06-30T01:41:36Z MEMBER

The sorting seems to be a separate matter, caused by dataframe.set_index() inside our remove_unused_levels_categories function. I think we can remove that, which will fix the sorting issue when removing unused levels. Then the result will be the desired:

    df.to_xarray()
    <xarray.Dataset>
    Dimensions:  (lev1: 2, lev2: 1)
    Coordinates:
      * lev1     (lev1) object 'b' 'a'
      * lev2     (lev2) object 'foo'
    Data variables:
        C1       (lev1, lev2) int64 0 2
        C2       (lev1, lev2) int64 1 3

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651458105 https://github.com/pydata/xarray/issues/4186#issuecomment-651458105 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQ1ODEwNQ== shoyer 1217238 2020-06-30T01:14:45Z 2020-06-30T01:14:45Z MEMBER

Actually, I realize now that this is basically the same issue as https://github.com/pydata/xarray/issues/2619

If I remove the use of remove_unused_levels_categories from from_dataframe, then I get the same behavior that we considered a bug in that issue:

    In [5]: ds.isel(xy=ds['x'] < 4).to_pandas().to_xarray()
    Out[5]:
    <xarray.DataArray (x: 8, y: 5)>
    array([[ 0.,  1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.,  9.],
           [10., 11., 12., 13., 14.],
           [15., 16., 17., 18., 19.],
           [nan, nan, nan, nan, nan],
           [nan, nan, nan, nan, nan],
           [nan, nan, nan, nan, nan],
           [nan, nan, nan, nan, nan]])
    Coordinates:
      * x        (x) int64 0 1 2 3 4 5 6 7
      * y        (y) int64 0 1 2 3 4

So maybe it is more consistent to keep calling remove_unused_levels(), which somewhat surprisingly sorts MultiIndex levels.
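To see what remove_unused_levels() does to level order on a given pandas version, a minimal sketch along these lines may help. The index here is hypothetical and is not the frame from this issue; it only assumes pandas is installed. It prints the levels before and after pruning rather than asserting a particular order, since that is the behavior under discussion.

```python
import pandas as pd

# Hypothetical index: levels are stored in non-sorted order on purpose.
idx = pd.MultiIndex(
    levels=[["b", "a"], ["foo", "bar"]],
    codes=[[0, 0, 1], [0, 1, 0]],
    names=["lev1", "lev2"],
)

# Selecting rows keeps the original levels, so "bar" becomes an unused level.
sub = idx[[0, 2]]

# Drop levels that no longer appear in any entry of the index.
pruned = sub.remove_unused_levels()

print("levels before:", [list(lev) for lev in sub.levels])
print("levels after: ", [list(lev) for lev in pruned.levels])

# The entries themselves should be unchanged; only the level storage differs.
print("same entries: ", list(pruned) == list(sub))
```

Comparing the two printed level lists shows whether the stored order of lev1 was kept or changed on the installed version.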

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651454795 https://github.com/pydata/xarray/issues/4186#issuecomment-651454795 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQ1NDc5NQ== fujiisoup 6815844 2020-06-30T01:06:34Z 2020-06-30T01:06:34Z MEMBER

I agree that it's better not to sort.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651453863 https://github.com/pydata/xarray/issues/4186#issuecomment-651453863 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQ1Mzg2Mw== shoyer 1217238 2020-06-30T01:03:40Z 2020-06-30T01:03:40Z MEMBER

I verified that #4184 fixes the tests added for #3953 even after removing the call to remove_unused_levels_categories().

The main question is what behavior we want to have: Should from_dataframe preserve index levels exactly, or should it sort them first?

I think it's better not to sort (but of course it's better to sort than to get the wrong order).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651438776 https://github.com/pydata/xarray/issues/4186#issuecomment-651438776 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQzODc3Ng== fujiisoup 6815844 2020-06-30T00:21:43Z 2020-06-30T00:21:43Z MEMBER

I think #3953 fixes the case where the MultiIndex has unused levels. I had no better idea than #3953, but if it works without #3953, that would be better ;)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651428394 https://github.com/pydata/xarray/issues/4186#issuecomment-651428394 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQyODM5NA== shoyer 1217238 2020-06-29T23:51:49Z 2020-06-29T23:51:49Z MEMBER

Thanks for clarifying!

This raises an interesting question for #4184: do we want to keep @fujiisoup's fix from #3953 or not?

If we remove @fujiisoup's fix, then the output we see is:

    df.to_xarray()
    <xarray.Dataset>
    Dimensions:  (lev1: 2, lev2: 1)
    Coordinates:
      * lev1     (lev1) object 'b' 'a'
      * lev2     (lev2) object 'foo'
    Data variables:
        C1       (lev1, lev2) int64 0 2
        C2       (lev1, lev2) int64 1 3

This is also correct -- coordinates match up with values -- but the order of the result is different from what is currently on master.
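One way to check "coordinates match up with values" independently of the output order is to look up every row of the DataFrame by label in the converted Dataset. The sketch below assumes pandas and xarray are installed; the frame is a hypothetical reconstruction matching the output shown above, and pairing_is_consistent is a helper name made up for illustration, not part of xarray.

```python
import pandas as pd

# Hypothetical frame whose first index level is stored unsorted ('b' before 'a').
idx = pd.MultiIndex(
    levels=[["b", "a"], ["foo"]],
    codes=[[0, 1], [0, 0]],
    names=["lev1", "lev2"],
)
df = pd.DataFrame({"C1": [0, 2], "C2": [1, 3]}, index=idx)

# DataFrame.to_xarray() requires xarray to be installed.
ds = df.to_xarray()


def pairing_is_consistent(frame, dataset):
    """Return True if every (lev1, lev2) label maps to the same values in both."""
    for (lev1, lev2), row in frame.iterrows():
        for col in frame.columns:
            # Label-based lookup, so the order of the coordinates does not matter.
            value = dataset[col].sel(lev1=lev1, lev2=lev2).item()
            if value != row[col]:
                return False
    return True


print(pairing_is_consistent(df, ds))
```

Because the lookup is by label, the check passes for either ordering of lev1 as long as each value stays attached to its own coordinates.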

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651402838 https://github.com/pydata/xarray/issues/4186#issuecomment-651402838 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQwMjgzOA== shoyer 1217238 2020-06-29T22:28:00Z 2020-06-29T22:28:00Z MEMBER

Hi @pzhlobi @Li9htmare -- thanks for raising this issue.

Could you kindly clarify for me exactly what behavior you think xarray should have? The results are indeed reordered currently, but as far as I can tell the pairing between coordinates and values remains consistent.

When I test this myself, I see the same behavior (documented in the first post) either with or without my changes from #4184.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);