issue_comments

5 rows where user = 15720911 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
652064154 https://github.com/pydata/xarray/issues/4186#issuecomment-652064154 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MjA2NDE1NA== Li9htmare 15720911 2020-06-30T21:48:33Z 2020-06-30T21:48:33Z NONE

The intention of the variables used in constructing the Dataset looks a lot clearer now. Many thanks, Stephan!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651984472 https://github.com/pydata/xarray/issues/4186#issuecomment-651984472 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTk4NDQ3Mg== Li9htmare 15720911 2020-06-30T19:02:28Z 2020-06-30T19:02:28Z NONE

Sorry @shoyer, I didn't notice that you had pushed new commits to #4184 and thought you meant to just remove the DataFrame.set_index call. Your latest commits do give the correct result. My concern was that someone else working on this later might not have the context that idx can differ from dataframe.index, and could introduce new bugs. Considering the limited scope in which we maintain both idx and dataframe, though, I guess it should be fine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651674763 https://github.com/pydata/xarray/issues/4186#issuecomment-651674763 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTY3NDc2Mw== Li9htmare 15720911 2020-06-30T09:24:13Z 2020-06-30T09:24:13Z NONE

Hi @shoyer, without dataframe.set_index(), dataframe.index can differ from the idx returned by remove_unused_levels_categories, which will lead to other problems. One example is the following df:

    df = pd.DataFrame(
        {
            'lev1': pd.Series(
                ['b', 'a'], dtype=pd.CategoricalDtype(['c', 'b', 'a'], ordered=True)
            ),
            'lev2': 'foo',
            'C1': [0, 2],
            'C2': [1, 3],
        }
    ).set_index(['lev1', 'lev2'])

I agree it would be better if we could preserve the order from df to xr.Dataset, but I think we should never work with a copy of idx that differs from dataframe.index, as this will lead to hard-to-debug problems caused by pandas' "surprising" behavior.
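To make the mismatch concrete, here is a minimal sketch of how a pruned copy of the index can drift from dataframe.index. It assumes that pandas' public MultiIndex.remove_unused_levels() is a fair stand-in for xarray's internal remove_unused_levels_categories helper, which may behave differently in detail. The variable names are only illustrative.

```python
import pandas as pd

# Same frame as in the comment above: 'c' is a declared but unused category.
df = pd.DataFrame(
    {
        "lev1": pd.Series(
            ["b", "a"], dtype=pd.CategoricalDtype(["c", "b", "a"], ordered=True)
        ),
        "lev2": "foo",
        "C1": [0, 2],
        "C2": [1, 3],
    }
).set_index(["lev1", "lev2"])

# Assumption: remove_unused_levels() stands in for xarray's internal helper.
idx = df.index.remove_unused_levels()

original_levels = list(df.index.levels[0])  # still contains the unused 'c'
pruned_levels = list(idx.levels[0])         # 'c' has been dropped

# Any shape derived from the levels now depends on which index you look at.
original_shape = tuple(len(level) for level in df.index.levels)
pruned_shape = tuple(len(level) for level in idx.levels)

print(original_levels, original_shape)
print(pruned_levels, pruned_shape)
print(list(idx) == list(df.index))  # the row labels themselves are unchanged
```

The row labels are identical, but the levels (and therefore the codes and any shape computed from them) are not, so code that mixes idx with dataframe.index can silently misalign data.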

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
651424721 https://github.com/pydata/xarray/issues/4186#issuecomment-651424721 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MTQyNDcyMQ== Li9htmare 15720911 2020-06-29T23:40:41Z 2020-06-29T23:41:45Z NONE

Hi @shoyer, sorry for confusing you; I should have run your code in the first place. Your code removes the problematic dataframe.reindex in Dataset._set_numpy_data_from_dataframe, but there is indeed another place causing the problem, which has already been fixed (but not yet released) by https://github.com/pydata/xarray/pull/3953/files#diff-921db548d18a549f6381818ed08298c9L4607-L4608

Using pzhlobi's example df with xarray 0.15.1 (incorrect result):

    df.to_xarray()
    <xarray.Dataset>
    Dimensions:  (lev1: 2, lev2: 1)
    Coordinates:
      * lev1     (lev1) object 'b' 'a'
      * lev2     (lev2) object 'foo'
    Data variables:
        C1       (lev1, lev2) int64 2 0
        C2       (lev1, lev2) int64 3 1

Using the same df with both #3953 and #4184 (correct result):

    df.to_xarray()
    <xarray.Dataset>
    Dimensions:  (lev1: 2, lev2: 1)
    Coordinates:
      * lev1     (lev1) object 'a' 'b'
      * lev2     (lev2) object 'foo'
    Data variables:
        C1       (lev1, lev2) int64 2 0
        C2       (lev1, lev2) int64 3 1

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560
650738680 https://github.com/pydata/xarray/issues/4186#issuecomment-650738680 https://api.github.com/repos/pydata/xarray/issues/4186 MDEyOklzc3VlQ29tbWVudDY1MDczODY4MA== Li9htmare 15720911 2020-06-28T11:37:20Z 2020-06-28T11:37:20Z NONE

It seems the problem is that, in Dataset.from_dataframe, the dims and coords are created from df.index.levels, which is unsorted: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4642-L4643

Then, in Dataset._set_numpy_data_from_dataframe, pd.MultiIndex.from_product and dataframe.reindex unintentionally sort the dataframe by index: https://github.com/pydata/xarray/blob/732750a06aef2025b206ba6ff765f5acc53bfa25/xarray/core/dataset.py#L4588-L4589

Besides the performance improvement it provides, #4184 also seems to have the nice side effect of fixing this issue.
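As an illustration of why the coordinate order and the data order have to come from the same place, here is a small sketch that keeps them aligned by writing values at the positions given by the index codes. It is only one order-preserving approach under my own assumptions and is not necessarily what #4184 does; the hand-built MultiIndex is a hypothetical stand-in for an index whose levels are not sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical index whose first level is stored unsorted: ['b', 'a'].
index = pd.MultiIndex(
    levels=[["b", "a"], ["foo"]],
    codes=[[0, 1], [0, 0]],
    names=["lev1", "lev2"],
)
df = pd.DataFrame({"C1": [0, 2], "C2": [1, 3]}, index=index)

# Coordinates taken from the levels, in their stored (unsorted) order.
lev1_coords = list(index.levels[0])

# Allocate the dense array and place each row at the position given by its
# codes, so that data position i along lev1 always matches lev1_coords[i].
shape = tuple(len(level) for level in index.levels)
c1 = np.full(shape, np.nan)
c1[tuple(index.codes)] = df["C1"].to_numpy()

# Pair each lev1 label with its C1 value to check the alignment.
c1_by_label = {label: float(value) for label, value in zip(lev1_coords, c1[:, 0])}
print(lev1_coords)
print(c1_by_label)
```

Because both the coordinates and the positions are derived from the same levels and codes, the labels stay attached to the right values regardless of whether the levels happen to be sorted.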

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_xarray() result is incorrect when one of multi-index levels is not sorted 646716560

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.756ms · About: xarray-datasette