issue_comments


8 rows where author_association = "MEMBER" and issue = 187608079 sorted by updated_at descending


user 2

  • shoyer 7
  • dcherian 1

issue 1

  • Is there a more efficient way to convert a subset of variables to a dataframe? · 8

author_association 1

  • MEMBER · 8
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
661919828 https://github.com/pydata/xarray/issues/1086#issuecomment-661919828 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDY2MTkxOTgyOA== dcherian 2448579 2020-07-21T15:10:02Z 2020-07-21T15:10:02Z MEMBER

can you make a reproducible example @andreall?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259044805 https://github.com/pydata/xarray/issues/1086#issuecomment-259044805 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTA0NDgwNQ== shoyer 1217238 2016-11-08T04:46:23Z 2016-11-08T04:46:23Z MEMBER

So it would be more efficient to concat all of the datasets (subset for the relevant variables), and then just use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it.

Maybe? I'm not confident enough to advise you to go to that trouble.

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
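The concat-then-convert approach discussed in this comment can be sketched as follows; the variable and dimension names (`tas`, `site`) are made up for illustration:

```python
import pandas as pd
import xarray as xr

# Two hypothetical single-site datasets sharing a "time" dimension;
# "tas" stands in for whatever variables are being subset.
times = pd.date_range("2000-01-01", periods=3)
ds1 = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0])}, coords={"time": times})
ds2 = xr.Dataset({"tas": ("time", [4.0, 5.0, 6.0])}, coords={"time": times})

# Subset each dataset to the relevant variables, concat once along a new
# "site" dimension, then make a single .to_dataframe() call on the result.
combined = xr.concat([ds1[["tas"]], ds2[["tas"]]], dim="site")
df = combined.to_dataframe()
```

Whether this is actually faster than per-dataset conversion depends on the data; as the comment says, it is not a guaranteed win.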
259035428 https://github.com/pydata/xarray/issues/1086#issuecomment-259035428 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAzNTQyOA== shoyer 1217238 2016-11-08T03:25:58Z 2016-11-08T03:25:58Z MEMBER

Under the covers open_mfdataset just uses open_dataset and merge/concat. So this would be similar either way. On Mon, Nov 7, 2016 at 7:14 PM naught101 notifications@github.com wrote:

Yeah, I'm loading each file separately with xr.open_dataset(), since it's not really a multi-file dataset (it's a lot of single-site datasets, some of which have different variables, and overlapping time dimensions). I don't think I can avoid loading them separately...


  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
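A runnable sketch of what `open_mfdataset` does under the covers, using in-memory stand-ins (made-up variable names) for two single-site files that have different variables and overlapping time coordinates, as in the quoted scenario:

```python
import pandas as pd
import xarray as xr

# Stand-ins for two single-site files: different variables ("tas" vs "pr")
# and overlapping "time" coordinates.
t1 = pd.date_range("2000-01-01", periods=4)
t2 = pd.date_range("2000-01-03", periods=4)
ds_a = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0, 4.0])}, coords={"time": t1})
ds_b = xr.Dataset({"pr": ("time", [0.1, 0.2, 0.3, 0.4])}, coords={"time": t2})

# After opening each file, open_mfdataset essentially does this:
# merge the datasets, aligning on the union of the "time" coordinate
# (missing values are filled with NaN).
merged = xr.merge([ds_a, ds_b])
```

With real files the `xr.Dataset(...)` constructors would be `xr.open_dataset(path)` calls, which is why the comment says the two approaches come out similar.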
259028693 https://github.com/pydata/xarray/issues/1086#issuecomment-259028693 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAyODY5Mw== shoyer 1217238 2016-11-08T02:36:16Z 2016-11-08T02:36:16Z MEMBER

One thing that might hurt is that xarray (lazily) decodes times from each file separately, rather than decoding times all at once. But this hasn't been much of an issue before even with hundreds of times, so I'm not sure what's going on here.

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
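If per-file time decoding were the bottleneck, one workaround would be to open each file with `decode_times=False`, combine, and decode once at the end. A minimal sketch of the decode-once step, using a hand-built stand-in for undecoded data (the variable name and units are made up):

```python
import numpy as np
import xarray as xr

# A stand-in for data opened with decode_times=False: a raw numeric
# "time" coordinate carrying its CF units metadata, not yet decoded.
raw = xr.Dataset(
    {"tas": ("time", [10.0, 11.0, 12.0])},
    coords={"time": ("time", [0, 1, 2], {"units": "days since 2000-01-01"})},
)

# Decode CF conventions (including times) once, after any concat/merge,
# instead of once per file.
decoded = xr.decode_cf(raw)
```

This is a hypothetical workaround for the lazily-decoded-times theory above, not a confirmed fix for the reported slowness.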
258884141 https://github.com/pydata/xarray/issues/1086#issuecomment-258884141 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODg4NDE0MQ== shoyer 1217238 2016-11-07T16:27:21Z 2016-11-07T16:27:21Z MEMBER

can you give me a copy/pastable script that has the slowness issue with that file?

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258755912 https://github.com/pydata/xarray/issues/1086#issuecomment-258755912 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1NTkxMg== shoyer 1217238 2016-11-07T06:20:18Z 2016-11-07T06:20:18Z MEMBER

How did you construct this dataset?

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258754037 https://github.com/pydata/xarray/issues/1086#issuecomment-258754037 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1NDAzNw== shoyer 1217238 2016-11-07T06:02:56Z 2016-11-07T06:02:56Z MEMBER

Try calling .load() before .to_dataframe()

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
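The suggestion in sketch form; with a file-backed dataset, `.load()` pulls all values into memory in one pass, instead of the many small reads that can be triggered during conversion. A small in-memory dataset stands in here for the real file (for which the first line would be `xr.open_dataset(path)`):

```python
import xarray as xr

# Stand-in for a lazily loaded dataset; names are made up.
ds = xr.Dataset({"tas": ("time", [1.0, 2.0, 3.0])})

# Load everything into memory first (a no-op for in-memory data),
# then convert in one go.
df = ds.load().to_dataframe()
```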
258748969 https://github.com/pydata/xarray/issues/1086#issuecomment-258748969 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc0ODk2OQ== shoyer 1217238 2016-11-07T05:14:11Z 2016-11-07T05:14:24Z MEMBER

The simplest thing to try is making use of .squeeze(), e.g., dataset[data_vars].squeeze().to_dataframe(). Does that have any better performance? At least it's a bit less typing.

I'm not sure why pandas.tslib.array_to_timedelta64 is slow here, or even how it is being called in your example. I would need a complete example that I can run to debug that.

  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
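The suggested one-liner, sketched on a toy dataset with a length-1 `site` dimension (the variable and dimension names are made up):

```python
import xarray as xr

# Toy dataset with a degenerate (length-1) "site" dimension.
ds = xr.Dataset(
    {
        "tas": (("site", "time"), [[1.0, 2.0, 3.0]]),
        "pr": (("site", "time"), [[0.1, 0.2, 0.3]]),
    }
)

# Select the variables of interest, drop length-1 dimensions with
# .squeeze(), then convert -- the dataset[data_vars].squeeze().to_dataframe()
# pattern suggested above.
df = ds[["tas"]].squeeze().to_dataframe()
```

Without the `.squeeze()`, the resulting DataFrame would carry a one-level `site` index level for the degenerate dimension.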

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
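For reference, the filter and sort described at the top of this page (author_association = "MEMBER" and issue = 187608079, sorted by updated_at descending) corresponds to a query like the following, sketched here against an in-memory copy of the schema using Python's sqlite3 (the foreign-key references are omitted since the referenced tables are not shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
""")

# The idx_issue_comments_issue index above lets SQLite satisfy the
# issue = ? filter without a full table scan.
rows = conn.execute(
    "SELECT id, user, created_at FROM issue_comments "
    "WHERE author_association = ? AND issue = ? "
    "ORDER BY updated_at DESC",
    ("MEMBER", 187608079),
).fetchall()
```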
Powered by Datasette · Queries took 35.167ms · About: xarray-datasette