issue_comments

7 rows where author_association = "NONE", issue = 187608079 and user = 167164 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
259044958 https://github.com/pydata/xarray/issues/1086#issuecomment-259044958 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTA0NDk1OA== naught101 167164 2016-11-08T04:47:56Z 2016-11-08T04:47:56Z NONE

Ok, no worries. I'll try it if it gets desperate :)

Thanks for your help, shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259041491 https://github.com/pydata/xarray/issues/1086#issuecomment-259041491 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTA0MTQ5MQ== naught101 167164 2016-11-08T04:16:26Z 2016-11-08T04:16:26Z NONE

So it would be more efficient to concat all of the datasets (subset for the relevant variables), and then just use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it.
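
A minimal sketch of the approach described in this comment, assuming small in-memory datasets in place of files loaded with xr.open_dataset(). The helper `make_site`, the site names, and the variable names (`Qle`, `Qh`, `NEE`) are hypothetical and chosen only for illustration; whether this is actually faster than per-dataset conversion is not established by the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_site(name, start, n, variables):
    """Build a tiny single-site dataset (hypothetical stand-in for xr.open_dataset)."""
    time = pd.date_range(start, periods=n, freq="D")
    data = {v: (("time",), np.arange(n, dtype=float)) for v in variables}
    ds = xr.Dataset(data, coords={"time": time})
    # Add a length-1 "site" dimension so the datasets can be concatenated along it.
    return ds.expand_dims(site=[name])


wanted = ["Qle", "Qh"]  # the subset of variables to convert

# Sites with different variables and overlapping (not identical) time ranges.
sites = [
    make_site("A", "2000-01-01", 3, ["Qle", "Qh", "NEE"]),
    make_site("B", "2000-01-02", 3, ["Qle", "Qh"]),
]

# Subset each dataset first, then concatenate; join="outer" takes the
# union of the time coordinates and pads missing values with NaN.
subsets = [ds[wanted] for ds in sites]
combined = xr.concat(subsets, dim="site", join="outer")

# A single conversion for the whole collection.
df = combined.to_dataframe()
print(df)
```

The result is indexed by both `site` and `time`; rows introduced by the outer join contain NaN and can be dropped with `df.dropna()` if they are not needed.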

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259033970 https://github.com/pydata/xarray/issues/1086#issuecomment-259033970 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAzMzk3MA== naught101 167164 2016-11-08T03:14:50Z 2016-11-08T03:14:50Z NONE

Yeah, I'm loading each file separately with xr.open_dataset(), since it's not really a multi-file dataset (it's a lot of single-site datasets, some of which have different variables, and overlapping time dimensions). I don't think I can avoid loading them separately...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259026069 https://github.com/pydata/xarray/issues/1086#issuecomment-259026069 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAyNjA2OQ== naught101 167164 2016-11-08T02:19:01Z 2016-11-08T02:19:01Z NONE

Not easily - most scripts require multiple of these datasets (up to 200; the linked one is among the smallest, and some are up to 10 MB) in a specific directory structure, and rely on a couple of private Python modules. I was just asking because I thought I might have been missing something obvious, but now I guess that isn't the case. Probably not worth spending too much time on this - if it starts becoming a real problem for me, I will try to generate something self-contained that shows the problem. Until then, maybe it's best to assume that xarray/pandas are doing the best they can given the requirements, and close this for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258774196 https://github.com/pydata/xarray/issues/1086#issuecomment-258774196 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc3NDE5Ng== naught101 167164 2016-11-07T08:30:25Z 2016-11-07T08:30:25Z NONE

I loaded it from a netcdf file. There's an example you can play with at https://dl.dropboxusercontent.com/u/50684199/MitraEFluxnet.1.4_flux.nc

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258755061 https://github.com/pydata/xarray/issues/1086#issuecomment-258755061 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1NTA2MQ== naught101 167164 2016-11-07T06:12:27Z 2016-11-07T06:12:27Z NONE

Slightly slower (using %timeit in IPython).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258753366 https://github.com/pydata/xarray/issues/1086#issuecomment-258753366 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1MzM2Ng== naught101 167164 2016-11-07T05:56:26Z 2016-11-07T05:56:26Z NONE

Squeeze is pretty much identical in efficiency. Seems very slightly better (2-5%) on smaller datasets. (I still need to add the final [data_vars] to get rid of the extraneous index_var columns, but that doesn't affect performance much).
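
A minimal sketch of the squeeze-then-select pattern mentioned above, assuming the extraneous columns are the length-1 coordinates left over after squeezing. The dataset layout and the variable names (`Qle`, `Qh`) are hypothetical, not taken from the linked file.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=4, freq="D")

# A single-site dataset with length-1 spatial dimensions (hypothetical layout).
ds = xr.Dataset(
    {
        "Qle": (("time", "y", "x"), np.arange(4.0).reshape(4, 1, 1)),
        "Qh": (("time", "y", "x"), np.arange(4.0).reshape(4, 1, 1) * 2),
    },
    coords={"time": time, "y": [1.0], "x": [2.0]},
)

data_vars = ["Qle", "Qh"]

# squeeze() drops the length-1 dimensions, so only "time" remains as an index.
squeezed = ds[data_vars].squeeze()
df_all = squeezed.to_dataframe()

# The squeezed coordinates may still show up as ordinary columns;
# selecting the data variables again leaves only the wanted columns.
df = df_all[data_vars]
print(df)
```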

I'm not calling pandas.tslib.array_to_timedelta64; to_dataframe is. The caller list is (sorry, I'm not sure of a better way to show this):

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
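
As a rough illustration of how the filter at the top of this page maps onto the schema, the sketch below recreates the table in an in-memory SQLite database and runs an equivalent query with Python's sqlite3 module. The foreign-key references are omitted because the `users` and `issues` tables are not shown here, only two of the seven rows are loaded, and the third row is a made-up non-matching record included to show the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same columns as the schema above, without the REFERENCES clauses
# (the referenced tables are not part of this sketch).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER,
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")
conn.execute("CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])")

rows = [
    # Two rows taken from the table on this page.
    (259044958, 167164, "2016-11-08T04:47:56Z", "NONE", 187608079),
    (258753366, 167164, "2016-11-07T05:56:26Z", "NONE", 187608079),
    # Hypothetical row that should be filtered out.
    (1, 999, "2016-11-09T00:00:00Z", "MEMBER", 187608079),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Equivalent of: author_association = "NONE", issue = 187608079 and
# user = 167164, sorted by updated_at descending.
query = """
    SELECT [id], [updated_at]
    FROM [issue_comments]
    WHERE [author_association] = ? AND [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, ("NONE", 187608079, 167164)).fetchall()
for row in result:
    print(row)
```

Because the timestamps are stored as ISO 8601 text, sorting them as strings gives chronological order.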