issue_comments


2 rows where author_association = "MEMBER" and issue = 253407851 sorted by updated_at descending

Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue

327721325 · jhamman (2443309) · MEMBER · created 2017-09-07T08:00:41Z · updated 2017-09-07T08:00:41Z
html_url: https://github.com/pydata/xarray/issues/1534#issuecomment-327721325
issue_url: https://api.github.com/repos/pydata/xarray/issues/1534
node_id: MDEyOklzc3VlQ29tbWVudDMyNzcyMTMyNQ==

@mmartini-usgs - Thanks for the questions. I'm going to close this now as it seems like you're up and going. In the future, we try to keep our "Usage Questions" to the xarray users google group or StackOverflow. Cheers!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_dataframe (pandas) usage question (253407851)
325447523 · rabernat (1197350) · MEMBER · created 2017-08-28T19:03:09Z · updated 2017-08-28T19:03:09Z
html_url: https://github.com/pydata/xarray/issues/1534#issuecomment-325447523
issue_url: https://api.github.com/repos/pydata/xarray/issues/1534
node_id: MDEyOklzc3VlQ29tbWVudDMyNTQ0NzUyMw==

Marinna,

You are correct. In the present release of Xarray, converting to a pandas dataframe eagerly loads all of the data into memory as a regular pandas object, giving up dask's parallel capabilities and potentially consuming lots of memory. With chunked Xarray data, it would be preferable to convert to a dask.dataframe rather than a regular pandas dataframe, which would carry over some of the performance benefits.
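The trade-off described here can be illustrated without xarray at all: reducing a stream of chunks keeps memory bounded, while concatenating everything first (which is roughly what an eager conversion amounts to) does not. A minimal pandas/numpy sketch on synthetic data (all names and sizes are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a large on-disk dataset: a stream of chunks.
def chunks(n_chunks=10, chunk_len=1000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield pd.Series(rng.normal(size=chunk_len))

# Eager: materialize everything into one in-memory Series, then reduce.
eager = pd.concat(chunks()).mean()

# Streaming: keep only a running (sum, count); memory stays O(one chunk).
total, count = 0.0, 0
for c in chunks():
    total += c.sum()
    count += len(c)
streamed = total / count

assert abs(eager - streamed) < 1e-6
```

Both paths compute the same mean; the difference is only how much data is resident in memory at once, which is the point being made about dask versus plain pandas.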

This is a known issue: https://github.com/pydata/xarray/issues/1462

With a solution in the works: https://github.com/pydata/xarray/pull/1489

So hopefully a release of Xarray in the near future will have the feature you seek.

Alternatively, if you describe the filtering, masking, and other QA/QC that you need to do in more detail, we may be able to help you accomplish this entirely within Xarray.
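On the masking point: pandas Series/DataFrames provide a where method that replaces values failing a condition with NaN, and xarray exposes a same-named where on DataArray/Dataset, so this part of the QA/QC can usually be done in either library. A small pandas sketch with an invented threshold range:

```python
import pandas as pd

# Invented sample with two obviously bad values (a spike and a fill value).
s = pd.Series([1.0, 50.0, 2.0, -99.0, 3.0])

# Keep values inside an invented QA/QC range; everything else becomes NaN.
masked = s.where((s > -10) & (s < 10))

assert masked.isna().sum() == 2   # the 50.0 spike and the -99.0 fill value
assert masked.notna().sum() == 3
```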

Good luck!

Ryan

On Mon, Aug 28, 2017 at 2:02 PM, Marinna Martini notifications@github.com wrote:

Apologies for what is probably a very newbie question:

If I convert such a large file to pandas using to_dataframe() to gain access to more pandas methods, will I lose the speed and dask capability that is so wonderful in xarray?

I have a very large netCDF file (3 GB with 3 Million data points of 1-2 Hz ADCP data) that needs to be reduced to hourly or 10 min averages. xarray is perfect for this. I am exploring resample and other methods. It is amazingly fast doing this:

ds = xr.open_dataset('hugefile.nc')
ds_lp = ds.resample('H', 'time', 'mean')

An offset of about half a day is introduced to the data, probably due to user error or filtering. To figure this out, I am looking at using resample in pandas directly, or at multi-indexing and reshaping with methods that xarray does not inherit from pandas, then going back to xarray using to_xarray. I will also need to mask data (and do other things pandas can do) during a QA/QC process. It appears that pandas can do masking but xarray does not inherit it?

Am I understanding the relationship between xarray and pandas correctly?

Thanks, Marinna

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/issues/1534, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJFJiIu3U-Y3o1jXE5FyqdYuzH2WrJGks5scwDRgaJpZM4PE25E .
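One common source of a constant timestamp shift after resampling is bin labeling rather than the averaging itself: pandas' resample chooses which bin edge becomes the output timestamp via the label (and closed) parameters. A sketch on synthetic data (this may or may not be the cause of the half-day offset described above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the time series in the question: 2 hours at 1 min.
idx = pd.date_range("2017-01-01", periods=120, freq="1min")
s = pd.Series(np.arange(120.0), index=idx)

# Default: each hourly bin is labeled by its left edge.
left = s.resample("60min").mean()

# label="right" keeps the same bins but stamps them with the right edge,
# shifting every output timestamp by one bin width.
right = s.resample("60min", label="right").mean()

assert (left.values == right.values).all()          # identical averages
assert right.index[0] - left.index[0] == pd.Timedelta("60min")
```

Same averages, different timestamps: if a downstream comparison assumes the other labeling convention, the data appear shifted by a constant offset.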

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_dataframe (pandas) usage question (253407851)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
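The query described at the top of the page ("2 rows where author_association = \"MEMBER\" and issue = 253407851 sorted by updated_at descending") can be reproduced against this schema with Python's built-in sqlite3. The sketch below drops the foreign-key references so it is self-contained, and inserts only the fields of the first comment above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")
conn.execute(
    "INSERT INTO issue_comments (id, author_association, issue, updated_at) "
    "VALUES (?, ?, ?, ?)",
    (327721325, "MEMBER", 253407851, "2017-09-07T08:00:41Z"),
)

# The page's filter: rows for one issue by MEMBERs, newest update first.
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE author_association = ? AND issue = ? ORDER BY updated_at DESC",
    ("MEMBER", 253407851),
).fetchall()
assert rows == [(327721325,)]
```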