issue_comments

5 rows where issue = 99026442 sorted by updated_at descending

jhamman (MEMBER) · 2016-12-29T01:09:54Z · https://github.com/pydata/xarray/issues/516#issuecomment-269566649

@wesleybowman - were you able to work through this issue? If not, feel free to reopen.

wesleybowman (NONE) · 2015-08-27T18:11:43Z · https://github.com/pydata/xarray/issues/516#issuecomment-135510417

Using ncdump -hs, I found the chunk sizes of each of the files to be: _ChunkSizes = 1, 90, 180 ;

Using that, it took even more time:

```
datal = xray.open_mfdataset(filename, chunks={'time': 1, 'lat': 90, 'lon': 180})

In [7]: %time datal.tasmax[:, 360, 720].values
CPU times: user 3min 3s, sys: 59.4 s, total: 4min 3s
Wall time: 12min 8s
```

I should say that I am using open-source data, and therefore do not control how the original data is chunked. This is also using open_mfdataset on around 100 files.
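
A back-of-the-envelope sketch of why these chunk sizes hurt this particular access pattern (the shapes come from the ncdump output above; the access pattern is the single-point time series being timed): with chunks of (1, 90, 180), every time step lives in its own chunk, so a whole 90×180 slab must be read and decompressed to keep a single value.

```
# Read amplification for tasmax[:, 360, 720] with _ChunkSizes = (1, 90, 180),
# as reported by `ncdump -hs` above.
chunk_values = 1 * 90 * 180   # values decompressed per chunk touched
values_used = 1               # one grid point kept per time step
print(f"read amplification: {chunk_values // values_used}x per time step")
# -> 16200x: each time step costs a full 90x180 chunk to extract one value.
```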

aidanheerdegen (CONTRIBUTOR) · 2015-08-24T02:21:43Z · https://github.com/pydata/xarray/issues/516#issuecomment-133992153

What is the netCDF4 chunking scheme for your compressed data? (Use 'ncdump -hs' to reveal the per-variable chunking scheme.)

Very large datasets can have very long load times depending on the access pattern.

This can be overcome with an appropriately chosen chunking scheme, but if the chunk sizes are not well chosen (and the default library chunking is pretty terrible), then certain access patterns might still be very slow.
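
For reference, the same per-variable chunking can be inspected from Python rather than via ncdump -hs; a minimal sketch using the netCDF4 package, where "tasmax.nc" is a placeholder filename:

```
# Inspect per-variable chunking with netCDF4-python instead of `ncdump -hs`.
import netCDF4

with netCDF4.Dataset("tasmax.nc") as nc:  # placeholder filename
    for name, var in nc.variables.items():
        # chunking() returns the string "contiguous" or a list of chunk
        # sizes, one per dimension.
        print(name, var.dimensions, var.chunking())
```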

wesleybowman (NONE) · 2015-08-11T18:01:04Z · https://github.com/pydata/xarray/issues/516#issuecomment-129995032

Hmm. I moved the uncompressed files to my local hard drive, and I am still getting a lot more wall time than CPU time. 31 seconds would be more than acceptable, but 8 minutes is really pushing it.

```
%time datal.tasmax[:, 360, 720].values
CPU times: user 25.2 s, sys: 5.83 s, total: 31 s
Wall time: 8min 1s
```

jhamman (MEMBER) · 2015-08-06T17:26:52Z · https://github.com/pydata/xarray/issues/516#issuecomment-128450910

My take on this is that you are running into an I/O barrier on your external hard drive. Reading from netCDF, even when compressed, is almost always I/O bound.
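
A minimal sketch of that diagnosis, assuming the same setup as the timings above (the file pattern is a placeholder): when wall time dwarfs process CPU time, the gap is time spent blocked on the disk rather than computing.

```
# Wall-vs-CPU comparison for an I/O-bound read. The file pattern is a
# placeholder standing in for the ~100 files mentioned in the thread.
import time
import xarray as xray  # `xray` was the package's name when this was filed

datal = xray.open_mfdataset("tasmax_*.nc")
wall0, cpu0 = time.perf_counter(), time.process_time()
_ = datal.tasmax[:, 360, 720].values
wall, cpu = time.perf_counter() - wall0, time.process_time() - cpu0
# On an I/O-bound read, wall >> cpu; the difference is time spent waiting.
print(f"wall {wall:.1f}s, cpu {cpu:.1f}s, waiting ~ {wall - cpu:.1f}s")
```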

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
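
For reference, the row selection shown on this page ("5 rows where issue = 99026442 sorted by updated_at descending") can be reproduced against the schema above; a minimal sketch with Python's sqlite3, where "github.db" is a placeholder for the database file:

```
# Reproduce this page's query against the issue_comments schema.
import sqlite3

conn = sqlite3.connect("github.db")  # placeholder database filename
rows = conn.execute(
    "SELECT id, [user], created_at, body FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (99026442,),
).fetchall()
for comment_id, user_id, created_at, body in rows:
    print(comment_id, created_at)
conn.close()
```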