issue_comments

2 rows where issue = 517799069 sorted by updated_at descending

id: 909342151
html_url: https://github.com/pydata/xarray/issues/3486#issuecomment-909342151
issue_url: https://api.github.com/repos/pydata/xarray/issues/3486
node_id: IC_kwDOAMm_X842M3XH
user: dcherian (2448579)
created_at: 2021-08-31T15:27:28Z
updated_at: 2021-08-31T15:27:57Z
author_association: MEMBER
body:

What happens is that dask first constructs chunks of the size specified in open_mfdataset and then breaks those up into the new chunk sizes specified in the .chunk() call.

A similar behaviour occurs for repeated chunk calls, .chunk().chunk(); these do not get optimized into a single chunk call yet.

So yes, you should pass appropriate chunk sizes to open_mfdataset.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Should performance be equivalent when opening with chunks or re-chunking a dataset? (517799069)
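
A minimal sketch of the two access patterns dcherian describes above, assuming a hypothetical set of NetCDF files and a hypothetical chunk size (neither appears in the issue):

import xarray as xr

# Pattern 1: chunk sizes passed at open time; dask builds chunks of the
# requested size directly when the files are opened.
ds_direct = xr.open_mfdataset("data/*.nc", chunks={"time": 100})

# Pattern 2: open first, re-chunk afterwards; dask builds the default
# open_mfdataset chunks (typically one per file) and then adds a second
# layer of tasks that splits them into the sizes requested by .chunk().
ds_rechunked = xr.open_mfdataset("data/*.nc").chunk({"time": 100})

Both end up with the same chunk layout, but the second carries the extra splitting step in its task graph, which is why passing chunk sizes to open_mfdataset directly is the recommended approach.
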
id: 906023525
html_url: https://github.com/pydata/xarray/issues/3486#issuecomment-906023525
issue_url: https://api.github.com/repos/pydata/xarray/issues/3486
node_id: IC_kwDOAMm_X842ANJl
user: mullenkamp (2656596)
created_at: 2021-08-26T02:19:50Z
updated_at: 2021-08-26T02:19:50Z
author_association: NONE
body:

This seems to be an ongoing problem (see "Unexpected behaviour when chunking with multiple netcdf files in xarray/dask" and "Performance of chunking in xarray / dask when opening and re-chunking a dataset") that has not been resolved, nor has feedback been provided.

I've been running into this problem trying to handle NetCDF files that are larger than my RAM. From my testing, chunks must be passed to open_mfdataset to be of any use; the .chunk() method on the dataset after opening seems to do nothing in this use case.

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Should performance be equivalent when opening with chunks or re-chunking a dataset? (517799069)
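
A minimal sketch of the larger-than-RAM workflow mullenkamp describes, assuming hypothetical file paths, a hypothetical variable name, and hypothetical chunk sizes:

import xarray as xr

# Chunks passed at open time keep every variable as a lazy dask array,
# so no file is loaded whole into memory.
ds = xr.open_mfdataset("big/*.nc", chunks={"time": 365}, parallel=True)

# Reductions stay lazy; dask loads chunks as needed when .compute() runs,
# rather than materialising the full dataset first.
time_mean = ds["temperature"].mean("time")
result = time_mean.compute()
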

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
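
As a sketch of how the row filter at the top of this page maps onto the schema above, the same query can be run with Python's sqlite3 module; the local database filename is an assumption.

import sqlite3

# Hypothetical local copy of the Datasette SQLite database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 517799069
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created, updated, association, body in rows:
    print(comment_id, updated, association)
conn.close()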