issue_comments

3 rows where issue = 238284894 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
341329662 https://github.com/pydata/xarray/issues/1464#issuecomment-341329662 https://api.github.com/repos/pydata/xarray/issues/1464 MDEyOklzc3VlQ29tbWVudDM0MTMyOTY2Mg== shoyer 1217238 2017-11-02T06:29:38Z 2017-11-02T06:29:38Z MEMBER

I did a little bit of digging here, using @mrocklin's Client(processes=False) trick.

The problem seems to be that the arrays that we add to the writer in AbstractWritableDataStore.set_variables are not pickleable. To be more concrete, consider these lines: https://github.com/pydata/xarray/blob/f83361c76b6aa8cdba8923080bb6b98560cf3a96/xarray/backends/common.py#L221-L232

target is currently a netCDF4.Variable object (or whatever the appropriate backend type is). Anything added to the writer eventually ends up as an argument to dask.array.store and hence gets put into the dask graph. When dask-distributed tries to pickle the dask graph, it fails on the netCDF4.Variable.
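The failure mode described above can be reproduced without netCDF4 or dask. This is only an illustrative sketch: `UnpickleableTarget` is a hypothetical stand-in for `netCDF4.Variable` (it holds a lock, which `pickle` refuses to serialize), and the dictionary is a toy imitation of a dask graph, not the real structure that `dask.array.store` builds.

```python
import operator
import pickle
import threading


class UnpickleableTarget:
    """Hypothetical stand-in for a backend variable such as netCDF4.Variable."""

    def __init__(self):
        self._lock = threading.Lock()  # locks cannot be pickled
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value


# Toy graph in the (function, *args) task style: the write target is
# embedded directly as a task argument.
graph = {"store-0": (operator.setitem, UnpickleableTarget(), 0, 1.5)}


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

Here `can_pickle(graph)` is false because one task argument cannot be serialized, which is the same shape of problem a scheduler hits when it has to ship the graph to workers.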

What we need to do instead is wrap these target arrays in appropriate array wrappers, e.g., NetCDF4ArrayWrapper, adding __setitem__ methods to the array wrappers if needed. Unlike most backend array types, our array wrappers are pickleable, which is essential for use with dask-distributed.
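A minimal sketch of that wrapper idea, under stated assumptions: `FakeBackendVariable` and `ArrayWrapper` are hypothetical names, a JSON file stands in for a netCDF file, and none of this is xarray's actual `NetCDF4ArrayWrapper` code. The wrapper stores only how to reach the target (a path and a variable name), opens the live handle lazily, and drops it when pickled.

```python
import json
import os
import pickle
import tempfile
import threading


class FakeBackendVariable:
    """Hypothetical unpickleable backend variable (JSON file as the store)."""

    def __init__(self, path, name):
        self.path, self.name = path, name
        self._lock = threading.Lock()  # makes instances unpickleable

    def __setitem__(self, key, value):
        with self._lock:
            with open(self.path) as f:
                contents = json.load(f)
            contents.setdefault(self.name, {})[str(key)] = value
            with open(self.path, "w") as f:
                json.dump(contents, f)


class ArrayWrapper:
    """Pickleable wrapper: keeps the recipe for the target, not the target."""

    def __init__(self, path, name):
        self.path = path
        self.name = name
        self._variable = None  # live handle, opened lazily

    def _get_variable(self):
        if self._variable is None:
            self._variable = FakeBackendVariable(self.path, self.name)
        return self._variable

    def __setitem__(self, key, value):
        self._get_variable()[key] = value

    def __getstate__(self):
        # Drop the live handle; the unpickled copy reopens it on first write.
        return {"path": self.path, "name": self.name, "_variable": None}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.json")
        with open(path, "w") as f:
            json.dump({}, f)
        wrapper = ArrayWrapper(path, "temperature")
        wrapper[0] = 1.5  # opens the underlying handle
        clone = pickle.loads(pickle.dumps(wrapper))  # works despite the handle
        clone[1] = 2.5  # the copy reopens the target on its own
        with open(path) as f:
            return json.load(f)
```

Calling `demo()` writes through both the original wrapper and its unpickled copy. The design choice is that the pickled state contains only plain strings, so it can travel inside a task graph while each worker reopens the target locally.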

If anyone's curious, here's the traceback and code I used to debug this: https://gist.github.com/shoyer/4564971a4d030cd43bba8241d3b36c73

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing directly to a netCDF file while using distributed 238284894
311122109 https://github.com/pydata/xarray/issues/1464#issuecomment-311122109 https://api.github.com/repos/pydata/xarray/issues/1464 MDEyOklzc3VlQ29tbWVudDMxMTEyMjEwOQ== shoyer 1217238 2017-06-26T17:10:07Z 2017-06-26T17:10:07Z MEMBER

I'm a little surprised that this doesn't work, because I thought we had made all our xarray datastore objects pickleable.

The place to start is probably to write an integration test for this functionality. I notice now that our current tests only check reading netCDF files with dask-distributed: https://github.com/pydata/xarray/blob/master/xarray/tests/test_distributed.py

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing directly to a netCDF file while using distributed 238284894
310817117 https://github.com/pydata/xarray/issues/1464#issuecomment-310817117 https://api.github.com/repos/pydata/xarray/issues/1464 MDEyOklzc3VlQ29tbWVudDMxMDgxNzExNw== shoyer 1217238 2017-06-24T06:05:09Z 2017-06-24T06:05:09Z MEMBER

Hmm. Can you try using scipy as the engine to write the netCDF file?

Honestly I've barely used dask distributed. Possibly @mrocklin has ideas.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing directly to a netCDF file while using distributed 238284894

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);