issue_comments

5 rows where author_association = "NONE" and issue = 142498006, sorted by updated_at descending

Facets:

  • user: kynan (5 comments)
  • issue: Integration with dask/distributed (xarray backend design) (5 comments)
  • author_association: NONE (5 comments)

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue

id 262214999 · user kynan (346079) · created_at 2016-11-22T11:18:56Z · updated_at 2016-11-22T11:18:56Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/798#issuecomment-262214999
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
node_id: MDEyOklzc3VlQ29tbWVudDI2MjIxNDk5OQ==

When using xarray with the dask.distributed scheduler it would be useful to be able to persist intermediate DataArrays / Datasets on remote workers.

There could be a persist method analogous to the compute method introduced in #1024 (sketched below). Potential issues with this approach are:

  1. What are the semantics of this operation in the general case where neither dask nor distributed is used?
  2. Is it justified to add an operation that is rather specific to the distributed scheduler?

(I could create a separate issue for this if preferred.)
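
A minimal sketch of the usage being proposed, assuming a running dask.distributed cluster; the scheduler address, file name, variable layout, and chunking are invented, and the persist method shown is the proposal itself, not something that existed when this was written:

import xarray as xr
from dask.distributed import Client

# Connect to a hypothetical distributed scheduler.
client = Client("tcp://scheduler:8786")

# Open lazily so the variables are backed by dask arrays.
ds = xr.open_dataset("data.nc", chunks={"time": 100})

# Some intermediate result we would like to keep on the workers.
monthly = ds.groupby("time.month").mean("time")

# Proposed: evaluate the graph on the cluster and keep the results in
# worker memory, returning a Dataset backed by the finished chunks.
persisted = monthly.persist()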

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Integration with dask/distributed (xarray backend design) (142498006)
id 259277067 · user kynan (346079) · created_at 2016-11-08T22:17:14Z · updated_at 2016-11-08T22:17:14Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/798#issuecomment-259277067
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
node_id: MDEyOklzc3VlQ29tbWVudDI1OTI3NzA2Nw==

Great to see this moving! I take it the workshop was productive?

How does #1095 work in the scenario of a distributed scheduler with remote workers? Do I understand correctly that all workers and the client would need to see the same shared filesystem from where NetCDF files are read?
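
A quick way to make the shared-filesystem constraint concrete (a sketch, assuming a running cluster; the scheduler address and path are made up): tasks that read from a NetCDF file re-open it by path on whichever worker runs them, so the path must resolve on every worker.

import os
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # hypothetical scheduler address

path = "/shared/data.nc"  # hypothetical path on a shared filesystem

# Client.run executes the function on every worker and returns a dict
# keyed by worker address; any False means read tasks will fail there.
visible = client.run(os.path.exists, path)
print(visible)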

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Integration with dask/distributed (xarray backend design) (142498006)
id 256038226 · user kynan (346079) · created_at 2016-10-25T13:43:32Z · updated_at 2016-10-25T13:43:32Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/798#issuecomment-256038226
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
node_id: MDEyOklzc3VlQ29tbWVudDI1NjAzODIyNg==

For the case where NetCDF / HDF5 files are only available on the distributed workers and not directly accessible from the client, how would you get the necessary metadata (coords, dims etc.) to construct the xarray.Dataset?
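
One conceivable answer, sketched under the assumption that at least one worker can see the file (the scheduler address and path below are hypothetical): run a small task on the cluster that opens the file and ships only the lightweight metadata back to the client.

from dask.distributed import Client

client = Client("tcp://scheduler:8786")

def read_metadata(path):
    # Runs on a worker that can see `path`; returns only small objects.
    import xarray as xr
    with xr.open_dataset(path) as ds:
        return {
            "dims": dict(ds.sizes),
            "coords": {name: ds[name].load() for name in ds.coords},
            "attrs": dict(ds.attrs),
            "data_vars": {name: (ds[name].dims, str(ds[name].dtype))
                          for name in ds.data_vars},
        }

meta = client.submit(read_metadata, "/worker-only/data.nc").result()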

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Integration with dask/distributed (xarray backend design) (142498006)
id 255207705 · user kynan (346079) · created_at 2016-10-20T19:42:41Z · updated_at 2016-10-20T19:42:41Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/798#issuecomment-255207705
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
node_id: MDEyOklzc3VlQ29tbWVudDI1NTIwNzcwNQ==

I'm probably not familiar enough with either the xarray or dask / distributed codebases to provide much input, but I would be happy to contribute if / where it makes sense. I would also be happy to be part of some real-time discussion if feasible (based in the UK, so I wouldn't be able to attend the workshop).

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Integration with dask/distributed (xarray backend design) (142498006)
id 255184991 · user kynan (346079) · created_at 2016-10-20T18:14:38Z · updated_at 2016-10-20T18:14:38Z · author_association NONE
html_url: https://github.com/pydata/xarray/issues/798#issuecomment-255184991
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
node_id: MDEyOklzc3VlQ29tbWVudDI1NTE4NDk5MQ==

Has this issue progressed since?

Being able to distribute loading of files to a dask cluster and composing an xarray Dataset from data on remote workers would be a great feature.

Is @mrocklin's blog post from Feb 2016 still the reference for remote data loading on a cluster? Adapting it to loading xarray Datasets rather than plain arrays is not straightforward since there is no way to combine futures representing Datasets out of the box.
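
For reference, the kind of workaround hinted at here (a sketch adapted from that blog post's pattern; the file names, variable name, shape, and dtype are all assumptions) builds dask arrays from delayed per-file loads and assembles the Dataset by hand, rather than combining Dataset futures directly:

import dask
import dask.array as da
import numpy as np
import xarray as xr

paths = ["/data/part-0.nc", "/data/part-1.nc"]  # hypothetical files

@dask.delayed
def load_var(path, name):
    # Runs on a worker; returns a plain ndarray for one file.
    import xarray as xr
    with xr.open_dataset(path) as ds:
        return ds[name].values

# from_delayed needs the shape and dtype up front; assumed identical
# for every file here.
shape, dtype = (10, 180, 360), np.float64
arrays = [da.from_delayed(load_var(p, "temperature"), shape, dtype)
          for p in paths]

# Concatenate along time and wrap the result as an xarray object.
temperature = xr.DataArray(da.concatenate(arrays, axis=0),
                           dims=("time", "lat", "lon"),
                           name="temperature")
combined = temperature.to_dataset()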

reactions: {"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
issue: Integration with dask/distributed (xarray backend design) (142498006)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
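
Given the schema above, a hedged example of reproducing this page's filter directly against the SQLite database (the database filename is an assumption; the WHERE clause matches the filter stated at the top of the page, and the issue index supports it):

import sqlite3

conn = sqlite3.connect("github.db")  # assumed database file
rows = conn.execute(
    """
    select id, user, created_at, updated_at, body
    from issue_comments
    where author_association = ? and issue = ?
    order by updated_at desc
    """,
    ("NONE", 142498006),
).fetchall()

for comment_id, user, created, updated, body in rows:
    print(comment_id, updated)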