issue_comments

6 rows where issue = 187872991, sorted by updated_at descending

user (4)

  • shoyer 3
  • dcherian 1
  • jcrist 1
  • stale[bot] 1

author_association (2)

  • MEMBER 4
  • NONE 2

issue (1)

  • Convert xarray dataset to dask dataframe or delayed objects · 6

stale[bot] (26384082) · NONE · comment 857330304 · created 2021-06-09T02:50:46Z · updated 2021-06-09T02:50:46Z
https://github.com/pydata/xarray/issues/1093#issuecomment-857330304

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

Reactions: (none)

dcherian (2448579) · MEMBER · comment 509773034 · created 2019-07-09T19:20:51Z · updated 2019-07-09T19:20:51Z
https://github.com/pydata/xarray/issues/1093#issuecomment-509773034

I think this was closed by mistake. Is there a way to split up Dataset chunks into dask delayed objects where each object is a Dataset?

Reactions: (none)

shoyer (1217238) · MEMBER · comment 259213382 · created 2016-11-08T18:09:11Z · updated 2016-11-08T18:09:34Z
https://github.com/pydata/xarray/issues/1093#issuecomment-259213382

The other component that would help for this is some utility function inside xarray to split a Dataset (or DataArray) into sub-datasets for each chunk. Something like:

```python
import itertools

def split_by_chunks(dataset):
    # For each dimension, build the list of slices that delimit each dask chunk.
    chunk_slices = {}
    for dim, chunks in dataset.chunks.items():
        slices = []
        start = 0
        for chunk in chunks:
            stop = start + chunk
            slices.append(slice(start, stop))
            start = stop
        chunk_slices[dim] = slices
    # Yield one (selection, sub-dataset) pair per chunk, taking the cartesian
    # product of the per-dimension slices. Indexing a Dataset with a dict of
    # slices is positional (equivalent to .isel).
    for slices in itertools.product(*chunk_slices.values()):
        selection = dict(zip(chunk_slices.keys(), slices))
        yield (selection, dataset[selection])
```

Reactions: 👀 1
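
A minimal sketch (not from the thread) of feeding that generator into dask.delayed, so each chunk becomes one lazily computed pandas DataFrame; the toy dataset and the to_frame helper are illustrative assumptions:

```python
import dask
import numpy as np
import xarray as xr

def to_frame(sub):
    # Runs inside the dask task: convert one sub-Dataset to a pandas DataFrame.
    return sub.to_dataframe()

# Toy chunked dataset, purely for illustration.
ds = xr.Dataset(
    {"t": (("x", "y"), np.arange(12.0).reshape(3, 4))}
).chunk({"x": 1, "y": 2})

# One delayed DataFrame per chunk, using split_by_chunks from the comment above.
delayed_frames = [dask.delayed(to_frame)(sub) for _, sub in split_by_chunks(ds)]
frames = dask.compute(*delayed_frames)  # tuple of pandas DataFrames, one per chunk
```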

shoyer (1217238) · MEMBER · comment 259207151 · created 2016-11-08T17:46:23Z · updated 2016-11-08T17:46:23Z
https://github.com/pydata/xarray/issues/1093#issuecomment-259207151

> Can you explain why you think this could benefit from collection duck typing?

Then we could use xarray's normal indexing operations to create new sub-datasets, wrap them with dask.delayed, and start chaining delayed method calls like to_dataframe. The duck typing is necessary so that dask.delayed knows how to pull the dask graph out of the input Dataset.

Reactions: (none)
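
As a present-day footnote: xarray Datasets now implement dask's collection interface, so dask.delayed can pull the graph out of a Dataset rather than embedding it as an opaque constant. A quick illustrative check, with a toy dataset:

```python
import dask
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(6.0))}).chunk({"x": 2})

# True: a chunked Dataset exposes __dask_graph__ and friends.
print(dask.is_dask_collection(ds))

# Because of that, a delayed call over a sub-dataset stays lazy end to end.
df = dask.delayed(lambda s: s.to_dataframe())(ds.isel(x=slice(0, 4))).compute()
print(df.shape)  # (4, 1)
```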

jcrist (2783717) · NONE · comment 259204793 · created 2016-11-08T17:37:25Z · updated 2016-11-08T17:37:25Z
https://github.com/pydata/xarray/issues/1093#issuecomment-259204793

I'm not sure I follow how this is a duck typing use case. I'd write this as a method, following your suggestion on SO:

> Toward this end, it would be nice if xarray had something like dask.array's to_delayed method for converting a Dataset into an array of delayed datasets, which you could then lazily convert into DataFrame objects and do your computation.

Can you explain why you think this could benefit from collection duck typing?

Reactions: (none)
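
For reference, the dask.array to_delayed method mentioned above returns a NumPy object array of Delayed values, one per chunk; this is the shape of interface being proposed for Dataset:

```python
import dask.array as da

x = da.ones((4, 6), chunks=(2, 3))
blocks = x.to_delayed()              # NumPy object array of Delayed values
print(blocks.shape)                  # (2, 2): one entry per chunk
print(blocks[0, 0].compute().shape)  # (2, 3): the underlying NumPy block
```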

shoyer (1217238) · MEMBER · comment 259052436 · created 2016-11-08T05:55:19Z · updated 2016-11-08T05:55:19Z
https://github.com/pydata/xarray/issues/1093#issuecomment-259052436

CC @mrocklin @jcrist

This is a good use case for dask collection duck typing: https://github.com/dask/dask/pull/1068

Reactions: (none)
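
The duck typing discussed here later solidified into dask's custom-collections protocol (__dask_graph__ and friends). Below is a minimal sketch of the modern protocol, not the exact interface in the linked PR; the TinyCollection class is hypothetical:

```python
import dask
from dask.threaded import get as threaded_get

class TinyCollection:
    """Bare-bones dask collection: a task graph plus the keys to compute."""

    def __init__(self, dsk, keys):
        self._dsk, self._keys = dsk, keys

    def __dask_graph__(self):
        return self._dsk   # the task graph itself

    def __dask_keys__(self):
        return self._keys  # which outputs compute() should produce

    @staticmethod
    def __dask_optimize__(dsk, keys, **kwargs):
        return dsk         # no graph optimization in this sketch

    def __dask_postcompute__(self):
        return list, ()    # finalizer: wrap the computed results in a list

    __dask_scheduler__ = staticmethod(threaded_get)

t = TinyCollection({"a": 1, "b": (sum, ["a", 2])}, ["b"])
print(dask.compute(t))  # ([3],)
```

Anything implementing these methods can be passed to dask.compute, dask.visualize, and, as proposed in this thread, unpacked by dask.delayed.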

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);