issue_comments

11 rows where author_association = "MEMBER" and issue = 180516114 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
391805626 https://github.com/pydata/xarray/issues/1026#issuecomment-391805626 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDM5MTgwNTYyNg== shoyer 1217238 2018-05-24T17:59:31Z 2018-05-24T17:59:31Z MEMBER

Indeed, it looks like this works now. Extending the example from the first post:

In [3]: ds.chunk({'x': 5}).thedata.groupby('thegroup').mean()
Out[3]:
<xarray.DataArray 'thedata' (thegroup: 2)>
dask.array<shape=(2,), dtype=float64, chunksize=(1,)>
Coordinates:
  * thegroup  (thegroup) object False True

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
391738207 https://github.com/pydata/xarray/issues/1026#issuecomment-391738207 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDM5MTczODIwNw== rabernat 1197350 2018-05-24T14:36:29Z 2018-05-24T14:36:29Z MEMBER

We should check if this issue is resolved.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286856275 https://github.com/pydata/xarray/issues/1026#issuecomment-286856275 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4Njg1NjI3NQ== mrocklin 306380 2017-03-15T19:41:39Z 2017-03-15T19:41:39Z MEMBER

(along with now supporting many other reshape options)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286856207 https://github.com/pydata/xarray/issues/1026#issuecomment-286856207 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4Njg1NjIwNw== mrocklin 306380 2017-03-15T19:41:24Z 2017-03-15T19:41:24Z MEMBER

Fixed upstream, I think, in https://github.com/dask/dask/pull/2089

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286181363 https://github.com/pydata/xarray/issues/1026#issuecomment-286181363 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE4MTM2Mw== shoyer 1217238 2017-03-13T17:28:40Z 2017-03-13T17:28:40Z MEMBER

This is what I was looking for:

Frozen(SortedKeysDict({'allpoints': (1, 1, 1, 1, 1......(allpoints)....., 1, 1), 'T': (11L,)}))

So in this case (where the chunk size is already 1), dask.array.reshape could actually work fine and the error is unnecessary (we don't have the exploding task issue). So this could potentially be fixed upstream in dask.

For now, the best work-around (because you don't have any memory concerns) is to "rechunk" into a single block along the last axis before reshaping, e.g., .chunk(allpoints=259200) or .chunk(allpoints=1e9) (or something arbitrarily large).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
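
The work-around described in comment 286181363 above (rechunking into a single block along the last axis before reshaping) can be sketched with dask.array directly. This is only an illustration under assumptions: the array sizes are hypothetical stand-ins, far smaller than the 259200 points in the thread, and the comment itself uses xarray's .chunk(allpoints=...) rather than dask's rechunk.

```python
import dask.array as da

# Hypothetical stand-in for the array in the thread: 30 time steps and a
# flattened "allpoints" axis that is chunked one element at a time.
x = da.ones((30, 720), chunks=(30, 1))

# Collapse the last axis into a single block (the work-around), so the
# reshape that follows never has to split or merge chunks on that axis.
single = x.rechunk({1: x.shape[1]})

# Reshape the last axis into two axes, as an unstack would (36 * 20 == 720).
unstacked = single.reshape((30, 36, 20))

print(single.chunks)
print(unstacked.shape)
```

This only makes sense when the whole last axis fits comfortably in memory, which is the condition the comment states.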
286152275 https://github.com/pydata/xarray/issues/1026#issuecomment-286152275 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE1MjI3NQ== shoyer 1217238 2017-03-13T15:58:29Z 2017-03-13T15:58:29Z MEMBER

@byersiiasa What matters for dask's reshape is the array shape and chunk shape, both of which you should see when you print a dask.array (or xarray.DataArray containing one). What is the size of the chunking along time and allpoints?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286123584 https://github.com/pydata/xarray/issues/1026#issuecomment-286123584 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjEyMzU4NA== shoyer 1217238 2017-03-13T14:29:12Z 2017-03-13T14:29:12Z MEMBER

That array is loaded in numpy already - can you share the dask version?

On Mon, Mar 13, 2017 at 2:57 AM byersiiasa notifications@github.com wrote:

<xarray.DataArray 'dis' (time: 30, allpoints: 259200)>
array([[ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       ...,
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36]])
Coordinates:
  * time       (time) datetime64[ns] 1971-01-01 1972-01-01 1973-01-01 ...
  * allpoints  (allpoints) MultiIndex
  - lon        (allpoints) float64 -179.8 -179.8 -179.8 -179.8 -179.8 -179.8 ...
  - lat        (allpoints) float64 89.75 89.25 88.75 88.25 87.75 87.25 86.75 ...


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
285893380 https://github.com/pydata/xarray/issues/1026#issuecomment-285893380 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NTg5MzM4MA== shoyer 1217238 2017-03-11T19:23:55Z 2017-03-11T19:23:55Z MEMBER

@byersiiasa can you share what stacked.dis looks like?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
250997873 https://github.com/pydata/xarray/issues/1026#issuecomment-250997873 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk5Nzg3Mw== shoyer 1217238 2016-10-02T21:38:30Z 2016-10-02T21:38:30Z MEMBER

It would look something like this:
1. Verify that chunks are the same on all dask arrays to be stacked.
2. Use np.ravel with map_blocks to flatten each block independently.
3. Construct the appropriate (non-sorted) MultiIndex to label the flattened elements.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
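
The three steps listed in comment 250997873 above can be imitated with NumPy alone. This is a rough sketch, not the xarray implementation: the blocks are simulated by slicing (no dask map_blocks is used), the helper blockwise_stack and the chunk tuples are hypothetical, and plain (row, col) integer pairs stand in for a MultiIndex.

```python
import numpy as np


def blockwise_stack(arr, row_chunks, col_chunks):
    """Flatten a 2D array block by block.

    Returns the flattened values and an (n, 2) array of (row, col) labels
    that plays the role of the non-sorted MultiIndex.
    """
    rows, cols = np.indices(arr.shape)
    row_edges = np.cumsum((0,) + tuple(row_chunks))
    col_edges = np.cumsum((0,) + tuple(col_chunks))
    values, labels = [], []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            block = (slice(r0, r1), slice(c0, c1))
            # Step 2: ravel each block on its own.
            values.append(np.ravel(arr[block]))
            # Step 3: record which (row, col) each flattened element had.
            labels.append(
                np.stack([np.ravel(rows[block]), np.ravel(cols[block])], axis=1)
            )
    return np.concatenate(values), np.concatenate(labels)


a = np.arange(12).reshape(2, 6)
b = a * 10
chunks_a = ((2,), (3, 3))  # hypothetical chunk layout: two column blocks
chunks_b = ((2,), (3, 3))

# Step 1: refuse to stack arrays whose chunk layouts differ.
if chunks_a != chunks_b:
    raise ValueError("arrays to be stacked must share the same chunks")

values_a, labels = blockwise_stack(a, *chunks_a)
values_b, _ = blockwise_stack(b, *chunks_b)
print(values_a)
print(labels[5], labels[6])
```

Because each block is flattened independently, elements from one chunk stay contiguous in the output, at the price of labels that are no longer in sorted order.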
250995942 https://github.com/pydata/xarray/issues/1026#issuecomment-250995942 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk5NTk0Mg== rabernat 1197350 2016-10-02T21:04:43Z 2016-10-02T21:04:43Z MEMBER

We could work around this in xarray by adding custom logic to stack for keeping chunks together when reshaping

If you give me a few hints about how to approach this, I can try a PR. I need this rather urgently for an ongoing project.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
250986266 https://github.com/pydata/xarray/issues/1026#issuecomment-250986266 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk4NjI2Ng== shoyer 1217238 2016-10-02T18:20:36Z 2016-10-02T18:20:36Z MEMBER

This was an intentional change -- see https://github.com/dask/dask/pull/1469

Previously, we created lots of teeny tasks, which tended to negate any out-of-core benefits. The problem is that reshape promises an order for the elements it reshapes, and that order tends to split across existing chunks of dask arrays.

We could work around this in xarray by adding custom logic to stack for keeping chunks together when reshaping, but we can't do this upstream in dask because we need to make sure we keep all the arrays aligned.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
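
The point in comment 250986266 above, that the element order promised by reshape cuts across existing chunks, can be seen in a small NumPy-only illustration. The array size and the two-chunk layout are hypothetical; no dask is involved, the chunk membership is just tracked in a label array.

```python
import numpy as np

# Hypothetical 4x6 array split into two column chunks of width 3.
x = np.arange(24).reshape(4, 6)

# Label every element with the chunk it lives in (0 = left, 1 = right).
chunk_id = np.zeros_like(x)
chunk_id[:, 3:] = 1

# A C-order reshape walks the array row by row, so the flattened output
# keeps jumping between the two chunks.
flat_chunk_id = chunk_id.reshape(-1)

# Count how often neighbouring output elements come from different chunks.
switches = int(np.count_nonzero(np.diff(flat_chunk_id)))
print(flat_chunk_id)
print(switches)
```

Each switch is a place where an output block would need data from a different input chunk, which is where the many small tasks come from.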

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 722.068ms · About: xarray-datasette