issue_comments

17 rows where issue = 180516114 sorted by updated_at descending

user 4

  • shoyer 7
  • byersiiasa 6
  • mrocklin 2
  • rabernat 2

author_association 2

  • MEMBER 11
  • NONE 6

issue 1

  • multidim groupby on dask arrays: dask.array.reshape error · 17
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
391805626 https://github.com/pydata/xarray/issues/1026#issuecomment-391805626 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDM5MTgwNTYyNg== shoyer 1217238 2018-05-24T17:59:31Z 2018-05-24T17:59:31Z MEMBER

Indeed, it looks like this works now. Extending the example from the first post:

    In [3]: ds.chunk({'x': 5}).thedata.groupby('thegroup').mean()
    Out[3]:
    <xarray.DataArray 'thedata' (thegroup: 2)>
    dask.array<shape=(2,), dtype=float64, chunksize=(1,)>
    Coordinates:
      * thegroup  (thegroup) object False True

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
391738207 https://github.com/pydata/xarray/issues/1026#issuecomment-391738207 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDM5MTczODIwNw== rabernat 1197350 2018-05-24T14:36:29Z 2018-05-24T14:36:29Z MEMBER

We should check if this issue is resolved.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286856275 https://github.com/pydata/xarray/issues/1026#issuecomment-286856275 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4Njg1NjI3NQ== mrocklin 306380 2017-03-15T19:41:39Z 2017-03-15T19:41:39Z MEMBER

(along with now supporting many other reshape options)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286856207 https://github.com/pydata/xarray/issues/1026#issuecomment-286856207 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4Njg1NjIwNw== mrocklin 306380 2017-03-15T19:41:24Z 2017-03-15T19:41:24Z MEMBER

Fixed upstream, I think, in https://github.com/dask/dask/pull/2089

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286381505 https://github.com/pydata/xarray/issues/1026#issuecomment-286381505 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjM4MTUwNQ== byersiiasa 17701232 2017-03-14T10:30:24Z 2017-03-14T10:30:24Z NONE

Thanks - this is working well.

Reverting back to xarray 0.8.2 and dask 0.10.1 seems to be a combination that worked well for this particular task using delayed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286181363 https://github.com/pydata/xarray/issues/1026#issuecomment-286181363 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE4MTM2Mw== shoyer 1217238 2017-03-13T17:28:40Z 2017-03-13T17:28:40Z MEMBER

This is what I was looking for:

Frozen(SortedKeysDict({'allpoints': (1, 1, 1, 1, 1......(allpoints)....., 1, 1), 'T': (11L,)}))

So in this case (where the chunk size is already 1), dask.array.reshape could actually work fine and the error is unnecessary (we don't have the exploding task issue). So this could potentially be fixed upstream in dask.

For now, the best work-around (because you don't have any memory concerns) is to "rechunk" into a single block along the last axis before reshaping, e.g., .chunk(allpoints=259200) or .chunk(allpoints=1e9) (or something arbitrarily large).
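The work-around described above can be sketched as a small helper. This is a minimal sketch, assuming `obj` is a dask-backed xarray Dataset or DataArray that fits comfortably in memory (as stated in this thread); the function name `unstack_in_one_block` is hypothetical. It passes the chunk size as a dict with an integer length rather than a keyword argument or a float.

```python
def unstack_in_one_block(obj, dim="allpoints"):
    """Rechunk `dim` into a single block, then unstack it.

    `obj` is assumed to be an xarray Dataset or DataArray backed by dask;
    only its `.sizes`, `.chunk()` and `.unstack()` members are used.
    """
    # Use the exact dimension length as the chunk size, so the whole
    # dimension ends up in one block before the reshape happens.
    full_length = int(obj.sizes[dim])
    return obj.chunk({dim: full_length}).unstack(dim)
```

With the objects named elsewhere in this thread, the call would look like `dsmle = unstack_in_one_block(mle)`.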

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286171415 https://github.com/pydata/xarray/issues/1026#issuecomment-286171415 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE3MTQxNQ== byersiiasa 17701232 2017-03-13T16:58:06Z 2017-03-13T16:58:06Z NONE

@shoyer No chunking as the dataset was quite small (360x720x30). Also, the calculation is along the time dimension so this effectively disappears for each lat/lon. Hence my initial surprise that it came up with this chunk/reshape issue, since I thought all it has to do is unstack 'allpoints'.

If I print one of the dask arrays from within the function:

    print sT
    dask.array<from-va..., shape=(11L,), dtype=float64, chunksize=(11L,)>

This is 11L because the calculation returns 11 values per point to an xr.Dataset.

Others have no chunks because they are single values (for each point):

    print p_value
    dask.array<from-va..., shape=(), dtype=float64, chunksize=()>

This only returns one value per point. The object returned (xr.Dataset) from the .apply function comes out with chunks:

    mle.chunks
    Frozen(SortedKeysDict({'allpoints': (1, 1, 1, 1, 1......(allpoints)....., 1, 1), 'T': (11L,)}))

and looks like:

    <xarray.Dataset>
    Dimensions:            (T: 11, allpoints: 259200)
    Coordinates:
      * T                  (T) int32 1 5 10 15 20 25 30 40 50 75 100
      * allpoints          (allpoints) MultiIndex
      - allpoints_level_0  (allpoints) float64 40.25 40.25 40.25 40.25 40.25 ...
      - allpoints_level_1  (allpoints) float64 22.75 23.25 23.75 24.25 24.75 ...
    Data variables:
        xi                 (allpoints) float64 -0.6906 -0.6906 -0.6906 -0.6906 ...
        mu                 (allpoints) float64 9.969e+36 9.969e+36 9.969e+36 ...
        sT                 (allpoints, T) float64 9.969e+36 9.969e+36 9.969e+36 ...
        KS_p_value         (allpoints) float64 3.8e-12 3.8e-12 3.8e-12 3.8e-12 ...
        sigma              (allpoints) float64 5.297e-24 5.297e-24 5.297e-24 ...
        KS_statistic       (allpoints) float64 0.6321 0.6321 0.6321 0.6321 ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286152988 https://github.com/pydata/xarray/issues/1026#issuecomment-286152988 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE1Mjk4OA== byersiiasa 17701232 2017-03-13T16:00:39Z 2017-03-13T16:00:39Z NONE

So, not sure if this is helpful but I'll leave these notes here just in case.

  • 0.11.0 - similar problem to @rabernat above
  • 0.10.1 - seems to work fine for what I wanted (delayed)
  • 0.9.0 - appeared to work ok, but actually I'm not convinced it was parallelising the tasks. And also resulted in massive memory issues
  • 0.14.0 - another problem, can't remember what but issue to do with delayed I think.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286152275 https://github.com/pydata/xarray/issues/1026#issuecomment-286152275 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE1MjI3NQ== shoyer 1217238 2017-03-13T15:58:29Z 2017-03-13T15:58:29Z MEMBER

@byersiiasa What matters for dask's reshape is the array shape and chunk shape, all of which you should see when you print a dask.array (or xarray.DataArray containing one). What is the size of the chunking along time and allpoints?
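To make the chunk layout easier to read than a long printed tuple, something like the following could be used. It is a minimal sketch: `summarize_chunks` is a hypothetical helper, and it assumes the input is a mapping from dimension name to a tuple of block lengths, which is the shape of what `.chunks` returns for a dask-backed Dataset. The example layout is hypothetical and only mirrors the sizes mentioned in this thread.

```python
def summarize_chunks(chunks):
    """Summarize a mapping of dimension name -> tuple of block lengths."""
    summary = {}
    for dim, blocks in chunks.items():
        summary[dim] = {
            "length": sum(blocks),         # total size of the dimension
            "n_blocks": len(blocks),       # how many chunks it is split into
            "largest_block": max(blocks),  # size of the biggest chunk
        }
    return summary


# Hypothetical layout: one block per point, one block along T.
example = {"allpoints": (1,) * 259200, "T": (11,)}
for dim, info in summarize_chunks(example).items():
    print(dim, info)
```

A dimension with as many blocks as elements (chunk size 1) stands out immediately in such a summary.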

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286144002 https://github.com/pydata/xarray/issues/1026#issuecomment-286144002 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE0NDAwMg== byersiiasa 17701232 2017-03-13T15:33:25Z 2017-03-13T15:33:25Z NONE

I have been re-running that script you helped me with in Google groups: https://groups.google.com/forum/#!searchin/xarray/combogev%7Csort:relevance/xarray/nfNh40Zt3sU/WfhavtXgCAAJ

do you mean the delayed object from within the function? perhaps <bound method Array.visualize of dask.array<from-va..., shape=(11L,), dtype=float64, chunksize=(11L,)>>

or perhaps Delayed('fit-3767d9ad6cfa517555b5800b3b5f4e41')

I am going to keep trying with different versions of dask since 0.9.0 doesn't seem to behave as it did previously.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286123584 https://github.com/pydata/xarray/issues/1026#issuecomment-286123584 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjEyMzU4NA== shoyer 1217238 2017-03-13T14:29:12Z 2017-03-13T14:29:12Z MEMBER

That array is loaded in numpy already - can you share the dask version?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286062113 https://github.com/pydata/xarray/issues/1026#issuecomment-286062113 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjA2MjExMw== byersiiasa 17701232 2017-03-13T09:57:04Z 2017-03-13T09:57:04Z NONE

<xarray.DataArray 'dis' (time: 30, allpoints: 259200)>
array([[ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       ...,
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36]])
Coordinates:
  * time       (time) datetime64[ns] 1971-01-01 1972-01-01 1973-01-01 ...
  * allpoints  (allpoints) MultiIndex
  - lon        (allpoints) float64 -179.8 -179.8 -179.8 -179.8 -179.8 -179.8 ...
  - lat        (allpoints) float64 89.75 89.25 88.75 88.25 87.75 87.25 86.75 ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
285851059 https://github.com/pydata/xarray/issues/1026#issuecomment-285851059 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NTg1MTA1OQ== byersiiasa 17701232 2017-03-11T07:51:57Z 2017-03-12T14:53:35Z NONE

Hi @rabernat and @shoyer, I have come across the same issue while re-running some old code, now using xarray 0.9.1 / dask 0.11.0. Was there any workaround or solution?

The issue occurs for me when trying to unstack 'allpoints', e.g.

    mle = stacked.dis.groupby('allpoints').apply(combogev)
    dsmle = mle.unstack('allpoints')

Thanks

Also works with dask 0.9.0

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
285893380 https://github.com/pydata/xarray/issues/1026#issuecomment-285893380 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NTg5MzM4MA== shoyer 1217238 2017-03-11T19:23:55Z 2017-03-11T19:23:55Z MEMBER

@byersiiasa can you share what stacked.dis looks like?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
250997873 https://github.com/pydata/xarray/issues/1026#issuecomment-250997873 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk5Nzg3Mw== shoyer 1217238 2016-10-02T21:38:30Z 2016-10-02T21:38:30Z MEMBER

It would look something like this:

  1. Verify that chunks are the same on all dask arrays to be stacked.
  2. Use np.ravel with map_blocks to flatten each block independently.
  3. Construct the appropriate (non-sorted) MultiIndex to label the flattened elements.
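The three steps above can be illustrated without dask or NumPy. This is only a plain-Python stand-in for the idea, not the xarray implementation: nested lists play the role of arrays, `split_blocks` and `stack_blockwise` are hypothetical names, the per-block loop stands in for np.ravel applied through map_blocks, and a list of (row, col) tuples stands in for the non-sorted MultiIndex.

```python
def split_blocks(rows, chunk_rows, chunk_cols):
    """Cut a 2-D list of lists into blocks, yielding (row0, col0, block)."""
    for r0 in range(0, len(rows), chunk_rows):
        for c0 in range(0, len(rows[0]), chunk_cols):
            block = [row[c0:c0 + chunk_cols] for row in rows[r0:r0 + chunk_rows]]
            yield r0, c0, block


def stack_blockwise(arrays, chunk_rows, chunk_cols):
    """Flatten several equally shaped 2-D arrays block by block.

    Returns (labels, flat): labels is a list of (row, col) tuples and
    flat maps each array name to its flattened values.
    """
    # Step 1: every array must share one shape; the chunk sizes are shared
    # arguments here, so equal shapes imply equal chunk layouts.
    shapes = {(len(a), len(a[0])) for a in arrays.values()}
    if len(shapes) != 1:
        raise ValueError("arrays must share the same shape and chunks")

    labels, flat = None, {}
    for name, rows in arrays.items():
        values, index = [], []
        for r0, c0, block in split_blocks(rows, chunk_rows, chunk_cols):
            # Step 2: flatten each block on its own (row-major inside the block).
            for di, block_row in enumerate(block):
                for dj, value in enumerate(block_row):
                    values.append(value)
                    # Step 3: remember the original (row, col) label.
                    index.append((r0 + di, c0 + dj))
        flat[name] = values
        labels = index
    return labels, flat


data = [[0, 1, 2, 3],
        [4, 5, 6, 7]]
labels, flat = stack_blockwise({"a": data}, chunk_rows=2, chunk_cols=2)
print(flat["a"])
print(labels)
```

The flattened values come out grouped by block rather than in row-major order of the whole array, which is why the labels cannot be assumed to be sorted.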

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
250995942 https://github.com/pydata/xarray/issues/1026#issuecomment-250995942 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk5NTk0Mg== rabernat 1197350 2016-10-02T21:04:43Z 2016-10-02T21:04:43Z MEMBER

We could work around this in xarray by adding custom logic to stack for keeping chunks together when reshaping

If you give me a few hints about how to approach this, I can try a PR. I need this rather urgently for an ongoing project.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
250986266 https://github.com/pydata/xarray/issues/1026#issuecomment-250986266 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI1MDk4NjI2Ng== shoyer 1217238 2016-10-02T18:20:36Z 2016-10-02T18:20:36Z MEMBER

This was an intentional change -- see https://github.com/dask/dask/pull/1469

Previously, we created lots of teeny tasks, which tended to negate any out-of-core benefits. The problem is that reshape promises an order for the elements it reshapes, and that order tends to split across the existing chunks of dask arrays.

We could work around this in xarray by adding custom logic to stack for keeping chunks together when reshaping, but we can't do this upstream in dask because we need to make sure we keep all the arrays aligned.
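To see why a row-major reshape cuts across chunks, here is a minimal pure-Python sketch (no dask involved). The 4x4 shape, the 2x2 chunks, and the helper names `chunk_of` and `flattened_chunk_runs` are illustrative assumptions, not taken from the thread.

```python
from itertools import groupby


def chunk_of(i, j, chunk_rows, chunk_cols):
    """Return the (block_row, block_col) that holds element (i, j)."""
    return (i // chunk_rows, j // chunk_cols)


def flattened_chunk_runs(shape, chunks):
    """List (block id, run length) for consecutive elements of the
    row-major flattening of a 2-D chunked array."""
    n_rows, n_cols = shape
    chunk_rows, chunk_cols = chunks
    order = [chunk_of(i, j, chunk_rows, chunk_cols)
             for i in range(n_rows) for j in range(n_cols)]
    # Collapse neighbouring elements that come from the same block.
    return [(block, len(list(run))) for block, run in groupby(order)]


runs = flattened_chunk_runs(shape=(4, 4), chunks=(2, 2))
for block, length in runs:
    print(block, length)
```

In this toy case the four input blocks are visited in eight short runs of two elements each, which illustrates where the many tiny tasks mentioned above can come from.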

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
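As a rough illustration of how the filter on this page maps onto the schema, the sketch below loads the schema into an in-memory SQLite database and runs the equivalent of "rows where issue = 180516114 sorted by updated_at descending". The helper name `comments_for_issue` is hypothetical, the three sample rows reuse ids and timestamps from the listing above, and the SQL that Datasette actually generates may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id):
    """Return (id, updated_at) pairs for one issue, newest update first."""
    cur = conn.execute(
        "SELECT id, updated_at FROM issue_comments "
        "WHERE issue = ? ORDER BY updated_at DESC",
        (issue_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Three sample rows; ISO 8601 timestamps sort correctly as plain text.
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)",
    [
        (250986266, "2016-10-02T18:20:36Z", 180516114),
        (391805626, "2018-05-24T17:59:31Z", 180516114),
        (286856275, "2017-03-15T19:41:39Z", 180516114),
    ],
)
rows = comments_for_issue(conn, 180516114)
print(rows)
```

Because the timestamps are stored as TEXT in ISO 8601 form, ordering by the text column gives chronological order without any date parsing.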