
issue_comments


5 rows where author_association = "MEMBER", issue = 345715825 and user = 10050469 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
409238042 https://github.com/pydata/xarray/issues/2329#issuecomment-409238042 https://api.github.com/repos/pydata/xarray/issues/2329 MDEyOklzc3VlQ29tbWVudDQwOTIzODA0Mg== fmaussion 10050469 2018-07-31T14:20:06Z 2018-07-31T14:20:06Z MEMBER

I updated my example above to show that chunking over the last dimension is ridiculously slow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Out-of-core processing with dask not working properly? 345715825
409172635 https://github.com/pydata/xarray/issues/2329#issuecomment-409172635 https://api.github.com/repos/pydata/xarray/issues/2329 MDEyOklzc3VlQ29tbWVudDQwOTE3MjYzNQ== fmaussion 10050469 2018-07-31T10:25:16Z 2018-07-31T14:18:29Z MEMBER

Sorry for the confusion; I had an obvious mistake in my timing experiment above (I forgot to do the actual computations...).

The dimension order does make a difference:

```python
import dask.array as da
import xarray as xr

d = xr.DataArray(da.zeros((1000, 721, 1440), chunks=(10, 721, 1440)), dims=('z', 'y', 'x'))
d.to_netcdf('da.nc')  # 8.3 Gb

with xr.open_dataarray('da.nc', chunks={'z': 10}) as d:
    %timeit d.sum().load()
# 3.94 s ± 95.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10}) as d:
    %timeit d.sum().load()
# 4.15 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'x': 10}) as d:
    %timeit d.sum().load()
# 1min 54s ± 1.43 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10, 'x': 10}) as d:
    %timeit d.sum().load()
# 2min 23s ± 215 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Out-of-core processing with dask not working properly? 345715825
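
The spread in those timings tracks the chunk structure each option produces: chunking along `x` cuts across the array's last, contiguous-on-disk dimension, so each chunk turns into many small strided reads, and the `y`/`x` combination multiplies the total chunk count as well. A minimal sketch (not part of the original comment) that reuses the `da.nc` file written above to inspect the chunk layout:

```python
import xarray as xr

# Reuses the da.nc file from the benchmark above.
for chunks in ({'z': 10}, {'y': 10}, {'x': 10}, {'y': 10, 'x': 10}):
    with xr.open_dataarray('da.nc', chunks=chunks) as d:
        # numblocks: chunks per dimension; npartitions: total number of chunks
        print(chunks, '->', d.data.numblocks, '=', d.data.npartitions, 'chunks')
```
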
409168605 https://github.com/pydata/xarray/issues/2329#issuecomment-409168605 https://api.github.com/repos/pydata/xarray/issues/2329 MDEyOklzc3VlQ29tbWVudDQwOTE2ODYwNQ== fmaussion 10050469 2018-07-31T10:09:36Z 2018-07-31T13:21:34Z MEMBER

Those chunksizes are the opposite of what I was expecting...

The chunksizes in `encoding` are ignored in your case; dask still uses your user-provided chunks.

Can you still try to chunk along one dimension only, e.g. `chunks={'time': 200}`?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Out-of-core processing with dask not working properly? 345715825
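
A minimal sketch of how to check the distinction drawn above (the filename `data.nc` and the variable `mtpr` are placeholders taken from the thread's context): `encoding['chunksizes']` reports the on-disk netCDF4/HDF5 chunking, while `.data.chunks` shows the dask chunks the computation actually uses:

```python
import xarray as xr

# 'data.nc' and 'mtpr' are hypothetical stand-ins for the reporter's file and variable.
ds = xr.open_dataset('data.nc', chunks={'time': 200})
print(ds['mtpr'].encoding.get('chunksizes'))  # on-disk (netCDF4/HDF5) chunk sizes
print(ds['mtpr'].data.chunks)                 # dask chunks used in the task graph
```
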
409165114 https://github.com/pydata/xarray/issues/2329#issuecomment-409165114 https://api.github.com/repos/pydata/xarray/issues/2329 MDEyOklzc3VlQ29tbWVudDQwOTE2NTExNA== fmaussion 10050469 2018-07-31T09:56:54Z 2018-07-31T10:20:32Z MEMBER

[EDIT]: forgot the `.load()` ...

<s> forget my comment about chunks - I thought this would make a difference but it's actually the opposite (to my surprise): </s>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Out-of-core processing with dask not working properly? 345715825
409159969 https://github.com/pydata/xarray/issues/2329#issuecomment-409159969 https://api.github.com/repos/pydata/xarray/issues/2329 MDEyOklzc3VlQ29tbWVudDQwOTE1OTk2OQ== fmaussion 10050469 2018-07-31T09:38:37Z 2018-07-31T10:19:37Z MEMBER

Out of curiosity:

  • why do you chunk over lats and lons rather than time? The order of dimensions in your dataarray suggests that chunking over time could be more efficient
  • can you show the output of `ds.mtpr` and `ds.mtpr.encoding`?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Out-of-core processing with dask not working properly? 345715825
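
The dimension-order point in the comment above can be made concrete with a toy calculation (hypothetical sizes, not data from the thread): for a C-ordered `('time', 'latitude', 'longitude')` variable, a chunk over `time` spans the whole trailing dimensions and is read as one contiguous slab, while a chunk over `longitude` needs a separate strided read for every `(time, latitude)` row it crosses:

```python
# Hypothetical shape for a ('time', 'latitude', 'longitude') variable.
nt, nlat, nlon = 1000, 721, 1440

# chunks={'time': 200}: 200 full (lat, lon) slabs, one contiguous run in C order.
reads_per_time_chunk = 1

# chunks={'longitude': 200}: every (time, latitude) row contributes a strided read.
reads_per_lon_chunk = nt * nlat

print(reads_per_time_chunk, 'contiguous read per time-chunk vs',
      reads_per_lon_chunk, 'strided reads per longitude-chunk')
```
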


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
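
For anyone querying the underlying SQLite file directly instead of going through Datasette, a minimal sketch of reproducing this page's filter with Python's built-in `sqlite3` module (the filename `github.db` is an assumption):

```python
import sqlite3

# 'github.db' is a hypothetical name for the SQLite file behind this Datasette.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    'select id, updated_at, body from issue_comments '
    'where author_association = ? and issue = ? and [user] = ? '
    'order by updated_at desc',
    ('MEMBER', 345715825, 10050469),
).fetchall()
for comment_id, updated_at, body in rows:
    print(comment_id, updated_at, body[:60])
conn.close()
```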