issue_comments
9 rows where author_association = "MEMBER" and issue = 345715825 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
415177535 | https://github.com/pydata/xarray/issues/2329#issuecomment-415177535 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQxNTE3NzUzNQ== | shoyer 1217238 | 2018-08-22T20:58:36Z | 2018-08-22T20:58:36Z | MEMBER | This might be worth testing with the changes from https://github.com/pydata/xarray/pull/2261, which refactors xarray's IO handling. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
409238042 | https://github.com/pydata/xarray/issues/2329#issuecomment-409238042 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwOTIzODA0Mg== | fmaussion 10050469 | 2018-07-31T14:20:06Z | 2018-07-31T14:20:06Z | MEMBER | I updated my example above to show that the chunking over the last dimension is ridiculously slow. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
409172635 | https://github.com/pydata/xarray/issues/2329#issuecomment-409172635 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwOTE3MjYzNQ== | fmaussion 10050469 | 2018-07-31T10:25:16Z | 2018-07-31T14:18:29Z | MEMBER | Sorry for the confusion, I had an obvious mistake in my timing experiment above (I forgot to do the actual computations...). The dimension order does make a difference:
```python
import dask.array as da  # dask.array must be imported explicitly; `import dask` alone does not load it
import xarray as xr

# 1000 x 721 x 1440 float64 array, chunked along the leading 'z' dimension
d = xr.DataArray(da.zeros((1000, 721, 1440), chunks=(10, 721, 1440)),
                 dims=('z', 'y', 'x'))
d.to_netcdf('da.nc')  # 8.3 GB on disk

with xr.open_dataarray('da.nc', chunks={'z': 10}) as d:
    %timeit d.sum().load()
# 3.94 s ± 95.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10}) as d:
    %timeit d.sum().load()
# 4.15 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'x': 10}) as d:
    %timeit d.sum().load()
# 1min 54s ± 1.43 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10, 'x': 10}) as d:
    %timeit d.sum().load()
# 2min 23s ± 215 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
|
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
409168605 | https://github.com/pydata/xarray/issues/2329#issuecomment-409168605 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwOTE2ODYwNQ== | fmaussion 10050469 | 2018-07-31T10:09:36Z | 2018-07-31T13:21:34Z | MEMBER |
Can you still try to chunk along one dimension only? i.e. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
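The code that originally followed "i.e." in the comment above is missing from this export. A minimal sketch of what chunking along one dimension only might look like; the file name and the 'time' dimension are placeholders, not the original snippet:
```python
import xarray as xr

# Hypothetical file and dimension names: chunk along a single dimension
# ('time') and leave the other dimensions whole.
ds = xr.open_dataset('data.nc', chunks={'time': 10})
print(ds)  # the repr shows one dask chunk per 10 time steps for each variable
```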
409165114 | https://github.com/pydata/xarray/issues/2329#issuecomment-409165114 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwOTE2NTExNA== | fmaussion 10050469 | 2018-07-31T09:56:54Z | 2018-07-31T10:20:32Z | MEMBER | [EDIT]: forgot the load ... <s> forget my comment about chunks - I thought this would make a difference but it's actually the opposite (to my surprise): </s> |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
409159969 | https://github.com/pydata/xarray/issues/2329#issuecomment-409159969 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwOTE1OTk2OQ== | fmaussion 10050469 | 2018-07-31T09:38:37Z | 2018-07-31T10:19:37Z | MEMBER | Out of curiosity:
- why do you chunk over lats and lons rather than time? The order of dimensions in your dataarray suggests that chunking over time could be more efficient
- can you show the output of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
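The snippet whose output is being requested is cut off in this export. As a guess at the kind of diagnostic meant here, printing the dataset and its chunk layout shows how dask has split the arrays; the dataset built below is a stand-in with placeholder names, since the user's data is not part of this dump:
```python
import numpy as np
import xarray as xr

# Illustration only: a small stand-in dataset; in the issue, 'ds' would come
# from opening the user's netCDF files with their chunk arguments.
ds = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), np.zeros((4, 3, 5)))}
).chunk({'time': 1})

print(ds)              # Dataset repr, including dask chunk info per variable
print(ds.chunks)       # mapping of dimension name -> chunk sizes
print(ds['t2m'].data)  # the underlying dask.array and its chunksize
```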
408928221 | https://github.com/pydata/xarray/issues/2329#issuecomment-408928221 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwODkyODIyMQ== | rabernat 1197350 | 2018-07-30T16:37:05Z | 2018-07-30T16:37:23Z | MEMBER | Can you forget about zarr for a moment and just do a reduction on your dataset? For example:
Keep the same chunk arguments you are currently using. This will help us understand if the problem is with reading the files. Is it your intention to chunk the files contiguously in time? Depending on the underlying structure of the data within the netCDF file, this could amount to a complete transposition of the data, which could be very slow / expensive. This could have some parallels with #2004. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
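The example snippet that followed "For example:" is not present in this export. A hedged sketch of the kind of reduction test being suggested, with a placeholder file name and chunk sizes standing in for the user's actual arguments:
```python
import xarray as xr

# Placeholder file name and chunk sizes; keep the same chunk arguments
# as in the original workflow.
ds = xr.open_dataset('data.nc', chunks={'latitude': 100, 'longitude': 100})

# A cheap reduction forces the data to actually be read and computed,
# which separates read performance from the zarr-writing step.
print(ds.mean().compute())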
408925488 | https://github.com/pydata/xarray/issues/2329#issuecomment-408925488 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwODkyNTQ4OA== | rabernat 1197350 | 2018-07-30T16:28:31Z | 2018-07-30T16:28:31Z | MEMBER |
Yes, this is what we want! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 | |
408860643 | https://github.com/pydata/xarray/issues/2329#issuecomment-408860643 | https://api.github.com/repos/pydata/xarray/issues/2329 | MDEyOklzc3VlQ29tbWVudDQwODg2MDY0Mw== | rabernat 1197350 | 2018-07-30T13:20:59Z | 2018-07-30T13:20:59Z | MEMBER | @lrntct - this sounds like a reasonable way to use zarr. We routinely do this sort of transcoding and it works reasonably well. Unfortunately something clearly isn't working right in your case. These things can be hard to debug, but we will try to help you. You might want to start by reviewing the guide I wrote for Pangeo on preparing zarr datasets. It would also be good to see a bit more detail. You posted a function. If instead you have just one big netCDF file as in the example you posted above, I think I see your problem: you are calling … More ideas:
- explicitly specify the chunks (rather than using …)

Another useful piece of advice would be to use the dask distributed dashboard to monitor what is happening under the hood. You can do this by running …
Hopefully these ideas can help you move forward. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Out-of-core processing with dask not working properly? 345715825 |
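The command at the end of the comment above ("You can do this by running …") is cut off in this export. One common way to get the dask distributed dashboard, offered here as an assumption about what was meant rather than the original text, is to start a local `Client`, whose repr includes the dashboard link (served at http://localhost:8787 by default):
```python
from dask.distributed import Client

# Starting a local cluster also starts the diagnostic dashboard,
# which shows task progress, memory use, and worker activity.
client = Client()
print(client)  # the repr includes the dashboard URL
```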
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```