issue_comments
10 rows where issue = 473692721 sorted by updated_at descending
issue: rolling: bottleneck still not working properly with dask arrays (10 comments)
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
516235099 | https://github.com/pydata/xarray/issues/3165#issuecomment-516235099 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjIzNTA5OQ== | peterhob 13084427 | 2019-07-30T02:33:23Z | 2019-07-30T02:33:23Z | NONE |
Thank you so much for pointing it out. I tried rolling.construct and it worked! I also tried it on other netCDF files and it solved the problem there too. Thank you so much for your help! If this is caused by Dask's scheduler and there is no quick fix yet, do you think mentioning rolling.construct in the xarray documentation as the recommended usage would be worth doing? It could help newbies like me a lot. Cheers, Joey |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
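The rolling.construct workaround confirmed above can be sketched as follows (a minimal sketch: the array, window, and chunk sizes are scaled down from the issue's example so it runs quickly):

```python
import dask.array as da
import xarray as xr

# Small chunked DataArray (scaled down from the issue's (5000, 50000)).
temp = xr.DataArray(da.zeros((50, 500)), dims=("x", "y")).chunk({"y": 100})

# Instead of temp.rolling(x=10).mean(), materialize the rolling window as
# an explicit "window" dimension and reduce over it. This sidesteps the
# bottleneck/dask code path that triggered the memory error.
rolled = temp.rolling(x=10).construct("window").mean("window")
result = rolled.compute()
```

One behavioral difference to note: the mean over the constructed window skips the NaN padding by default, so leading partial windows get a value where rolling(...).mean() would return NaN.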
516195053 | https://github.com/pydata/xarray/issues/3165#issuecomment-516195053 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE5NTA1Mw== | shoyer 1217238 | 2019-07-29T23:05:57Z | 2019-07-29T23:05:57Z | MEMBER | I think this triggers a case that dask's scheduler doesn't handle well, related to this issue: https://github.com/dask/dask/issues/874 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
516193739 | https://github.com/pydata/xarray/issues/3165#issuecomment-516193739 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE5MzczOQ== | shoyer 1217238 | 2019-07-29T23:00:37Z | 2019-07-29T23:00:37Z | MEMBER | Actually, there does seem to be something fishy going on here. I find that I'm able to execute |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
516193582 | https://github.com/pydata/xarray/issues/3165#issuecomment-516193582 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE5MzU4Mg== | shoyer 1217238 | 2019-07-29T22:59:48Z | 2019-07-29T22:59:48Z | MEMBER | For context, xarray's rolling window code creates a "virtual dimension" for the rolling window. So if your chunks are size (5000, 100) before the rolling window, they are size (5000, 100, 100) within the rolling window computation. So it's not entirely surprising that there are more issues with memory usage -- these are much bigger arrays. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
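The chunk-size blow-up described above can be seen directly by constructing the windowed view (a scaled-down sketch with illustrative sizes):

```python
import dask.array as da
import xarray as xr

# Chunked along y only, as in the issue (sizes scaled down).
temp = xr.DataArray(da.zeros((50, 1000)), dims=("x", "y")).chunk({"y": 100})

# The rolling computation operates on a view with an extra "virtual"
# dimension whose length is the window size, so each chunk is roughly
# window-size times larger within the computation.
windowed = temp.rolling(x=10).construct("window")
print(temp.shape)      # (50, 1000)
print(windowed.shape)  # (50, 1000, 10)
```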
516191152 | https://github.com/pydata/xarray/issues/3165#issuecomment-516191152 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE5MTE1Mg== | peterhob 13084427 | 2019-07-29T22:48:54Z | 2019-07-29T22:48:54Z | NONE |
I tried that, but got the same error.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
516187643 | https://github.com/pydata/xarray/issues/3165#issuecomment-516187643 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE4NzY0Mw== | shoyer 1217238 | 2019-07-29T22:33:56Z | 2019-07-29T22:33:56Z | MEMBER | You want to use the chunks argument inside da.zeros, e.g., da.zeros((5000, 50000), chunks=100).
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
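The suggestion above is to create the array chunked from the start, rather than building one huge in-memory block and re-chunking it afterwards; a minimal sketch:

```python
import dask.array as da

# chunks=100 inside da.zeros means the array is born as many small
# blocks, instead of a single (5000, 50000) block that a later .chunk()
# call would have to split.
arr = da.zeros((5000, 50000), chunks=100)
print(arr.chunksize)   # (100, 100)
print(arr.numblocks)   # (50, 500)
```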
516186795 | https://github.com/pydata/xarray/issues/3165#issuecomment-516186795 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjE4Njc5NQ== | peterhob 13084427 | 2019-07-29T22:30:37Z | 2019-07-29T22:30:37Z | NONE |
Thank you for your suggestion. I tried it as you suggested, but still get the same error.

```python
import numpy as np
import xarray as xr
import dask.array as da
from dask.distributed import Client

temp = xr.DataArray(da.zeros((5000, 50000)), dims=("x", "y")).chunk({"y": 100})
temp.rolling(x=100).mean()
```

I have also tried saving the array to a nc file and reading it back after that. rolling still gives the same error (with or without bottleneck, and with different chunks). Even though it says memory error, it doesn't consume much memory. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
516060323 | https://github.com/pydata/xarray/issues/3165#issuecomment-516060323 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNjA2MDMyMw== | shoyer 1217238 | 2019-07-29T16:20:07Z | 2019-07-29T16:20:07Z | MEMBER | Did you try converting |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
515906488 | https://github.com/pydata/xarray/issues/3165#issuecomment-515906488 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNTkwNjQ4OA== | peterhob 13084427 | 2019-07-29T08:55:34Z | 2019-07-29T08:55:51Z | NONE |
Hi shoyer, Thanks for your reply and help. However, I have tried various chunks along each dimension and along both (like 200 on the x dimension and 100 on the y dimension, or larger chunks like 2000 on the y dimension), and it doesn't work. On both an Ubuntu machine with 100 GB of memory and a local Windows 10 machine, it simply crashed within a couple of seconds. Even though it says memory error, the code does not use much memory at all. Also, even with the one-dimension setup, temp.data shows that each chunk only takes 4 MB of memory (which made me think it might be too small, so I then used larger chunks). I also used a new conda environment with a clean install of just the necessary libraries, and the problem is still there. Here is the clean new environment under which I tried again but got the same errors. Output of
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 | |
515738254 | https://github.com/pydata/xarray/issues/3165#issuecomment-515738254 | https://api.github.com/repos/pydata/xarray/issues/3165 | MDEyOklzc3VlQ29tbWVudDUxNTczODI1NA== | shoyer 1217238 | 2019-07-28T06:55:43Z | 2019-07-28T06:55:43Z | MEMBER | Have you tried adding more chunking, e.g., along the x dimension? That's the usual recommendation if you're running out of memory. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rolling: bottleneck still not working properly with dask arrays 473692721 |
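Chunking along the x dimension as well, per the suggestion above, can be sketched like this (the chunk sizes are illustrative, and the array is shrunk from the issue's example):

```python
import dask.array as da
import xarray as xr

# Chunk along both dimensions, not just y, so each in-memory block is
# smaller.
temp = xr.DataArray(da.zeros((1000, 5000)), dims=("x", "y")).chunk(
    {"x": 200, "y": 100}
)
print(temp.data.chunksize)  # (200, 100)
```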