issue_comments


10 rows where issue = 473692721 sorted by updated_at descending



Issue: rolling: bottleneck still not working properly with dask arrays (473692721)
peterhob (NONE) · 2019-07-30T02:33:23Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516235099

> Actually, there does seem to be something fishy going on here. I find that I'm able to execute `temp.rolling(x=100).construct('window').mean('window').compute()` successfully but not `temp.rolling(x=100).mean().compute()`, even though that should mostly be equivalent to the former.

Thank you so much for pointing it out. I tried `rolling.construct` and it worked! I also tried it on other netCDF files, and it solved the problem there too. Thank you so much for your help!

If this is caused by Dask's scheduler and there is no quick fix yet, do you think it would be worth mentioning `rolling.construct` in the xarray documentation as the recommended usage? It could help newbies like me a lot.

Cheers, Joey
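For reference, a minimal runnable sketch of the workaround discussed above (my own illustration, not from the thread; the array is scaled down from the thread's example so it computes quickly):

```python
import dask.array as da
import xarray as xr

# Scaled-down stand-in for the arrays discussed in this thread.
temp = xr.DataArray(da.zeros((500, 1000), chunks=(-1, 100)), dims=("x", "y"))

# The direct form that triggered the reported failure:
# temp.rolling(x=100).mean().compute()

# The workaround: materialize the rolling window as an explicit "window"
# dimension, then reduce over it.
result = temp.rolling(x=100).construct("window").mean("window").compute()
```

Note that, as shoyer says, the two forms are only mostly equivalent: for example, they can differ in how incomplete windows at the array edges are treated.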

shoyer (MEMBER) · 2019-07-29T23:05:57Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516195053

I think this triggers a case that dask's scheduler doesn't handle well, related to this issue: https://github.com/dask/dask/issues/874

shoyer (MEMBER) · 2019-07-29T23:00:37Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516193739

Actually, there does seem to be something fishy going on here. I find that I'm able to execute `temp.rolling(x=100).construct('window').mean('window').compute()` successfully but not `temp.rolling(x=100).mean().compute()`, even though that should mostly be equivalent to the former.

shoyer (MEMBER) · 2019-07-29T22:59:48Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516193582

For context, xarray's rolling window code creates a "virtual dimension" for the rolling window. So if your chunks are size (5000, 100) before the rolling window, they are size (5000, 100, 100) within the rolling window computation. So it's not entirely surprising that there are more issues with memory usage -- these are much bigger arrays, e.g., see

```
temp.rolling(x=100).construct('window')
<xarray.DataArray (x: 5000, y: 50000, window: 100)>
dask.array<shape=(5000, 50000, 100), dtype=float64, chunksize=(50, 100, 100)>
Dimensions without coordinates: x, y, window
```
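To make the size increase concrete, here is a back-of-the-envelope calculation (my own addition, using the chunk shapes named above):

```python
# Chunk-size arithmetic for float64 data (8 bytes per element).
bytes_per_element = 8

before = 5000 * 100 * bytes_per_element        # chunk of (5000, 100)
within = 5000 * 100 * 100 * bytes_per_element  # chunk of (5000, 100, 100)

print(f"chunk before rolling: {before / 1e6:.0f} MB")  # ~4 MB
print(f"chunk within rolling: {within / 1e6:.0f} MB")  # ~400 MB
```

The ~4 MB figure matches the per-chunk size peterhob reports further down this page.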

peterhob (NONE) · 2019-07-29T22:48:54Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516191152

> `da.zeros((5000, 50000), chunks=100)`

Tried it, but I get the same error.

```python
import numpy as np
import xarray as xr
import dask.array as da

temp = xr.DataArray(da.zeros((5000, 50000), chunks=(-1, 100)), dims=("x", "y"))
temp.rolling(x=100).mean()
```

Like I said, I have also saved the array to a netCDF file and read it back from disk (as below), but I still get the same error.

```python
import numpy as np
import xarray as xr
import dask.array as da

temp = xr.DataArray(da.zeros((5000, 50000), chunks=(-1, 100)), dims=("x", "y"))
temp.to_netcdf("temp.nc")
temp.close()
test = xr.open_dataarray("temp.nc", chunks={"y": 100})
test.rolling(x=100).mean()
```

shoyer (MEMBER) · 2019-07-29T22:33:56Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516187643

You want to use the chunks argument inside `da.zeros`, e.g., `da.zeros((5000, 50000), chunks=100)`.
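A minimal sketch of the difference (my own illustration; the array shape follows the thread's example):

```python
import dask.array as da
import numpy as np
import xarray as xr

# Chunked at creation: dask builds lazy chunks and never materializes
# the full ~2 GB array.
lazy = xr.DataArray(da.zeros((5000, 50000), chunks=(5000, 100)), dims=("x", "y"))

# Wrapping an eager NumPy array instead: np.zeros allocates the full
# ~2 GB up front, and .chunk() only slices that existing array afterwards.
# (Commented out so the sketch stays cheap to run.)
# eager = xr.DataArray(np.zeros((5000, 50000)), dims=("x", "y")).chunk({"y": 100})
```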


peterhob (NONE) · 2019-07-29T22:30:37Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516186795

> Did you try converting `np.zeros((5000, 50000))` to use `dask.array.zeros` instead? The former will allocate 2 GB of data within each chunk

Thank you for your suggestion. I tried as you suggested, but still get the same error.

```python
import numpy as np
import xarray as xr
import dask.array as da
# from dask.distributed import Client

temp = xr.DataArray(da.zeros((5000, 50000)), dims=("x", "y")).chunk({"y": 100})
temp.rolling(x=100).mean()
```

I have also tried saving the array to a netCDF file and reading it back after that; rolling still gives the same error (with or without bottleneck, and with different chunks). Even though it reports a memory error, it doesn't actually consume much memory.

shoyer (MEMBER) · 2019-07-29T16:20:07Z · https://github.com/pydata/xarray/issues/3165#issuecomment-516060323

Did you try converting `np.zeros((5000, 50000))` to use `dask.array.zeros` instead? The former will allocate 2 GB of data within each chunk.

peterhob (NONE) · 2019-07-29T08:55:34Z · https://github.com/pydata/xarray/issues/3165#issuecomment-515906488

> Have you tried adding more chunking, e.g., along the x dimension? That's the usual recommendation if you're running out of memory.

Hi Shoyer,

Thanks for your reply and help. However, I have tried various chunk sizes along each dimension and along both (for example 200 on the x dimension with 100 on the y dimension, or larger chunks like 2000 on the y dimension), and it doesn't work.

On both an Ubuntu machine with 100 GB of memory and a local Windows 10 machine, it simply crashes within a couple of seconds. Even though it reports a memory error, the code does not use much memory at all. Also, even with the one-dimensional chunking setup, `temp.data` shows that each chunk takes only 4 MB of memory (which made me think the chunks might be too small, so I then tried larger ones). I also used a new conda environment with a clean install of just the necessary libraries, and the problem is still there.

Here is the clean new environment in which I tried again and got the same errors:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.12.3
pandas: 0.25.0
numpy: 1.16.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.2.1
conda: None
pytest: None
IPython: None
sphinx: None
```

By the way, the above code seems to work fine with the previous xarray release (0.12.1) and bottleneck.

Cheers, Joey

shoyer (MEMBER) · 2019-07-28T06:55:43Z · https://github.com/pydata/xarray/issues/3165#issuecomment-515738254

Have you tried adding more chunking, e.g., along the x dimension? That's the usual recommendation if you're running out of memory.
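For instance, a sketch of what chunking along the x dimension as well could look like (my own illustration; shapes follow the thread's example):

```python
import dask.array as da
import xarray as xr

# Chunk along both x and y so each task works on a smaller block.
temp = xr.DataArray(
    da.zeros((5000, 50000), chunks=(200, 100)),
    dims=("x", "y"),
)
temp.rolling(x=100).mean()
```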


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);