issues: 435311415

id: 435311415
node_id: MDU6SXNzdWU0MzUzMTE0MTU=
number: 2908
title: More efficient rolling with large dask arrays
user: 9040990
state: closed
locked: 0
assignee:
milestone:
comments: 3
created_at: 2019-04-19T21:35:59Z
updated_at: 2019-10-04T17:04:37Z
closed_at: 2019-10-04T17:04:37Z
author_association: NONE
body:

Code Sample

```python
import xarray as xr
import dask.array as da

dsize = [62, 12, 100, 192, 288]
array1 = da.random.random(dsize, chunks=(dsize[0], dsize[1], 1, dsize[3], int(dsize[4] / 2)))
array2 = xr.DataArray(array1)
rollingmean = array2.rolling(dim_1=3, center=True).mean()  # <-- this kills all workers
```
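For scale, the sizes implied by the snippet can be worked out with plain arithmetic. This is a rough sketch that assumes 8-byte float64 elements (the default dtype of `da.random.random`) and uses illustrative variable names that are not part of the original report.

```python
import math

# Shape and chunk shape taken from the code sample above.
shape = (62, 12, 100, 192, 288)
chunks = (62, 12, 1, 192, 288 // 2)
itemsize = 8  # bytes per element, assuming float64

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize
# Number of chunks: ceil(size / chunk size) along each axis, multiplied together.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f"whole array: {total_bytes / 1e9:.1f} GB")   # 32.9 GB
print(f"one chunk:   {chunk_bytes / 1e6:.1f} MB")   # 164.6 MB
print(f"chunk count: {n_chunks}")                   # 200
```

The rolling dimension (`dim_1`, length 12) is not split across chunks here; only the third and fifth dimensions are.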

Problem description

I'm working on NCAR's Cheyenne with a 36 GB netCDF file using dask_jobqueue.PBSCluster, and trying to calculate a running mean along one dimension. Despite having plenty of memory reserved (400 GB), I can watch DataArray.rolling blow up the bytes stored in the dashboard until the job hangs and all the workers are killed.

The above snippet reproduces the issue with the same array size and chunk size as what I'm working with. This worker-killing behavior does not occur for arrays that are 100x smaller. I've found a speedy way to calculate what I need without using rolling, but I thought I should bring this to your attention regardless.
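The report does not say what the faster alternative was. Purely as an illustration, one common way to get a centered 3-point running mean without a rolling-window object is to average three shifted slices. The sketch below uses NumPy on a small array; the helper name `centered_mean3` is hypothetical, and the edge positions are set to NaN to mirror what `rolling(..., center=True).mean()` returns with its default `min_periods`.

```python
import numpy as np

def centered_mean3(a, axis):
    """Centered 3-point running mean along `axis`; the two edge positions are NaN."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    out = np.full(a.shape, np.nan)
    if n < 3:
        return out  # no complete window fits

    def sl(start, stop):
        # Build an index tuple that slices only along `axis`.
        idx = [slice(None)] * a.ndim
        idx[axis] = slice(start, stop)
        return tuple(idx)

    # Average each interior point with its left and right neighbours.
    out[sl(1, n - 1)] = (a[sl(0, n - 2)] + a[sl(1, n - 1)] + a[sl(2, n)]) / 3.0
    return out

x = np.arange(12.0).reshape(2, 6)
result = centered_mean3(x, axis=1)
print(result)
```

This is not necessarily what the author did; it only shows that a short fixed window can be expressed as a few element-wise operations.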

In case it's relevant, here's how I'm setting up the dask cluster on cheyenne:

```python
from dask.distributed import Client
from dask_jobqueue import PBSCluster  # version 0.4.1

cluster = PBSCluster(cores=36, processes=9, memory='109GB', project=myproj,
                     resource_spec='select=1:ncpus=36:mem=109G',
                     queue='regular', walltime='02:00:00')
numnodes = 4
client = Client(cluster)
cluster.scale(numnodes * 9)
```
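As a rough check on the resources this configuration provides, the per-worker figures can be derived from the arguments above. The sketch assumes that dask_jobqueue divides each job's `cores` and `memory` evenly among its `processes`, and that `cluster.scale(n)` in this version requests `n` worker processes; both are assumptions, and the variable names are illustrative.

```python
# Values copied from the PBSCluster call above.
cores_per_job = 36
processes_per_job = 9
memory_per_job_gb = 109
numnodes = 4

n_workers = numnodes * processes_per_job                  # workers requested via scale()
threads_per_worker = cores_per_job // processes_per_job   # assumed even split of cores
memory_per_worker_gb = memory_per_job_gb / processes_per_job
total_memory_gb = numnodes * memory_per_job_gb

# Size of one float64 chunk of shape (62, 12, 1, 192, 144), in GB.
chunk_gb = 62 * 12 * 1 * 192 * 144 * 8 / 1e9

print(f"workers: {n_workers}, threads per worker: {threads_per_worker}")
print(f"memory per worker: {memory_per_worker_gb:.1f} GB")
print(f"total memory: {total_memory_gb} GB")
print(f"chunks that fit in one worker's memory: {memory_per_worker_gb / chunk_gb:.0f}")
```

Under these assumptions the total comes to 436 GB, in line with the roughly 400 GB mentioned in the problem description, while each individual worker has only about 12 GB to work with.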

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1 (default, Dec 14 2018, 19:28:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.62-60.64.8-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.1
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.6.3
pip: 18.1
conda: 4.6.13
pytest: None
IPython: 7.3.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2908/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
