id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 435311415,MDU6SXNzdWU0MzUzMTE0MTU=,2908,More efficient rolling with large dask arrays,9040990,closed,0,,,3,2019-04-19T21:35:59Z,2019-10-04T17:04:37Z,2019-10-04T17:04:37Z,NONE,,,,"#### Code Sample ```python import xarray as xr import dask.array as da dsize=[62,12,100,192,288] array1=da.random.random(dsize,chunks=(dsize[0],dsize[1],1,dsize[3],int(dsize[4]/2))) array2=xr.DataArray(array1) rollingmean=array2.rolling(dim_1=3,center=True).mean() # <-- this kills all workers ``` #### Problem description I'm working on NCAR's cheyenne with a 36GB netcdf using dask_jobqueue.PBSCluster, and trying to calculate the running-mean along one dimension. Despite having plenty of memory reserved (400GB), I can watch DataArray.rolling blow up the bytes stored in the dashboard until the job hangs and all the workers are killed. The above snippet reproduces the issue with the same array size and chunksize as what I'm working with. This worker-killing behavior does not occur for arrays that are 100x smaller. I've found a speedy way to calculate what I need without using rolling, but I thought I should bring this to your attention regardless. In case it's relevant, here's how I'm setting up the dask cluster on cheyenne: ```python from dask.distributed import Client from dask_jobqueue import PBSCluster #version 0.4.1 cluster=PBSCluster(cores=36, processes=9, memory='109GB', project=myproj, resource_spec='select=1:ncpus=36:mem=109G', queue='regular', walltime='02:00:00') numnodes=4 client = Client(cluster) cluster.scale(numnodes*9) ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.1 (default, Dec 14 2018, 19:28:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.12.62-60.64.8-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.3.1 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.2 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: 4.6.13 pytest: None IPython: 7.3.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2908/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue