github: issues: 1 row where type = "issue" and user = 9040990 sorted by updated

1 row where type = "issue" and user = 9040990 sorted by updated_at descending

Search:

descending

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at ▲	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
435311415	MDU6SXNzdWU0MzUzMTE0MTU=	2908	More efficient rolling with large dask arrays	emaroon 9040990	closed	0			3	2019-04-19T21:35:59Z	2019-10-04T17:04:37Z	2019-10-04T17:04:37Z	NONE				Code Sample ```python import xarray as xr import dask.array as da dsize=[62,12,100,192,288] array1=da.random.random(dsize,chunks=(dsize[0],dsize[1],1,dsize[3],int(dsize[4]/2))) array2=xr.DataArray(array1) rollingmean=array2.rolling(dim_1=3,center=True).mean() # <-- this kills all workers ``` Problem description I'm working on NCAR's cheyenne with a 36GB netcdf using dask_jobqueue.PBSCluster, and trying to calculate the running-mean along one dimension. Despite having plenty of memory reserved (400GB), I can watch DataArray.rolling blow up the bytes stored in the dashboard until the job hangs and all the workers are killed. The above snippet reproduces the issue with the same array size and chunksize as what I'm working with. This worker-killing behavior does not occur for arrays that are 100x smaller. I've found a speedy way to calculate what I need without using rolling, but I thought I should bring this to your attention regardless. In case it's relevant, here's how I'm setting up the dask cluster on cheyenne: ```python from dask.distributed import Client from dask_jobqueue import PBSCluster #version 0.4.1 cluster=PBSCluster(cores=36, processes=9, memory='109GB', project=myproj, resource_spec='select=1:ncpus=36:mem=109G', queue='regular', walltime='02:00:00') numnodes=4 client = Client(cluster) cluster.scale(numnodes*9) ``` Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.1 (default, Dec 14 2018, 19:28:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.12.62-60.64.8-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.3.1 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.2 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: 4.6.13 pytest: None IPython: 7.3.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2908/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	xarray 13221727	issue

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);

issues

1 row where type = "issue" and user = 9040990 sorted by updated_at descending

Code Sample

Problem description

Output of xr.show_versions()

Advanced export

Output of `xr.show_versions()`