issue #912: Speed up operations with xarray dataset
opened 2016-07-20 by user 7504461 · closed 2016-12-29 · 12 comments

Hi all,

I've recently been having a hard time manipulating an xarray dataset. I am not sure if I am making some awkward mistake, but it is taking an unacceptable amount of time to perform simple operations.

Here is a piece of my code:

```python
ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0])
ds
```

```
<xarray.Dataset>
Dimensions:          (burst: 2485, duration: 2400, z: 160)
Coordinates:
    zdist            (z) float64 0.01014 0.02027 0.03041 0.04054 0.05068 ...
    burst_nr         (burst) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
    time             (duration, burst) datetime64[ns] 2014-09-16T07:00:00 ...
  * burst            (burst) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
  * duration         (duration) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
  * z                (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    conc_profs       (duration, z, burst) float32 3.99138e-05 4.23636e-05 ...
    burst_duration   (duration) float64 0.0 0.1246 0.2493 0.3739 0.4985 ...
    grainSize_profs  (duration, z, burst) float32 200.0 200.0 200.0 200.0 ...
```

```python
ds.nbytes * (2 ** -30)
```

```
7.15415246784687
```

```python
%time conc_avg = ds.conc_profs.chunk(2400).mean(('z','duration'))
```

```
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.84 ms
```

```python
%time conc_avg.load()
```

```python
%time conc_avg = ds.conc_profs.isel(burst=0).mean(('z','duration'))
```

```
CPU times: user 708 ms, sys: 2.87 s, total: 3.58 s
Wall time: 1min 56s
```

If I work with chunks, it is impossible to load the array back in a reasonable amount of time (I waited for more than 30 minutes).
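For reference, here is a scaled-down, self-contained sketch of the same reduction with the chunks aligned to the `burst` dimension (the one that survives the mean). The dimension names come from the dataset printout above; the array sizes and chunk size are synthetic and purely illustrative, and dask is assumed to be installed:

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the 7+ GB dataset (dimension names from the issue).
ds = xr.Dataset(
    {"conc_profs": (("duration", "z", "burst"),
                    np.random.rand(24, 16, 25).astype("float32"))}
)

# Chunk along `burst`, the dimension kept by the reduction, so each chunk
# contains complete (duration, z) slabs; the chunk size is illustrative.
conc_avg = ds.conc_profs.chunk({"burst": 5}).mean(("z", "duration")).load()
print(conc_avg.shape)  # one mean per burst
```

Passing a dict to `chunk` makes it explicit which dimension the chunks run along, instead of relying on positional chunk sizes.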

Looping over the burst dimension takes about 2 minutes per iteration, which is also quite unreasonable.
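For comparison, a per-burst loop over `isel()` and a single vectorized `mean` over `('z', 'duration')` produce the same values; this is a minimal sketch on synthetic data (the sizes are illustrative, not the real 7+ GB file):

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the real dataset (dimension names from the issue).
ds = xr.Dataset(
    {"conc_profs": (("duration", "z", "burst"),
                    np.random.rand(24, 16, 25).astype("float32"))}
)

# Looping over `burst` with isel(), one reduction per iteration:
loop_result = np.stack(
    [ds.conc_profs.isel(burst=b).mean(("z", "duration")).values
     for b in range(ds.sizes["burst"])]
)

# The same reduction in a single vectorized call:
vectorized = ds.conc_profs.mean(("z", "duration")).values

assert np.allclose(loop_result, vectorized)
```

The vectorized form lets xarray reduce the whole array in one pass rather than re-selecting a slice on every iteration.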

I was wondering if the problem could stem from the way I created the dataset, which I saved into this 7+ GB netCDF file. Could that be the case?

I am working on a Linux machine with an Intel Core i5, which should handle these manipulations with no hiccups. I use the IOOS environment to run xarray (version '0.7.1').

Can someone provide me some advice on how to optimize my script?

I am happy to supply more details if needed.

Cheers,
