issues: 243927150

id: 243927150
node_id: MDU6SXNzdWUyNDM5MjcxNTA=
number: 1481
title: Excessive memory usage when printing multi-file Dataset
user: 29717790
state: closed (state_reason: completed)
locked: 0
comments: 9
created_at: 2017-07-19T05:34:25Z
updated_at: 2017-09-22T19:30:03Z
closed_at: 2017-09-22T19:30:03Z
author_association: NONE
repo: 13221727
type: issue

I have a dataset comprising 25 output files from the ROMS ocean model. They are netCDF files ("averages" files in ROMS jargon) containing a number of variables, but most of the storage is devoted to a few time-varying oceanographic variables, either 2D or 3D in space. I have post-processed the files by packing the oceanographic variables into int16 form using the netCDF add_offset and scale_factor attributes. Each file has 100 records in the unlimited dimension (ocean_time), so the complete dataset has 2500 records. The 25 files total 56.8 GiB and would therefore expand to roughly 230 GiB in float64 form.
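For readers unfamiliar with this convention: the file stores an integer variable plus scale_factor and add_offset attributes, and a CF-aware reader such as xarray reverses the transform on decode. A minimal numpy sketch of the idea, with illustrative shapes and values (none of these numbers come from the actual ROMS files):

```python
import numpy as np

# Illustrative float64 field standing in for one oceanographic variable.
temp = np.random.uniform(5.0, 25.0, size=(100, 300, 400))

# Choose packing parameters that span the data range (int16 convention).
scale_factor = (temp.max() - temp.min()) / (2**16 - 2)
add_offset = (temp.max() + temp.min()) / 2

# Pack: float64 -> int16, as stored in the netCDF files on disk.
packed = np.round((temp - add_offset) / scale_factor).astype(np.int16)

# Unpack: what a CF-aware reader does on decode, restoring float64 and
# quadrupling the in-memory size relative to the int16 data.
unpacked = packed * scale_factor + add_offset
assert np.abs(unpacked - temp).max() <= scale_factor  # quantisation error only
```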

I open the 25 files with xarray.open_mfdataset, concatenating along the unlimited dimension. This takes a few seconds. I then print() the resulting xarray.Dataset. This takes a few seconds more. All good so far.

But when I vary the number of these files, n, that I include in my xarray.Dataset, I get surprising and inconvenient results. All works as expected, in reasonable time, with n <= 8 and with n >= 19. But with 9 <= n <= 18, the interpreter that's processing the code (pythonw.exe via IPython) consumes steadily more memory until the 12-14 GiB available on my machine is exhausted.
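A sketch of how one might track memory while sweeping n (the psutil dependency, the local file path, and the loop values are my additions, not part of the original report; for 9 <= n <= 18 the print itself is reported to exhaust memory, so run with care):

```python
"""Print datasets built from n copies of one ROMS file, tracking memory."""

import os

import psutil  # assumed available; used only to read the process RSS
import xarray as xr

FILE = 'roms_avg_0001.nc'  # hypothetical local copy of the file linked below
proc = psutil.Process(os.getpid())

for n in (1, 8, 9, 18, 19):  # brackets the reported problem range
    ds = xr.open_mfdataset([FILE] * n, concat_dim='ocean_time')
    print(ds)  # the step that reportedly exhausts memory for 9 <= n <= 18
    print('n = %d: RSS = %.2f GiB' % (n, proc.memory_info().rss / 2**30))
    ds.close()
```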

The attached script exposes the problem. In this case the file sequence consists of one file name repeated n times. The value of n currently hard-coded into the script is 10. With this value, the final statement in the script, printing the dataset, will exhaust the memory on my PC in about 10 seconds unless I kill the process first.

I have put a copy of the ROMS output file here:

ftp://ftp.niwa.co.nz/incoming/hadfield/roms_avg_0001.nc

mgh_example_test_mfdataset.py.txt:

```python
"""Explore a performance/memory bug relating to multi-file datasets."""

import os

import xarray as xr

# %% Specify the list of files. Repeat the same file to get a multi-file dataset.

root = os.path.join(
    'D:\\', 'Mirror', 'hpcf', 'working', 'hadfield', 'work', 'cook', 'roms',
    'sim34', 'run', 'bran-2009-2012-wrfnz-1.20')
file = [os.path.join(root, 'roms_avg_0001.nc')]

file = file*10

print('Number of files:', len(file))

# %% Create a multi-file dataset with the open_mfdataset function.

ds = xr.open_mfdataset(file, concat_dim='ocean_time')

print('The dataset has been successfully opened')

# %% Print a summary.

print(ds)

print('The dataset has been printed')
```
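While investigating, it may help to look at the dataset's metadata piecemeal rather than via print(ds), which is the statement that triggers the runaway memory use. A minimal diagnostic sketch, assuming the `ds` object from the script above (this is an inspection aid, not a fix):

```python
# Inspect the combined dataset without building the full repr.
print('Dimensions:', dict(ds.dims))
print('Data variables:', list(ds.data_vars))
print('Coordinates:', list(ds.coords))
print('ocean_time length:', ds.dims['ocean_time'])
```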

