pydata/xarray issue #1985: Load a small subset of data from a big dataset takes forever

  • State: closed (completed)
  • User: 22245117 · Author association: CONTRIBUTOR
  • Comments: 8
  • Created: 2018-03-13T04:27:58Z
  • Closed: 2019-01-13T01:46:08Z

Code Sample

```python
import xgcm


def cut_dataset(ds2cut,
                varList=['Temp', 'S', 'Eta', 'U', 'V', 'W'],
                lonRange=[-180, 180],
                latRange=[-90, 90],
                depthRange=[0, float("inf")],
                timeRange=['2007-09-01T00', '2008-08-31T18'],
                timeFreq='1D',
                sampMethod='mean',
                interpC=True,
                saveNetCDF=False):
    """Cut the dataset."""

    # Copy dataset
    ds = ds2cut.copy(deep=True)

    # Choose variables: keep the requested ones, plus 'time' and every
    # variable that has no 'time' dimension (coordinates, grid metrics)
    varList_tmp = list(varList)
    for varName in ds.variables:
        if all(x != 'time' for x in ds[varName].dims) or (varName == 'time'):
            varList_tmp.append(varName)
    toDrop = list(set(ds.variables) - set(varList_tmp))
    ds = ds.drop(toDrop)

    # Cut dataset: first along the staggered (outer) coordinates,
    # then along the cell-center coordinates they enclose
    ds = ds.sel(time=slice(min(timeRange),  max(timeRange)),
                Xp1=slice(min(lonRange),   max(lonRange)),
                Yp1=slice(min(latRange),   max(latRange)),
                Zp1=slice(min(depthRange), max(depthRange)))
    ds = ds.sel(X=slice(min(ds['Xp1'].values), max(ds['Xp1'].values)),
                Y=slice(min(ds['Yp1'].values), max(ds['Yp1'].values)),
                Z=slice(min(ds['Zp1'].values), max(ds['Zp1'].values)),
                Zu=slice(min(ds['Zp1'].values), max(ds['Zp1'].values)),
                Zl=slice(min(ds['Zp1'].values), max(ds['Zp1'].values)))

    # Resample in time
    if sampMethod == 'snapshot':
        ds = ds.resample(time=timeFreq).first(skipna=False)
    elif sampMethod == 'mean':
        ds = ds.resample(time=timeFreq).mean()

    # Create grid
    grid = xgcm.Grid(ds, periodic=False)

    # Interpolate staggered variables to cell centers
    # (the first letter of a dimension name, e.g. 'X' from 'Xp1', is the xgcm axis)
    if interpC:
        for varName in varList:
            for dim in ds[varName].dims:
                if len(dim) > 1 and dim != 'time':
                    ds[varName] = grid.interp(ds[varName], axis=dim[0])

    # Remove variables defined on dimensions no longer used by varList
    allDims = []
    for varName in varList:
        for dim in ds[varName].dims:
            allDims.append(dim)
    toDrop = []
    for varName in ds.variables:
        if len(list(set(ds[varName].dims) - set(allDims))) > 0:
            toDrop.append(varName)
    ds = ds.drop(toDrop)

    # Save to NetCDF
    if saveNetCDF:
        ds.to_netcdf(saveNetCDF)

    return ds, grid
```

3D test

```python
ds_cut, grid_cut = cut_dataset(ds, varList=['Eta'],
                               latRange=[65, 65.5], depthRange=[0, 2],
                               timeRange=['2007-11-15T00', '2007-11-16T00'],
                               timeFreq='1D', sampMethod='mean',
                               interpC=False, saveNetCDF='3Dvariable.nc')
```

4D test

```python
ds_cut, grid_cut = cut_dataset(ds, varList=['Temp'],
                               lonRange=[-30, -29.5], latRange=[65, 65.5],
                               depthRange=[0, 2],
                               timeRange=['2007-11-15T00', '2007-11-16T00'],
                               timeFreq='1D', sampMethod='mean',
                               interpC=False, saveNetCDF='4Dvariable.nc')
```
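To make the 3D-vs-4D comparison concrete, the two calls can be timed directly. This is a minimal sketch (the timing harness and labels are mine, not part of the original report), reusing the same `ds` as above:

```python
import time

# Hypothetical timing comparison of the two cases reported above.
cases = [
    ('3D (Eta)',  dict(varList=['Eta'], saveNetCDF='3Dvariable.nc')),
    ('4D (Temp)', dict(varList=['Temp'], lonRange=[-30, -29.5],
                       saveNetCDF='4Dvariable.nc')),
]
for label, kwargs in cases:
    start = time.time()
    cut_dataset(ds, latRange=[65, 65.5], depthRange=[0, 2],
                timeRange=['2007-11-15T00', '2007-11-16T00'],
                timeFreq='1D', sampMethod='mean', interpC=False, **kwargs)
    print('{}: {:.1f} s'.format(label, time.time() - start))
```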

Problem description

I'm working with a big dataset, but most of the time I only need a small subset of it. My idea was to open and concatenate everything with open_mfdataset, and then extract subsets using the indexing routines. This approach works very well when I extract 3D variables (just lon, lat, and time), but it breaks down when I try to extract 4D variables (lon, lat, time, and depth): nothing actually fails, but to_netcdf takes forever. When I open a smaller dataset from the very beginning (say, just November), I am able to extract 4D variables as well. When I load the sub-dataset after using the indexing routines, does xarray need to read the whole original 4D variable? If yes, I should probably change my approach and open only a subset of the data from the very beginning. If no, am I doing something wrong?
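For context, one way to see whether a selection is still lazy, and what dask would actually have to read before writing, is to inspect the chunks of the subset. This is a minimal sketch, not from the original report: the file glob `output_*.nc` and the output name `subset.nc` are hypothetical, and the variable/dimension names (`Temp`, `X`, `Y`) are borrowed from the snippet above:

```python
import xarray as xr

# Open lazily and concatenate; data stays on disk as dask chunks.
ds = xr.open_mfdataset('output_*.nc')

# Label-based selection is lazy too: nothing is read yet.
subset = ds['Temp'].sel(X=slice(-30, -29.5), Y=slice(65, 65.5))

# The chunks show what dask must read to materialize the subset. If each
# chunk spans a whole file, writing even a tiny subset forces reading
# far more data than the subset itself contains.
print(subset.data.chunks)

# Loading the (small) subset into memory first keeps the write cheap.
subset.load().to_netcdf('subset.nc')
```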

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.18.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.1
pandas: 0.20.1
numpy: 1.11.0
scipy: 0.17.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.2
distributed: 1.18.1
matplotlib: 1.5.1
cartopy: 0.16.0
seaborn: None
setuptools: 27.2.0
pip: 8.1.2
conda: 4.4.11
pytest: 2.9.1
IPython: 4.2.0
sphinx: 1.4.1
```