issues: 433833707
This data as json
id: 433833707
node_id: MDU6SXNzdWU0MzM4MzM3MDc=
number: 2900
title: open_mfdataset with preprocess ds[var]
user: 12237157
state: closed
locked: 0
assignee:
milestone:
comments: 3
created_at: 2019-04-16T15:07:36Z
updated_at: 2019-04-16T19:09:34Z
closed_at: 2019-04-16T19:09:34Z
author_association: CONTRIBUTOR
active_lock_reason:
draft:
pull_request:
body:

#### Code Sample, a copy-pastable example if possible

I would like to load only one variable from larger files containing tens of variables. The files get really large when I open them. I expect them to be opened lazily, and therefore quickly, if I only want to extract one variable (maybe this is my misunderstanding here). I hoped to use `preprocess` for this. Here is my minimal example with 3 files of 12 timesteps each and two variables, of which I only want to load one:

```python
ds = xr.open_mfdataset(path)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```

```python
def preprocess(ds, var='co2flux'):
    return ds[var]

ds = xr.open_mfdataset(path, preprocess=preprocess)
```

```python-traceback
ValueError                                Traceback (most recent call last)
<ipython-input-17-770267b86462> in <module>
      1 def preprocess(ds,var='co2flux'):
      2     return ds[var]
----> 3 ds = xr.open_mfdataset(path,preprocess=preprocess)

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs)
    717                                  data_vars=data_vars, coords=coords,
    718                                  infer_order_from_coords=infer_order_from_coords,
--> 719                                  ids=ids)
    720     except ValueError:
    721         for ds in datasets:

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids)
    551     # Repeatedly concatenate then merge along each dimension
    552     combined = _combine_nd(combined_ids, concat_dims, compat=compat,
--> 553                            data_vars=data_vars, coords=coords)
    554     return combined
    555

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat)
    473                                              data_vars=data_vars,
    474                                              coords=coords,
--> 475                                              compat=compat)
    476     combined_ds = list(combined_ids.values())[0]
    477     return combined_ds

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat)
    491         datasets = combined_ids.values()
    492         new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat,
--> 493                                                     data_vars, coords)
    494     return new_combined_ids
    495

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords)
    505     if concat_dim is not None:
    506         dim = None if concat_dim is _CONCAT_DIM_DEFAULT else concat_dim
--> 507         sorted_datasets = sorted(datasets, key=vars_as_keys)
    508         grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    509         concatenated = [_auto_concat(list(ds_group), dim=dim,

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in vars_as_keys(ds)
    496
    497 def vars_as_keys(ds):
--> 498     return tuple(sorted(ds))
    499
    500

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/common.py in __bool__(self)
     80
     81     def __bool__(self):
---> 82         return bool(self.values)
     83
     84     # Python 3 uses __bool__, Python 2 uses __nonzero__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

I was hoping that `preprocess` would do this subsetting for me.

#### Problem description

From the documentation I would expect the behaviour below.

#### Expected Output

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

ds = xr.open_mfdataset(path, preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```

#### Output of `xr.show_versions()`
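Editor's note: `preprocess` is applied to each per-file dataset before combining, and the combine step expects `Dataset` objects back; `ds[var]` returns a bare `DataArray`, which is what trips the truth-value check in the traceback. A minimal sketch of a workaround is to select the variable with double brackets, which keeps the `Dataset` container (the variable name `co2flux` and the two-variable layout are taken from the example above; `path` is the user's file glob and is only shown commented out):

```python
import numpy as np
import xarray as xr

def preprocess(ds, var="co2flux"):
    # Double brackets return a one-variable Dataset,
    # whereas ds[var] would return a DataArray and break
    # the combine step inside open_mfdataset.
    return ds[[var]]

# With real files this would be used as:
# ds = xr.open_mfdataset(path, preprocess=preprocess)

# Quick check on a small in-memory Dataset with two variables:
demo = xr.Dataset({
    "co2flux": (("time",), np.zeros(3)),
    "caex90": (("time",), np.ones(3)),
})
print(list(preprocess(demo).data_vars))  # only co2flux remains
```

`ds[var].to_dataset()` would work as well; the point is simply that the object handed back to `open_mfdataset` stays a `Dataset`.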
reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/2900/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app:
state_reason: completed
repo: 13221727
type: issue
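Editor's note: the `ValueError` in the issue body above is NumPy's standard guard against truth-testing a multi-element array; `sorted()` reaches it because comparing `DataArray` slices produces an element-wise boolean array rather than a single bool. The guard itself can be reproduced with plain NumPy, no files needed:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([2.0, 1.0])

# Comparison is element-wise: the result is a boolean array,
# not a single True/False value.
cmp = a < b

try:
    # NumPy refuses to collapse a multi-element boolean array
    # into one bool -- this raises the same ValueError as above.
    bool(cmp)
except ValueError as err:
    print(err)
```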