issues: 433833707

number: 2900
title: open_mfdataset with proprocess ds[var]
state: closed
user: 12237157
comments: 3
created_at: 2019-04-16T15:07:36Z
updated_at: 2019-04-16T19:09:34Z
closed_at: 2019-04-16T19:09:34Z
author_association: CONTRIBUTOR
node_id: MDU6SXNzdWU0MzM4MzM3MDc=

Code Sample, a copy-pastable example if possible

I would like to load only one variable from larger files containing tens of variables. The files get really large when I open them. I expected opening to be lazy and therefore fast if I only want to extract one variable (maybe that is my misunderstanding here).

I hoped to use preprocess for this, but I can't get it working.

Here is my minimal example with 3 files of 12 timesteps each and two variables, of which I only want to load one:

```python
ds = xr.open_mfdataset(path)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

def preprocess(ds, var='co2flux'):
    return ds[var]

ds = xr.open_mfdataset(path, preprocess=preprocess)

ValueError                                Traceback (most recent call last)
<ipython-input-17-770267b86462> in <module>
      1 def preprocess(ds,var='co2flux'):
      2     return ds[var]
----> 3 ds = xr.open_mfdataset(path,preprocess=preprocess)

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs)
    717                                  data_vars=data_vars, coords=coords,
    718                                  infer_order_from_coords=infer_order_from_coords,
--> 719                                  ids=ids)
    720         except ValueError:
    721             for ds in datasets:

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids)
    551         # Repeatedly concatenate then merge along each dimension
    552         combined = _combine_nd(combined_ids, concat_dims, compat=compat,
--> 553                                data_vars=data_vars, coords=coords)
    554     return combined
    555

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat)
    473                                                   data_vars=data_vars,
    474                                                   coords=coords,
--> 475                                                   compat=compat)
    476     combined_ds = list(combined_ids.values())[0]
    477     return combined_ds

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat)
    491         datasets = combined_ids.values()
    492         new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat,
--> 493                                                     data_vars, coords)
    494     return new_combined_ids
    495

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords)
    505     if concat_dim is not None:
    506         dim = None if concat_dim is _CONCAT_DIM_DEFAULT else concat_dim
--> 507         sorted_datasets = sorted(datasets, key=vars_as_keys)
    508         grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    509         concatenated = [_auto_concat(list(ds_group), dim=dim,

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in vars_as_keys(ds)
    496
    497 def vars_as_keys(ds):
--> 498     return tuple(sorted(ds))
    499
    500

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/common.py in __bool__(self)
     80
     81     def __bool__(self):
---> 82         return bool(self.values)
     83
     84     # Python 3 uses __bool__, Python 2 uses __nonzero__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
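For context (my annotation, not part of the original report): the traceback ends in `vars_as_keys`, which calls `tuple(sorted(ds))`. That expression behaves very differently for a Dataset than for the DataArray this `preprocess` returns, and the difference can be reproduced without any files. A minimal sketch with made-up variable names:

```python
import numpy as np
import xarray as xr

# Minimal stand-in dataset (hypothetical names, no files involved).
ds = xr.Dataset(
    {
        "a": (("y", "x"), np.zeros((2, 2))),
        "b": (("y", "x"), np.ones((2, 2))),
    }
)

# sorted() over a Dataset yields its variable names -- this is what
# xarray's combine machinery (vars_as_keys) relies on:
print(tuple(sorted(ds)))  # ('a', 'b')

# sorted() over a multi-dimensional DataArray iterates array slices;
# comparing slices yields boolean arrays whose truth value is
# ambiguous -- the same ValueError as in the traceback above:
try:
    sorted(ds["a"])
except ValueError as exc:
    print("raised:", type(exc).__name__)  # raised: ValueError
```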

I was hoping that data_vars could work like this, but it has no effect. Probably I got the documentation wrong here.

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```
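For reference, a workaround sketch (my addition, illustrated with hypothetical in-memory data rather than the original files): selecting with a list of names, `ds[[var]]`, makes `preprocess` return a one-variable Dataset instead of a DataArray. A Dataset is what `open_mfdataset`'s combine step expects, and for dask-backed variables the selection stays lazy.

```python
import numpy as np
import xarray as xr

def preprocess(ds, var="co2flux"):
    # ds[var] returns a DataArray; ds[[var]] returns a Dataset holding
    # only that variable, which open_mfdataset can combine.
    return ds[[var]]

# Hypothetical stand-in for one input file.
ds = xr.Dataset(
    {
        "co2flux": (("time", "y", "x"), np.zeros((12, 2, 2), dtype="float32")),
        "caex90": (("time", "y", "x"), np.zeros((12, 2, 2), dtype="float32")),
    }
)

subset = preprocess(ds)
print(list(subset.data_vars))  # ['co2flux']
```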

Problem description

From the documentation I would expect the behaviour below.

Expected Output

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

ds = xr.open_mfdataset(path, preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.14.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.4.0
pip: 18.1
conda: None
pytest: None
IPython: 7.0.1
sphinx: None
```
state_reason: completed
