issues: 955029073
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
955029073 | MDU6SXNzdWU5NTUwMjkwNzM= | 5643 | open_mfdataset from zarr store, consolidated=None warns, consolidated=False is slow, consolidated=True fails | 5509356 | open | 0 | 0 | 2021-07-28T16:23:53Z | 2021-08-14T17:41:40Z | NONE | What happened: With xarray 0.19.0, using open_mfdataset to read from a zarr store written with a previous version of xarray (with consolidated=True), I get the following results depending on the consolidated parameter:
Hopefully it's okay if I include the actual code rather than trying to create a test zarr store that reproduces the situation:

```python
import s3fs
import xarray as xr

top_group_url = 's3://hrrrzarr/sfc/20200801/20200801_00z_anl.zarr'
group_url = f'{top_group_url}/surface/GUST'
subgroup_url = f"{group_url}/surface"

fs = s3fs.S3FileSystem(anon=True)
```

What I expected to happen:
Anything else we need to know?: This zarr store cannot be (usefully) opened by xarray without using open_mfdataset due to an issue I brought up in discussion #5584 which no one has replied to so far. Basically, the person creating it assumed that if they used xarray to write it, xarray would have no problem reading it, but since there's a slash in the variable names, xarray created it as a deeply-nested zarr store instead of a store with each variable as a single-level (sub)group that xarray would have been able to handle. Each variable was written like this:
Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.0
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.07.1
distributed: 2021.07.1
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.1
conda: None
pytest: None
IPython: 7.25.0
sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5643/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
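
For readers following the nested layout described under "Anything else we need to know?", here is a minimal sketch of how the two store URLs for one variable relate to each other. It only mirrors the string construction in the snippet from the issue body. The helper name `nested_store_urls` and its `level` and `variable` parameters are hypothetical, introduced for illustration, and the commented-out `open_mfdataset` call is an assumed call shape rather than the exact call from the report.

```python
from typing import List


def nested_store_urls(top_group_url: str, level: str, variable: str) -> List[str]:
    """Build the group and subgroup URLs for one variable (hypothetical helper).

    Mirrors the f-strings in the issue body: the variable's group sits at
    <top>/<level>/<variable>, and a further subgroup sits one level deeper
    at <top>/<level>/<variable>/<level>.
    """
    group_url = f"{top_group_url}/{level}/{variable}"
    subgroup_url = f"{group_url}/{level}"
    return [group_url, subgroup_url]


urls = nested_store_urls(
    "s3://hrrrzarr/sfc/20200801/20200801_00z_anl.zarr", "surface", "GUST"
)
for url in urls:
    print(url)

# Assumed call shape (not confirmed by the text above): both URLs would be
# wrapped as mappers and passed to open_mfdataset together, varying only the
# `consolidated` argument between None, False and True.
#
#   fs = s3fs.S3FileSystem(anon=True)
#   stores = [s3fs.S3Map(url, s3=fs) for url in urls]
#   ds = xr.open_mfdataset(stores, engine="zarr", consolidated=None)
```

Because the data for a single variable is split across a group and a deeper subgroup, two stores have to be opened and combined, which is why the report relies on `open_mfdataset` rather than a single `open_dataset` call.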