issues: 1722417436
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1722417436 | I_kwDOAMm_X85mqgEc | 7868 | `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` | 19285200 | closed | 0 | 8 | 2023-05-23T16:23:07Z | 2023-11-17T15:26:01Z | 2023-11-17T15:26:01Z | NONE | What is your issue?

I noticed that `open_dataset` with `chunks="auto"` fails when a netCDF4 variable or coordinate is encoded as `NC_STRING`.

As a workaround, the user must currently rewrite the netCDF4 and specify the string DataArray(s)

Below I provide a reproducible example:

```
import xarray as xr
import numpy as np

# Define string DataArray
arr = np.array(["M6", "M3"], dtype=str)
print(arr.dtype)  # <U2
da = xr.DataArray(data=arr, dims=("time"))
data_vars = {"str_arr": da}

# Create dataset
ds_nc_string = xr.Dataset(data_vars=data_vars)

# Set chunking to see behaviour at read-time
ds_nc_string["str_arr"] = ds_nc_string["str_arr"].chunk(1)  # chunks ((1,1),)

# Write dataset with NC_STRING
ds_nc_string["str_arr"].encoding["dtype"] = str
ds_nc_string.to_netcdf("/tmp/nc_string.nc")

# Write dataset with NC_CHAR
ds_nc_char = xr.Dataset(data_vars=data_vars)
ds_nc_char["str_arr"].encoding["dtype"] = "S2"
ds_nc_char.to_netcdf("/tmp/nc_char.nc")

# chunks="auto" does not work when strings are saved as NC_STRING
# --> NC_STRING is read as object, and dask cannot estimate the chunk size!
# With chunks={} the NC_STRING array is read in a single dask chunk!
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks="auto")  # NotImplementedError
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks={})  # Works
ds_nc_string.chunks  # chunks (2,)

# With NC_CHAR, chunks={} and chunks="auto" both work and return the same result!
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks={})

# NC_STRING is read back as object
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks=None)
ds_nc_string["str_arr"].dtype  # object

# NC_CHAR is read back as fixed-length byte-string representation (S2)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks=None)
ds_nc_char["str_arr"].dtype  # S2
ds_nc_char["str_arr"].data.astype(str)  # U2
```

Questions:
- Related issues are:
  - https://github.com/pydata/xarray/issues/7652
  - https://github.com/pydata/xarray/issues/2059
  - https://github.com/pydata/xarray/pull/7654
  - https://github.com/pydata/xarray/issues/2040 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7868/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
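
The issue body above reports that `NC_STRING` data is read back as `object` dtype, for which dask cannot estimate a chunk size, while fixed-width dtypes do not have this problem. The following is a minimal NumPy-only sketch of that difference, not xarray's or dask's actual code path. The helper name `to_fixed_width` is hypothetical, and the sketch assumes the strings are already in memory (for example after reading with `chunks=None`).

```python
import numpy as np


def to_fixed_width(values):
    """Return a fixed-width unicode array for object-dtype string input.

    Hypothetical helper for illustration only; arrays that are not
    object dtype are returned unchanged.
    """
    values = np.asarray(values)
    if values.dtype.kind != "O":
        return values
    # astype(str) picks a width large enough for the longest string.
    return values.astype(str)


# Object dtype: each element is a reference to a Python object, so the
# array itself carries no information about the size of each string.
obj_arr = np.array(["M6", "M3"], dtype=object)

# Fixed-width dtype: every element occupies dtype.itemsize bytes, so the
# total size in bytes is known from the shape and dtype alone.
fixed_arr = to_fixed_width(obj_arr)

print(obj_arr.dtype.kind)    # kind code of the object array
print(fixed_arr.dtype.kind)  # kind code of the converted array
print(fixed_arr.nbytes)      # size computed from shape and itemsize
```

Converting to a fixed-width dtype trades memory for predictability: every element is padded to the longest string, so this is most suitable when string lengths are short and similar, as in the `"M6"`/`"M3"` example.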