issues

5 rows where repo = 13221727 and user = 50383939 sorted by updated_at descending

#8504 · `ValueError` when writing `NCZarr` datasets using `netcdf4` engine
Opened by kasra-keshavarz (50383939) · closed (not_planned) · 2 comments · created 2023-12-01 · closed 2023-12-05

What is your issue?

I am currently experiencing an issue when writing NCZarr files with xarray=='2023.11.0', after reading all the necessary data in netCDF format using the xarray.open_mfdataset(...) function.

A simple dataset such as the following:

```console
<xarray.Dataset>
Dimensions:              (subbasin: 197, time: 96)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
  * subbasin             (subbasin) int32 71032409 71032292 ... 71027770
Data variables:
    RDRS_v2.1_P_UVC_10m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_PR0_SFC  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    lat                  (subbasin) float64 dask.array<chunksize=(197,), meta=np.ndarray>
    lon                  (subbasin) float64 dask.array<chunksize=(197,), meta=np.ndarray>
    crs                  int32 ...
```

gives me the following error:

```python

ds.to_netcdf("file://path/to/test.nczarr#mode=nczarr", engine='netcdf4')


AttributeError                            Traceback (most recent call last)
File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:39, in _find_dim(grp, dimname)
     38 try:
---> 39     dim = group.dimensions[dimname]
     40     break

AttributeError: 'NoneType' object has no attribute 'dimensions'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:43, in _find_dim(grp, dimname)
     42 try:
---> 43     group = group.parent
     44 except:

AttributeError: 'NoneType' object has no attribute 'parent'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds.to_netcdf("file:///home/user/test.nczarr#mode=nczarr,file", engine='netcdf4')

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2277     encoding = {}
   2278 from xarray.backends.api import to_netcdf
-> 2280 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2281     self,
   2282     path,
   2283     mode=mode,
   2284     format=format,
   2285     group=group,
   2286     engine=engine,
   2287     encoding=encoding,
   2288     unlimited_dims=unlimited_dims,
   2289     compute=compute,
   2290     multifile=False,
   2291     invalid_netcdf=invalid_netcdf,
   2292 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1259, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1254 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1255 # to avoid this mess of conditionals
   1256 try:
   1257     # TODO: allow this work (setting up the file for writing array data)
   1258     # to be parallelized with dask
-> 1259     dump_to_store(
   1260         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1261     )
   1262     if autoclose:
   1263         store.close()

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1306, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1303 if encoder:
   1304     variables, attrs = encoder(variables, attrs)
-> 1306 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:356, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 356 self.set_variables(
    357     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    358 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:394, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    392 name = _encode_variable_name(vn)
    393 check = vn in check_encoding_set
--> 394 target, source = self.prepare_variable(
    395     name, v, check, unlimited_dims=unlimited_dims
    396 )
    398 writer.add(source, target)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:500, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    498     nc4_var = self.ds.variables[name]
    499 else:
--> 500     nc4_var = self.ds.createVariable(
    501         varname=name,
    502         datatype=datatype,
    503         dimensions=variable.dims,
    504         zlib=encoding.get("zlib", False),
    505         complevel=encoding.get("complevel", 4),
    506         shuffle=encoding.get("shuffle", True),
    507         fletcher32=encoding.get("fletcher32", False),
    508         contiguous=encoding.get("contiguous", False),
    509         chunksizes=encoding.get("chunksizes"),
    510         endian="native",
    511         least_significant_digit=encoding.get("least_significant_digit"),
    512         fill_value=fill_value,
    513     )
    515 nc4_var.setncatts(attrs)
    517 target = NetCDF4ArrayWrapper(name, self)

File src/netCDF4/_netCDF4.pyx:2839, in genexpr()

File src/netCDF4/_netCDF4.pyx:2839, in genexpr()

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:45, in _find_dim(grp, dimname)
     43     group = group.parent
     44 except:
---> 45     raise ValueError("cannot find dimension %s in this group or parent groups" % dimname)
     46 if dim is None:
     47     raise KeyError("dimension %s not defined in group %s or any group in it's family tree" % (dimname, grp.path))

ValueError: cannot find dimension subbasin in this group or parent groups
```

The `xarray.Dataset.to_zarr(...)` function works perfectly fine.
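For comparison, here is a minimal sketch of the two write paths (file names are hypothetical):

```python
# Sketch contrasting the working and failing writes described above.
import xarray as xr

ds = xr.open_mfdataset("path/to/inputs/*.nc")  # hypothetical input files

# Plain Zarr through xarray's zarr backend -- this works:
ds.to_zarr("path/to/test.zarr", mode="w")

# NCZarr through the netcdf4 engine -- this raises the ValueError above:
ds.to_netcdf("file://path/to/test.nczarr#mode=nczarr", engine="netcdf4")
```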

If I read the files using `xarray.open_mfdataset(..., engine='h5netcdf')`, I get the following error instead:

```python

ds.to_netcdf("file://path/to/test.nczarr#mode=nczarr", engine='netcdf4')


ImportError                               Traceback (most recent call last)
Cell In[6], line 1
----> 1 ds.to_netcdf("file:///home/user/test.nczarr#mode=nczarr,file", engine='netcdf4')

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2277     encoding = {}
   2278 from xarray.backends.api import to_netcdf
-> 2280 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2281     self,
   2282     path,
   2283     mode=mode,
   2284     format=format,
   2285     group=group,
   2286     engine=engine,
   2287     encoding=encoding,
   2288     unlimited_dims=unlimited_dims,
   2289     compute=compute,
   2290     multifile=False,
   2291     invalid_netcdf=invalid_netcdf,
   2292 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1242, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1238 else:
   1239     raise ValueError(
   1240         f"unrecognized option 'invalid_netcdf' for engine {engine}"
   1241     )
-> 1242 store = store_open(target, mode, format, group, **kwargs)
   1244 if unlimited_dims is None:
   1245     unlimited_dims = dataset.encoding.get("unlimited_dims", None)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:367, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    353 @classmethod
    354 def open(
    355     cls,
   (...)
    365     autoclose=False,
    366 ):
--> 367     import netCDF4
    369     if isinstance(filename, os.PathLike):
    370         filename = os.fspath(filename)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/__init__.py:3
      1 # init for netCDF4. package
      2 # Docstring comes from extension module _netCDF4.
----> 3 from ._netCDF4 import *
      4 # Need explicit imports for names beginning with underscores
      5 from ._netCDF4 import __doc__

ImportError: /home/user/.local/lib64/libnetcdf.so.19: undefined symbol: H5Pset_fapl_mpio
```

Any ideas?

Let me know if a small .nc file for this example is needed. As far as I know, all the dependencies are properly compiled: HDF5 is compiled with the --enable-parallel flag, netCDF-C is compiled with parallel support turned on, and netCDF4-python is compiled against both. I also have netcdf-fortran compiled in parallel mode, if that helps.
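For completeness, one way to sanity-check the build from the shell (assuming these nc-config flags exist in this netcdf-c version, and that the `.so` path above is the library actually being loaded):

```console
$ nc-config --has-parallel
$ nc-config --has-nczarr
$ ldd /home/user/.local/lib64/libnetcdf.so.19 | grep -i hdf5
```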

Here are also the `xarray.show_versions()` details:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1-mpi
libnetcdf: 4.9.3-development

xarray: 2023.11.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.11.2
netCDF4: 1.7.0-development
pydap: None
h5netcdf: 1.3.0
h5py: 3.8.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.11.0
distributed: 2023.11.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.6.4
fsspec: 2023.10.0
cupy: None
pint: 0.22+computecanada
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None
```

#8207 · Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset`
Opened by kasra-keshavarz (50383939) · open · 4 comments · created 2023-09-19 · updated 2023-12-01

What is your issue?

I am simply reading 366 small (~15 MB each) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow:

```python-console
In [1]: import os; import dask

In [2]: import xarray as xr

In [3]: from dask.distributed import Client, LocalCluster

In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker

In [5]: client = Client(cluster)

In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)

In [8]: ds.to_netcdf('./out2.nc')

```

And below is the error I am getting:

Error message:

```python-console
In [8]: ds.to_netcdf('./out2.nc')
/home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed
Key:       ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0)
Function:  getter
args:      (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x2b863b0e94c0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x2b86218d4ee0>, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None)))
kwargs:    {}
Exception: "RuntimeError('NetCDF: HDF error')"

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds.to_netcdf('./out2.nc')

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2249     encoding = {}
   2250 from xarray.backends.api import to_netcdf
-> 2252 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2253     self,
   2254     path,
   2255     mode=mode,
   2256     format=format,
   2257     group=group,
   2258     engine=engine,
   2259     encoding=encoding,
   2260     unlimited_dims=unlimited_dims,
   2261     compute=compute,
   2262     multifile=False,
   2263     invalid_netcdf=invalid_netcdf,
   2264 )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1252 if multifile:
   1253     return writer, store
-> 1255 writes = writer.sync(compute=compute)
   1257 if isinstance(target, BytesIO):
   1258     store.sync()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs)
    253 if chunkmanager_store_kwargs is None:
    254     chunkmanager_store_kwargs = {}
--> 256 delayed_store = chunkmanager.store(
    257     self.sources,
    258     self.targets,
    259     lock=self.lock,
    260     compute=compute,
    261     flush=True,
    262     regions=self.regions,
    263     **chunkmanager_store_kwargs,
    264 )
    265 self.sources = []
    266 self.targets = []

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs)
    203 def store(
    204     self,
    205     sources: DaskArray | Sequence[DaskArray],
    206     targets: Any,
    207     **kwargs,
    208 ):
    209     from dask.array import store
--> 211     return store(
    212         sources=sources,
    213         targets=targets,
    214         **kwargs,
    215     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***)
   1234 elif compute:
   1235     store_dsk = HighLevelGraph(layers, dependencies)
-> 1236     compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
   1237     return None
   1239 else:

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs)
    367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get)
    368 dsk2 = optimization_function(cls)(dsk, keys, **kwargs)
--> 369 return schedule(dsk2, keys, **kwargs)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3265 should_rejoin = False
   3266 try:
-> 3267     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3268 finally:
   3269     for f in futures.values():

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous)
   2390     local_worker = None
   2392 with shorten_traceback():
-> 2393     return self.sync(
   2394         self._gather,
   2395         futures,
   2396         errors=errors,
   2397         direct=direct,
   2398         local_worker=local_worker,
   2399         asynchronous=asynchronous,
   2400     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__()
    483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 484     return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array()
    486 def get_duck_array(self):
--> 487     return self.array.get_duck_array()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array()
    663 def get_duck_array(self):
--> 664     return self.array.get_duck_array()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array()
    552 # self.array[self.key] is now a numpy array when
    553 # self.array is a BackendArray subclass
    554 # and self.key is BasicIndexer((slice(None, None, None),))
    555 # so we need the explicit check for ExplicitlyIndexed
    556 if isinstance(array, ExplicitlyIndexed):
--> 557     array = array.get_duck_array()
    558 return _wrap_numpy_scalars(array)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array()
     73 def get_duck_array(self):
---> 74     return self.func(self.array.get_duck_array())

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array()
    550 def get_duck_array(self):
--> 551     array = self.array[self.key]
    552     # self.array[self.key] is now a numpy array when
    553     # self.array is a BackendArray subclass
    554     # and self.key is BasicIndexer((slice(None, None, None),))
    555     # so we need the explicit check for ExplicitlyIndexed
    556     if isinstance(array, ExplicitlyIndexed):

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in __getitem__()
     99 def __getitem__(self, key):
--> 100     return indexing.explicit_indexing_adapter(
    101         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
    102     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter()
    836 """Support explicit indexing by delegating to a raw indexing method.
    837
    838 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
    855 Indexing result, in the form of a duck numpy-array.
    856 """
    857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 858 result = raw_indexing_method(raw_key.tuple)
    859 if numpy_indices.tuple:
    860     # index the loaded np.ndarray
    861     result = NumpyIndexingAdapter(result)[numpy_indices]

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem()
    110 try:
    111     with self.datastore.lock:
--> 112         original_array = self.get_array(needs_lock=False)
    113         array = getitem(original_array, key)
    114 except IndexError:
    115     # Catch IndexError in netCDF4 and return a more informative
    116     # error message. This is most often called when an unsorted
    117     # indexer is used before the data is loaded from disk.

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array()
     90 def get_array(self, needs_lock=True):
---> 91     ds = self.datastore._acquire(needs_lock)
     92     variable = ds.variables[self.variable_name]
     93     variable.set_auto_maskandscale(False)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire()
    402 def _acquire(self, needs_lock=True):
--> 403     with self._manager.acquire_context(needs_lock) as root:
    404         ds = _nc4_require_group(root, self._group, self._mode)
    405     return ds

File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__()
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context()
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info()
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars()

File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```

The header of an individual NetCDF file is as follows:

Individual NetCDF header:

```console
$ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc
netcdf ab_models_remapped_1980-04-20-13-00-00 {
dimensions:
	COMID = 14980 ;
	time = UNLIMITED ; // (24 currently)
variables:
	int time(time) ;
		time:long_name = "time" ;
		time:units = "hours since 1980-04-20 12:00:00" ;
		time:calendar = "gregorian" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	double latitude(COMID) ;
		latitude:long_name = "latitude" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
	double longitude(COMID) ;
		longitude:long_name = "longitude" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
	double COMID(COMID) ;
		COMID:long_name = "shape ID" ;
		COMID:units = "1" ;
	double RDRS_v2.1_P_P0_SFC(time, COMID) ;
		RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ;
		RDRS_v2.1_P_P0_SFC:long_name = "Forecast: Surface pressure" ;
		RDRS_v2.1_P_P0_SFC:units = "mb" ;
	double RDRS_v2.1_P_HU_1.5m(time, COMID) ;
		RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ;
		RDRS_v2.1_P_HU_1.5m:long_name = "Forecast: Specific humidity" ;
		RDRS_v2.1_P_HU_1.5m:units = "kg kg**-1" ;
	double RDRS_v2.1_P_TT_1.5m(time, COMID) ;
		RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ;
		RDRS_v2.1_P_TT_1.5m:long_name = "Forecast: Air temperature" ;
		RDRS_v2.1_P_TT_1.5m:units = "deg_C" ;
	double RDRS_v2.1_P_UVC_10m(time, COMID) ;
		RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ;
		RDRS_v2.1_P_UVC_10m:long_name = "Forecast: Wind Modulus (derived using UU and VV)" ;
		RDRS_v2.1_P_UVC_10m:units = "kts" ;
	double RDRS_v2.1_A_PR0_SFC(time, COMID) ;
		RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ;
		RDRS_v2.1_A_PR0_SFC:long_name = "Analysis: Quantity of precipitation" ;
		RDRS_v2.1_A_PR0_SFC:units = "m" ;
	double RDRS_v2.1_P_FB_SFC(time, COMID) ;
		RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ;
		RDRS_v2.1_P_FB_SFC:long_name = "Forecast: Downward solar flux" ;
		RDRS_v2.1_P_FB_SFC:units = "W m**-2" ;
	double RDRS_v2.1_P_FI_SFC(time, COMID) ;
		RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ;
		RDRS_v2.1_P_FI_SFC:long_name = "Forecast: Surface incoming infrared flux" ;
		RDRS_v2.1_P_FI_SFC:units = "W m**-2" ;
```

I am running xarray and Dask on an HPC, so the "modules" I have loaded are the following:

```console
$ module list

Currently Loaded Modules:
  1) CCconfig
  2) gentoo/2020 (S)
  3) gcccore/.9.3.0 (H)
  4) imkl/2020.1.217 (math)
  5) intel/2020.1.217 (t)
  6) ucx/1.8.0
  7) libfabric/1.10.1
  8) openmpi/4.0.3 (m)
  9) StdEnv/2020 (S)
 10) mii/1.1.2
 11) netcdf-mpi/4.9.0 (io)
 12) hdf5-mpi/1.12.1 (io)
 13) libffi/3.3
 14) python/3.10.2 (t)
 15) mpi4py/3.1.3 (t)
 16) freexl/1.0.5 (t)
 17) geos/3.10.2 (geo)
 18) librttopo-proj9/1.1.0
 19) proj/9.0.1 (geo)
 20) libspatialite-proj901/5.0.1
 21) scipy-stack/2023a (math)
 22) libspatialindex/1.8.5 (phys)
 23) ipykernel/2023a
 24) sqlite/3.38.5
```

Any suggestion is greatly appreciated!
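For the record, one workaround I am considering (a sketch under the assumption that the concatenated dataset fits in memory, not a confirmed fix): load everything eagerly and write serially, so the workers are not reading the source files while `to_netcdf` streams to the target:

```python
# Hypothetical workaround sketch: materialize the data first, then write
# without the dask store step. Only viable if ~366 x 15 MB fits in RAM.
import xarray as xr

ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)
ds = ds.load()             # compute all chunks into memory via the cluster
ds.to_netcdf('./out2.nc')  # plain serial write, no distributed store step
```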

#8482 · `Filter error` using `ncdump` on `Zarr` datasets created by XArray
Opened by kasra-keshavarz (50383939) · closed (completed) · 2 comments · created 2023-11-25 · closed 2023-11-25

What is your issue?

I have trouble getting values of variables inside Zarr stores created using XArray. Running `ncdump -v variable-name 'file://path/to/zarr/store#mode=zarr'` gives me the following error:

```console
foo@bar:~$ ncdump -v variable-name 'file:///path/to/zarr/store#mode=zarr'
%%% HEADER STUFF %%%

NetCDF: Filter error: undefined filter encountered
Location: file /path/to/netcdf-c/ncdump/vardata.c; fcn print_rows line 478
variable-name =
```

Here is a typical header of such Zarr stores that I create:

```console
netcdf test3 {
dimensions:
	rlat = 628 ;
	rlon = 655 ;
	bnds = 2 ;
	bounds = 4 ;
variables:
	double lat(rlat, rlon) ;
		lat:long_name = "latitude" ;
		lat:standard_name = "latitude" ;
		lat:units = "degrees_north" ;
	double lon(rlat, rlon) ;
		lon:long_name = "longitude" ;
		lon:standard_name = "longitude" ;
		lon:units = "degrees_east" ;
	float pr(rlat, rlon) ;
		pr:_QuantizeBitRoundNumberOfSignificantDigits = 12 ;
		pr:cell_methods = "area: mean time: mean" ;
		pr:coordinates = "lon lat" ;
		pr:grid_mapping = "crs" ;
		pr:long_name = "Precipitation" ;
		pr:standard_name = "precipitation_flux" ;
		pr:units = "kg m-2 s-1" ;
	double rlat(rlat) ;
		rlat:actual_range = -33.625, 35.345 ;
		rlat:axis = "Y" ;
		rlat:long_name = "latitude in rotated pole grid" ;
		rlat:standard_name = "grid_latitude" ;
		rlat:units = "degrees" ;
	double rlon(rlon) ;
		rlon:actual_range = -34.045, 37.895 ;
		rlon:axis = "X" ;
		rlon:long_name = "longitude in rotated pole grid" ;
		rlon:standard_name = "grid_longitude" ;
		rlon:units = "degrees" ;
	float tas(rlat, rlon) ;
		tas:_QuantizeBitRoundNumberOfSignificantDigits = 12 ;
		tas:cell_methods = "area: mean time: point" ;
		tas:coordinates = "lon lat height" ;
		tas:grid_mapping = "crs" ;
		tas:long_name = "Near-Surface Air Temperature" ;
		tas:standard_name = "air_temperature" ;
		tas:units = "K" ;
	double time_bnds(bnds) ;
		time_bnds:calendar = "proleptic_gregorian" ;
		time_bnds:units = "days since 1950-01-01" ;
	double vertices_latitude(rlat, rlon, bounds) ;
	double vertices_longitude(rlat, rlon, bounds) ;
```

I have compiled netcdf-c v4.9.3-development from the latest commit of the netcdf-c GitHub repository. I made sure the filter plugins were installed during compilation, since, as I understand it, certain shared libraries need to be installed for HDF5 filters.

I understand this issue does not include an MCVE, but let me know if a test case is needed; I would be more than happy to upload and share one here.

By the way, XArray has no problem reading these Zarr stores back again.
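In case it is relevant, here is a sketch of an experiment I could try (assuming the undefined filter is the default Blosc compressor that zarr applies, for which ncdump may not find an HDF5 filter plugin): write the store without compressors and check whether ncdump can then print values:

```python
# Sketch: turn off the default compressor on every data variable before
# writing, so the store contains no filters ncdump might not recognize.
import xarray as xr

ds = xr.open_dataset("source.nc")  # hypothetical source dataset
encoding = {name: {"compressor": None} for name in ds.data_vars}
ds.to_zarr("store_nofilter.zarr", mode="w", encoding=encoding)
```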

#8137 · `time` variable encoding changes upon using `to_netcdf` method on a `DataSet`
Opened by kasra-keshavarz (50383939) · open · 2 comments · created 2023-09-01 · updated 2023-09-15

What is your issue?

Upon using the to_netcdf method of a Dataset, the encoding (local attributes) of the time variable changes. More specifically, the units attribute is written in another format. Here is a reproducible example:

```python-console
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr
In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: np.random.seed(0)
   ...: temperature = 15 + 8 * np.random.randn(2, 2, 25)
   ...: precipitation = 10 * np.random.rand(2, 2, 25)
   ...: lon = [[-99.83, -99.32], [-99.79, -99.23]]
   ...: lat = [[42.25, 42.21], [42.63, 42.59]]
   ...: time = pd.date_range("2014-09-06", "2014-09-07", freq='H')
   ...: reference_time = pd.Timestamp("2014-09-05")

In [5]: ds = xr.Dataset(
   ...:     data_vars=dict(
   ...:         temperature=(["x", "y", "time"], temperature),
   ...:         precipitation=(["x", "y", "time"], precipitation),
   ...:     ),
   ...:     coords=dict(
   ...:         lon=(["x", "y"], lon),
   ...:         lat=(["x", "y"], lat),
   ...:         time=time,
   ...:         reference_time=reference_time,
   ...:     ),
   ...:     attrs=dict(description="Weather related data."),
   ...: )
   ...: ds
Out[5]:
<xarray.Dataset>
Dimensions:         (x: 2, y: 2, time: 25)
Coordinates:
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
    lat             (x, y) float64 42.25 42.21 42.63 42.59
  * time            (time) datetime64[ns] 2014-09-06 ... 2014-09-07
    reference_time  datetime64[ns] 2014-09-05
Dimensions without coordinates: x, y
Data variables:
    temperature     (x, y, time) float64 29.11 18.2 22.83 ... 29.29 16.02 18.22
    precipitation   (x, y, time) float64 4.239 6.064 0.1919 ... 8.727 2.735 7.98
Attributes:
    description:  Weather related data.

In [6]: ds.time
Out[6]:
<xarray.DataArray 'time' (time: 25)>
array(['2014-09-06T00:00:00.000000000', '2014-09-06T01:00:00.000000000',
       '2014-09-06T02:00:00.000000000', '2014-09-06T03:00:00.000000000',
       '2014-09-06T04:00:00.000000000', '2014-09-06T05:00:00.000000000',
       '2014-09-06T06:00:00.000000000', '2014-09-06T07:00:00.000000000',
       '2014-09-06T08:00:00.000000000', '2014-09-06T09:00:00.000000000',
       '2014-09-06T10:00:00.000000000', '2014-09-06T11:00:00.000000000',
       '2014-09-06T12:00:00.000000000', '2014-09-06T13:00:00.000000000',
       '2014-09-06T14:00:00.000000000', '2014-09-06T15:00:00.000000000',
       '2014-09-06T16:00:00.000000000', '2014-09-06T17:00:00.000000000',
       '2014-09-06T18:00:00.000000000', '2014-09-06T19:00:00.000000000',
       '2014-09-06T20:00:00.000000000', '2014-09-06T21:00:00.000000000',
       '2014-09-06T22:00:00.000000000', '2014-09-06T23:00:00.000000000',
       '2014-09-07T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time            (time) datetime64[ns] 2014-09-06 ... 2014-09-07
    reference_time  datetime64[ns] 2014-09-05

In [7]: ds.time.encoding
Out[7]: {}

In [9]: ds.to_netcdf("./test.nc", encoding={'time': {'units': 'hours since 2014-09-01 12:00:00'}})

In [10]: !ncdump -h ./test.nc
netcdf test {
dimensions:
	x = 2 ;
	y = 2 ;
	time = 25 ;
variables:
	double temperature(x, y, time) ;
		temperature:_FillValue = NaN ;
		temperature:coordinates = "lat lon reference_time" ;
	double precipitation(x, y, time) ;
		precipitation:_FillValue = NaN ;
		precipitation:coordinates = "lat lon reference_time" ;
	double lon(x, y) ;
		lon:_FillValue = NaN ;
	double lat(x, y) ;
		lat:_FillValue = NaN ;
	int64 time(time) ;
		time:units = "hours since 2014-09-01T12:00:00" ;  <------- this is the problem
		time:calendar = "proleptic_gregorian" ;
	int64 reference_time ;
		reference_time:units = "days since 2014-09-05 00:00:00" ;
		reference_time:calendar = "proleptic_gregorian" ;

// global attributes:
		:description = "Weather related data." ;
}

In [11]: ds.info()
xarray.Dataset {
dimensions:
	x = 2 ;
	y = 2 ;
	time = 25 ;

variables:
	float64 temperature(x, y, time) ;
	float64 precipitation(x, y, time) ;
	float64 lon(x, y) ;
	float64 lat(x, y) ;
	datetime64[ns] time(time) ;
	datetime64[ns] reference_time() ;

// global attributes:
	:description = Weather related data. ;
}
```

The only thing I am concerned about is the T in the "hours since 2014-09-01T12:00:00" string in the final netCDF file. I would like to have control over it; however, even when providing an encoding dictionary for the units attribute, the T is placed in the attribute string.

The sample dataset is taken from here: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html

How may I avoid this issue? Any suggestions? I did my best to Google. Thanks.
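One workaround I could imagine (an unverified sketch): rewrite the attribute after the write with netCDF4-python, since at that point units is just a text attribute on the variable:

```python
# Hypothetical post-processing sketch: replace the "T" separator that
# xarray wrote with the plain space-separated form.
import netCDF4

with netCDF4.Dataset("./test.nc", "a") as nc:
    nc.variables["time"].setncattr("units", "hours since 2014-09-01 12:00:00")
```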

#8120 · `open_mfdataset` exits while sending a "Segmentation fault" error
Opened by kasra-keshavarz (50383939) · closed (completed) · 4 comments · created 2023-08-28 · closed 2023-09-01

What is your issue?

I try to open about ~10 files, each ~5 MB, as a test case, using xarray's open_mfdataset method with the parallel=True option; however, it throws a "Segmentation fault" error, as follows:

```python
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:              (time: 744, rlat: 140, rlon: 105)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
    lon                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
    lat                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
  * rlon                 (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
  * rlat                 (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
    rotated_pole         (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    RDRS_v2.1_P_UVC_10m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_A_PR0_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    product:      RDRS_v2.1
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...
    history:      Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1...
    NCO:          netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...

In [4]: type(ds) Out[4]: xarray.core.dataset.Dataset

In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True)
[gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
[gra-login3:25527] *** Process received signal ***
[gra-login3:25527] Signal: Segmentation fault (11)
[gra-login3:25527] Signal code:  (128)
[gra-login3:25527] Failing at address: (nil)
Segmentation fault

```

Here is the version of xarray:

```python
In [5]: xr.show_versions()
/home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.9.0

xarray: 2023.7.0
pandas: 1.4.0
numpy: 1.21.2
scipy: 1.8.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.0
distributed: 2023.8.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 60.2.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.10.0
sphinx: None
```

I'm working on an HPC, so if a list of the "modules" I have loaded helps, here it is:

```console
$ module list

Currently Loaded Modules:
  1) CCconfig
  2) gentoo/2020 (S)
  3) StdEnv/2020 (S)
  4) mii/1.1.2
  5) gcccore/.9.3.0 (H)
  6) imkl/2020.1.217 (math)
  7) gcc/9.3.0 (t)
  8) ucx/1.8.0
  9) libfabric/1.10.1
 10) openmpi/4.0.3 (m)
 11) libffi/3.3
 12) python/3.10.2 (t)
 13) ipykernel/2023a
 14) scipy-stack/2023a (math)
 15) hdf5/1.10.6 (io)
 16) netcdf/4.7.4 (io)
 17) sqlite/3.38.5
 18) jasper/2.0.16 (vis)
 19) libgeotiff-proj901/1.7.1
 20) cfitsio/4.1.0 (vis)
 21) postgresql/12.4 (t)
 22) freexl/1.0.5 (t)
 23) librttopo-proj9/1.1.0
 24) libspatialite-proj901/5.0.1
 25) gdal/3.5.1 (geo)
 26) geos/3.10.2 (geo)
 27) proj/9.0.1 (geo)
 28) expat/2.4.1 (t)
 29) udunits/2.2.28 (t)
 30) libaec/1.0.6
 31) eccodes/2.25.0 (geo)
 32) yaxt/0.9.0 (t)
 33) cdo/2.2.1 (geo)
 34) mpi4py/3.1.3 (t)
 35) netcdf-fortran/4.5.2 (io)
 36) libspatialindex/1.8.5 (phys)

Where:
  S:    Module is Sticky, requires --force to unload or purge
  m:    MPI implementations / Implémentations MPI
  math: Mathematical libraries / Bibliothèques mathématiques
  io:   Input/output software / Logiciel d'écriture/lecture
  t:    Tools for development / Outils de développement
  vis:  Visualisation software / Logiciels de visualisation
  geo:  Geography libraries/apps / Logiciels de géographie
  phys: Physics libraries/apps / Logiciels de physique
  H:    Hidden Module
```
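For what it is worth, one experiment I plan to try (a sketch, assuming the crash comes from thread-based parallel opens against an HDF5 build that is not thread-safe): switch dask to a process-based scheduler for the open:

```python
# Sketch: use processes instead of threads for the parallel open, in case
# the linked HDF5 library is not thread-safe on this system.
import dask
import xarray as xr

with dask.config.set(scheduler="processes"):
    ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time': 10},
                           parallel=True)
```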

Thanks.



```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```