id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2021679408,I_kwDOAMm_X854gGEw,8504,`ValueError` when writing `NCZarr` datasets using `netcdf4` engine,50383939,closed,0,,,2,2023-12-01T22:37:42Z,2023-12-05T21:41:28Z,2023-12-05T21:41:27Z,NONE,,,,"### What is your issue? I am currently experiencing an issue when writing `NCZarr` files using `xarray=='2023.11.0'`, after reading all the necessary data in netCDF format with the `xarray.open_mfdataset(...)` function. A simple dataset such as the following: ```console Dimensions: (subbasin: 197, time: 96) Coordinates: * time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0... * subbasin (subbasin) int32 71032409 71032292 ... 71027770 Data variables: RDRS_v2.1_P_UVC_10m (subbasin, time) float64 dask.array RDRS_v2.1_P_FI_SFC (subbasin, time) float64 dask.array RDRS_v2.1_P_FB_SFC (subbasin, time) float64 dask.array RDRS_v2.1_P_PR0_SFC (subbasin, time) float64 dask.array RDRS_v2.1_P_P0_SFC (subbasin, time) float64 dask.array RDRS_v2.1_P_TT_1.5m (subbasin, time) float64 dask.array RDRS_v2.1_P_HU_1.5m (subbasin, time) float64 dask.array lat (subbasin) float64 dask.array lon (subbasin) float64 dask.array crs int32 ... 
``` Gives me the following error ```python >>> ds.to_netcdf(""file://path/to/test.nczarr#mode=nczarr"", engine='netcdf4') --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:39, in _find_dim(grp, dimname) 38 try: ---> 39 dim = group.dimensions[dimname] 40 break AttributeError: 'NoneType' object has no attribute 'dimensions' During handling of the above exception, another exception occurred: AttributeError Traceback (most recent call last) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:43, in _find_dim(grp, dimname) 42 try: ---> 43 group = group.parent 44 except: AttributeError: 'NoneType' object has no attribute 'parent' During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) Cell In[7], line 1 ----> 1 ds.to_netcdf(""file:///home/user/test.nczarr#mode=nczarr,file"", engine='netcdf4') File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2277 encoding = {} 2278 from xarray.backends.api import to_netcdf -> 2280 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2281 self, 2282 path, 2283 mode=mode, 2284 format=format, 2285 group=group, 2286 engine=engine, 2287 encoding=encoding, 2288 unlimited_dims=unlimited_dims, 2289 compute=compute, 2290 multifile=False, 2291 invalid_netcdf=invalid_netcdf, 2292 ) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1259, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1254 # TODO: figure out how to refactor this logic (here and in save_mfdataset) 1255 # to avoid 
this mess of conditionals 1256 try: 1257 # TODO: allow this work (setting up the file for writing array data) 1258 # to be parallelized with dask -> 1259 dump_to_store( 1260 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1261 ) 1262 if autoclose: 1263 store.close() File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1306, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1303 if encoder: 1304 variables, attrs = encoder(variables, attrs) -> 1306 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:356, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 354 self.set_attributes(attributes) 355 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 356 self.set_variables( 357 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 358 ) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:394, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 392 name = _encode_variable_name(vn) 393 check = vn in check_encoding_set --> 394 target, source = self.prepare_variable( 395 name, v, check, unlimited_dims=unlimited_dims 396 ) 398 writer.add(source, target) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:500, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims) 498 nc4_var = self.ds.variables[name] 499 else: --> 500 nc4_var = self.ds.createVariable( 501 varname=name, 502 datatype=datatype, 503 dimensions=variable.dims, 504 zlib=encoding.get(""zlib"", False), 505 complevel=encoding.get(""complevel"", 4), 506 shuffle=encoding.get(""shuffle"", True), 507 fletcher32=encoding.get(""fletcher32"", False), 508 contiguous=encoding.get(""contiguous"", False), 
509 chunksizes=encoding.get(""chunksizes""), 510 endian=""native"", 511 least_significant_digit=encoding.get(""least_significant_digit""), 512 fill_value=fill_value, 513 ) 515 nc4_var.setncatts(attrs) 517 target = NetCDF4ArrayWrapper(name, self) File src/netCDF4/_netCDF4.pyx:2839, in genexpr() File src/netCDF4/_netCDF4.pyx:2839, in genexpr() File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:45, in _find_dim(grp, dimname) 43 group = group.parent 44 except: ---> 45 raise ValueError(""cannot find dimension %s in this group or parent groups"" % dimname) 46 if dim is None: 47 raise KeyError(""dimension %s not defined in group %s or any group in it's family tree"" % (dimname, grp.path)) ValueError: cannot find dimension subbasin in this group or parent groups ``` The `xarray.Dataset.to_zarr(...)` function works perfectly fine. If I read the files using `xarray.open_mfdataset(..., engine='h5netcdf')`, I get the following error instead: ```python >>> ds.to_netcdf(""file://path/to/test.nczarr#mode=nczarr"", engine='netcdf4') --------------------------------------------------------------------------- ImportError Traceback (most recent call last) Cell In[6], line 1 ----> 1 ds.to_netcdf(""file:///home/user/test.nczarr#mode=nczarr,file"", engine='netcdf4') File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2277 encoding = {} 2278 from xarray.backends.api import to_netcdf -> 2280 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2281 self, 2282 path, 2283 mode=mode, 2284 format=format, 2285 group=group, 2286 engine=engine, 2287 encoding=encoding, 2288 unlimited_dims=unlimited_dims, 2289 compute=compute, 2290 multifile=False, 2291 invalid_netcdf=invalid_netcdf, 2292 ) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1242, in 
to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1238 else: 1239 raise ValueError( 1240 f""unrecognized option 'invalid_netcdf' for engine {engine}"" 1241 ) -> 1242 store = store_open(target, mode, format, group, **kwargs) 1244 if unlimited_dims is None: 1245 unlimited_dims = dataset.encoding.get(""unlimited_dims"", None) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:367, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose) 353 @classmethod 354 def open( 355 cls, (...) 365 autoclose=False, 366 ): --> 367 import netCDF4 369 if isinstance(filename, os.PathLike): 370 filename = os.fspath(filename) File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/__init__.py:3 1 # init for netCDF4. package 2 # Docstring comes from extension module _netCDF4. ----> 3 from ._netCDF4 import * 4 # Need explicit imports for names beginning with underscores 5 from ._netCDF4 import __doc__ ImportError: /home/user/.local/lib64/libnetcdf.so.19: undefined symbol: H5Pset_fapl_mpio ``` Any ideas? Let me know if a small `.nc` file for this example is needed. As far as I know, all the dependencies are properly compiled: `HDF5` is compiled with the `--enable-parallel` flag, `netCDF-C` is compiled with parallel support turned on, and `netCDF4-python` is also compiled against these. I also have `netcdf-fortran` compiled in parallel mode, if that helps. 
Here are also the `xarray.show_versions()` details: ``` INSTALLED VERSIONS ------------------ commit: None python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.88.1.el7.x86_64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1-mpi libnetcdf: 4.9.3-development xarray: 2023.11.0 pandas: 2.1.0 numpy: 1.25.2 scipy: 1.11.2 netCDF4: 1.7.0-development pydap: None h5netcdf: 1.3.0 h5py: 3.8.0 Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: 1.3.7 dask: 2023.11.0 distributed: 2023.11.0 matplotlib: None cartopy: None seaborn: None numbagg: 0.6.4 fsspec: 2023.10.0 cupy: None pint: 0.22+computecanada sparse: 0.14.0 flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: None mypy: None IPython: 8.18.1 sphinx: None ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8504/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 2010425950,I_kwDOAMm_X8531Kpe,8482,`Filter error` using `ncdump` on `Zarr` datasets created by XArray,50383939,closed,0,,,2,2023-11-25T01:30:39Z,2023-11-25T01:36:04Z,2023-11-25T01:36:04Z,NONE,,,,"### What is your issue? I have trouble getting values of variables inside `Zarr` stores created using `XArray`. 
The `ncdump -v variable-name file://path/to/zarr/store#mode=zarr` gives me the following error: ```console foo@bar:~$ ncdump -v variable-name 'file:///path/to/zarr/store#mode=zarr' %%% HEADER STUFF %%% NetCDF: Filter error: undefined filter encountered Location: file /path/to/netcdf-c/ncdump/vardata.c; fcn print_rows line 478 variable-name = ``` Here is a typical header of such `Zarr` stores that I create: ```console netcdf test3 { dimensions: rlat = 628 ; rlon = 655 ; bnds = 2 ; bounds = 4 ; variables: double lat(rlat, rlon) ; lat:long_name = ""latitude"" ; lat:standard_name = ""latitude"" ; lat:units = ""degrees_north"" ; double lon(rlat, rlon) ; lon:long_name = ""longitude"" ; lon:standard_name = ""longitude"" ; lon:units = ""degrees_east"" ; float pr(rlat, rlon) ; pr:_QuantizeBitRoundNumberOfSignificantDigits = 12 ; pr:cell_methods = ""area: mean time: mean"" ; pr:coordinates = ""lon lat"" ; pr:grid_mapping = ""crs"" ; pr:long_name = ""Precipitation"" ; pr:standard_name = ""precipitation_flux"" ; pr:units = ""kg m-2 s-1"" ; double rlat(rlat) ; rlat:actual_range = -33.625, 35.345 ; rlat:axis = ""Y"" ; rlat:long_name = ""latitude in rotated pole grid"" ; rlat:standard_name = ""grid_latitude"" ; rlat:units = ""degrees"" ; double rlon(rlon) ; rlon:actual_range = -34.045, 37.895 ; rlon:axis = ""X"" ; rlon:long_name = ""longitude in rotated pole grid"" ; rlon:standard_name = ""grid_longitude"" ; rlon:units = ""degrees"" ; float tas(rlat, rlon) ; tas:_QuantizeBitRoundNumberOfSignificantDigits = 12 ; tas:cell_methods = ""area: mean time: point"" ; tas:coordinates = ""lon lat height"" ; tas:grid_mapping = ""crs"" ; tas:long_name = ""Near-Surface Air Temperature"" ; tas:standard_name = ""air_temperature"" ; tas:units = ""K"" ; double time_bnds(bnds) ; time_bnds:calendar = ""proleptic_gregorian"" ; time_bnds:units = ""days since 1950-01-01"" ; double vertices_latitude(rlat, rlon, bounds) ; double vertices_longitude(rlat, rlon, bounds) ; ``` I have compiled the `netcdf-c 
v4.9.3-development` version from the latest commit of the `netcdf-c` GitHub repository. I made sure the `plugins` were installed during the compilation procedure, as I understood certain shared libraries need to be installed for `HDF5`. I understand this issue does not have an MCVE, but let me know if a test case is needed; I would be more than happy to upload one here and share it. By the way, `XArray` has no problem reading these `Zarr` stores back again. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8482/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1870484988,I_kwDOAMm_X85vfVX8,8120,"`open_mfdataset` exits while sending a ""Segmentation fault"" error",50383939,closed,0,,,4,2023-08-28T20:51:23Z,2023-09-01T15:43:08Z,2023-09-01T15:43:08Z,NONE,,,,"### What is your issue? I try to open about 10 files, each 5 MB, as a test case, using `xarray`'s `open_mfdataset` method with the `parallel=True` option; however, it throws a ""Segmentation fault"" error, as follows: ```python $ ipython Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import xarray as xr In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}) In [3]: ds Out[3]: Dimensions: (time: 744, rlat: 140, rlon: 105) Coordinates: * time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0... lon (rlat, rlon) float32 dask.array lat (rlat, rlon) float32 dask.array * rlon (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4 * rlat (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68 Data variables: rotated_pole (time) int32 1 1 1 1 1 1 1 1 1 1 ... 
1 1 1 1 1 1 1 1 1 RDRS_v2.1_P_UVC_10m (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_FI_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_FB_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_A_PR0_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_P0_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_TT_1.5m (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_HU_1.5m (time, rlat, rlon) float32 dask.array Attributes: CDI: Climate Data Interface version 2.0.4 (https://mpimet.mpg.de... Conventions: CF-1.6 product: RDRS_v2.1 Remarks: Variable names are following the convention _