issues

3 rows where state = "closed" and user = 50383939 sorted by updated_at descending

Issue #8504: `ValueError` when writing `NCZarr` datasets using `netcdf4` engine

  • id: 2021679408 · node_id: I_kwDOAMm_X854gGEw · user: kasra-keshavarz (50383939)
  • state: closed (state_reason: not_planned) · comments: 2 · author_association: NONE
  • created_at: 2023-12-01T22:37:42Z · updated_at: 2023-12-05T21:41:28Z · closed_at: 2023-12-05T21:41:27Z
  • repo: xarray · type: issue

What is your issue?

I am currently experiencing an issue when writing NCZarr files with xarray=='2023.11.0', after reading all the necessary data in netCDF format using the xarray.open_mfdataset(...) function.

A simple dataset such as the following:

```console
<xarray.Dataset>
Dimensions:              (subbasin: 197, time: 96)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
  * subbasin             (subbasin) int32 71032409 71032292 ... 71027770
Data variables:
    RDRS_v2.1_P_UVC_10m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_PR0_SFC  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (subbasin, time) float64 dask.array<chunksize=(197, 1), meta=np.ndarray>
    lat                  (subbasin) float64 dask.array<chunksize=(197,), meta=np.ndarray>
    lon                  (subbasin) float64 dask.array<chunksize=(197,), meta=np.ndarray>
    crs                  int32 ...
```

gives me the following error:

```python
ds.to_netcdf("file://path/to/test.nczarr#mode=nczarr", engine='netcdf4')

AttributeError                            Traceback (most recent call last)
File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:39, in _find_dim(grp, dimname)
     38 try:
---> 39     dim = group.dimensions[dimname]
     40     break

AttributeError: 'NoneType' object has no attribute 'dimensions'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:43, in _find_dim(grp, dimname)
     42 try:
---> 43     group = group.parent
     44 except:

AttributeError: 'NoneType' object has no attribute 'parent'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds.to_netcdf("file:///home/user/test.nczarr#mode=nczarr,file", engine='netcdf4')

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2277     encoding = {}
   2278 from xarray.backends.api import to_netcdf
-> 2280 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2281     self,
   2282     path,
   2283     mode=mode,
   2284     format=format,
   2285     group=group,
   2286     engine=engine,
   2287     encoding=encoding,
   2288     unlimited_dims=unlimited_dims,
   2289     compute=compute,
   2290     multifile=False,
   2291     invalid_netcdf=invalid_netcdf,
   2292 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1259, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1254 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1255 # to avoid this mess of conditionals
   1256 try:
   1257     # TODO: allow this work (setting up the file for writing array data)
   1258     # to be parallelized with dask
-> 1259     dump_to_store(
   1260         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1261     )
   1262     if autoclose:
   1263         store.close()

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1306, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1303 if encoder:
   1304     variables, attrs = encoder(variables, attrs)
-> 1306 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:356, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 356 self.set_variables(
    357     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    358 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/common.py:394, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    392 name = _encode_variable_name(vn)
    393 check = vn in check_encoding_set
--> 394 target, source = self.prepare_variable(
    395     name, v, check, unlimited_dims=unlimited_dims
    396 )
    398 writer.add(source, target)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:500, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    498     nc4_var = self.ds.variables[name]
    499 else:
--> 500     nc4_var = self.ds.createVariable(
    501         varname=name,
    502         datatype=datatype,
    503         dimensions=variable.dims,
    504         zlib=encoding.get("zlib", False),
    505         complevel=encoding.get("complevel", 4),
    506         shuffle=encoding.get("shuffle", True),
    507         fletcher32=encoding.get("fletcher32", False),
    508         contiguous=encoding.get("contiguous", False),
    509         chunksizes=encoding.get("chunksizes"),
    510         endian="native",
    511         least_significant_digit=encoding.get("least_significant_digit"),
    512         fill_value=fill_value,
    513     )
    515 nc4_var.setncatts(attrs)
    517 target = NetCDF4ArrayWrapper(name, self)

File src/netCDF4/_netCDF4.pyx:2839, in genexpr()

File src/netCDF4/_netCDF4.pyx:2839, in genexpr()

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/utils.py:45, in _find_dim(grp, dimname)
     43     group = group.parent
     44 except:
---> 45     raise ValueError("cannot find dimension %s in this group or parent groups" % dimname)
     46 if dim is None:
     47     raise KeyError("dimension %s not defined in group %s or any group in it's family tree" % (dimname, grp.path))

ValueError: cannot find dimension subbasin in this group or parent groups
```

The xarray.Dataset.to_zarr(...) function works perfectly fine.
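To make the comparison concrete, here is a minimal sketch of the two write paths; the tiny stand-in dataset, variable names, and output paths are hypothetical, not taken from the report:

```python
# Minimal sketch, assuming a small stand-in dataset; names and paths here are
# hypothetical. The real report uses the open_mfdataset output shown above.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"var": (("subbasin", "time"), np.random.rand(4, 3))},
    coords={"subbasin": np.arange(4, dtype="int32"), "time": np.arange(3)},
)

# The Zarr backend writes without complaint:
ds.to_zarr("test.zarr", mode="w")

# Writing NCZarr through libnetcdf (a file:// URL with a "#mode=nczarr"
# fragment) is the call that raises the ValueError above:
ds.to_netcdf("file://test.nczarr#mode=nczarr", engine="netcdf4")
```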

If I read the files using xarray.open_mfdataset(..., engine='h5netcdf'), I get the following error instead:

```python
ds.to_netcdf("file://path/to/test.nczarr#mode=nczarr", engine='netcdf4')

ImportError                               Traceback (most recent call last)
Cell In[6], line 1
----> 1 ds.to_netcdf("file:///home/user/test.nczarr#mode=nczarr,file", engine='netcdf4')

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/core/dataset.py:2280, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2277     encoding = {}
   2278 from xarray.backends.api import to_netcdf
-> 2280 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2281     self,
   2282     path,
   2283     mode=mode,
   2284     format=format,
   2285     group=group,
   2286     engine=engine,
   2287     encoding=encoding,
   2288     unlimited_dims=unlimited_dims,
   2289     compute=compute,
   2290     multifile=False,
   2291     invalid_netcdf=invalid_netcdf,
   2292 )

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/api.py:1242, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1238 else:
   1239     raise ValueError(
   1240         f"unrecognized option 'invalid_netcdf' for engine {engine}"
   1241     )
-> 1242 store = store_open(target, mode, format, group, **kwargs)
   1244 if unlimited_dims is None:
   1245     unlimited_dims = dataset.encoding.get("unlimited_dims", None)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:367, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    353 @classmethod
    354 def open(
    355     cls,
   (...)
    365     autoclose=False,
    366 ):
--> 367     import netCDF4
    369 if isinstance(filename, os.PathLike):
    370     filename = os.fspath(filename)

File ~/virtual-envs/zarrenv/lib/python3.10/site-packages/netCDF4-1.7.0-py3.10-linux-x86_64.egg/netCDF4/__init__.py:3
      1 # __init__ for netCDF4. package
      2 # Docstring comes from extension module _netCDF4.
----> 3 from ._netCDF4 import *
      4 # Need explicit imports for names beginning with underscores
      5 from ._netCDF4 import __doc__

ImportError: /home/user/.local/lib64/libnetcdf.so.19: undefined symbol: H5Pset_fapl_mpio
```

Any ideas?

Let me know if a small .nc file for this example is needed. As far as I know, all the dependencies are properly compiled: HDF5 is compiled with the --enable-parallel flag, netCDF-C is compiled with parallel support turned on, and netCDF4-python is compiled against both. I also have netcdf-fortran compiled in parallel mode, if that helps.
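One hedged diagnostic sketch, assuming an environment where `import netCDF4` still succeeds: the binding reports which libnetcdf and libhdf5 it was compiled against, which can then be compared with whatever libnetcdf.so the failing environment resolves at runtime:

```python
# Hedged diagnostic sketch: print the library versions the netCDF4 binding
# was compiled against. A different (e.g. serial, non-MPI) libhdf5/libnetcdf
# resolved at runtime is one common cause of the
# "undefined symbol: H5Pset_fapl_mpio" ImportError.
import netCDF4

print(netCDF4.__version__)            # version of the Python binding
print(netCDF4.__netcdf4libversion__)  # libnetcdf it was compiled against
print(netCDF4.__hdf5libversion__)     # libhdf5 it was compiled against
```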

Here are also the xarray.show_versions() details:

```
INSTALLED VERSIONS

commit: None
python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1-mpi
libnetcdf: 4.9.3-development

xarray: 2023.11.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.11.2
netCDF4: 1.7.0-development
pydap: None
h5netcdf: 1.3.0
h5py: 3.8.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.11.0
distributed: 2023.11.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.6.4
fsspec: 2023.10.0
cupy: None
pint: 0.22+computecanada
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None
```

Issue #8482: `Filter error` using `ncdump` on `Zarr` datasets created by XArray

  • id: 2010425950 · node_id: I_kwDOAMm_X8531Kpe · user: kasra-keshavarz (50383939)
  • state: closed (state_reason: completed) · comments: 2 · author_association: NONE
  • created_at: 2023-11-25T01:30:39Z · updated_at: 2023-11-25T01:36:04Z · closed_at: 2023-11-25T01:36:04Z
  • repo: xarray · type: issue

What is your issue?

I have trouble getting values of variables inside Zarr stores created using XArray. Running ncdump -v variable-name 'file://path/to/zarr/store#mode=zarr' gives me the following error:

```console
foo@bar:~$ ncdump -v variable-name 'file:///path/to/zarr/store#mode=zarr'
%%% HEADER STUFF %%%

NetCDF: Filter error: undefined filter encountered
Location: file /path/to/netcdf-c/ncdump/vardata.c; fcn print_rows line 478
variable-name =
```

Here is a typical header of such Zarr stores that I create:

```console
netcdf test3 {
dimensions:
        rlat = 628 ;
        rlon = 655 ;
        bnds = 2 ;
        bounds = 4 ;
variables:
        double lat(rlat, rlon) ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lon(rlat, rlon) ;
                lon:long_name = "longitude" ;
                lon:standard_name = "longitude" ;
                lon:units = "degrees_east" ;
        float pr(rlat, rlon) ;
                pr:_QuantizeBitRoundNumberOfSignificantDigits = 12 ;
                pr:cell_methods = "area: mean time: mean" ;
                pr:coordinates = "lon lat" ;
                pr:grid_mapping = "crs" ;
                pr:long_name = "Precipitation" ;
                pr:standard_name = "precipitation_flux" ;
                pr:units = "kg m-2 s-1" ;
        double rlat(rlat) ;
                rlat:actual_range = -33.625, 35.345 ;
                rlat:axis = "Y" ;
                rlat:long_name = "latitude in rotated pole grid" ;
                rlat:standard_name = "grid_latitude" ;
                rlat:units = "degrees" ;
        double rlon(rlon) ;
                rlon:actual_range = -34.045, 37.895 ;
                rlon:axis = "X" ;
                rlon:long_name = "longitude in rotated pole grid" ;
                rlon:standard_name = "grid_longitude" ;
                rlon:units = "degrees" ;
        float tas(rlat, rlon) ;
                tas:_QuantizeBitRoundNumberOfSignificantDigits = 12 ;
                tas:cell_methods = "area: mean time: point" ;
                tas:coordinates = "lon lat height" ;
                tas:grid_mapping = "crs" ;
                tas:long_name = "Near-Surface Air Temperature" ;
                tas:standard_name = "air_temperature" ;
                tas:units = "K" ;
        double time_bnds(bnds) ;
                time_bnds:calendar = "proleptic_gregorian" ;
                time_bnds:units = "days since 1950-01-01" ;
        double vertices_latitude(rlat, rlon, bounds) ;
        double vertices_longitude(rlat, rlon, bounds) ;
```

I have compiled netcdf-c v4.9.3-development from the latest commit of the netcdf-c GitHub repository. I made sure the plugins were installed during compilation, as I understood certain shared libraries need to be installed for HDF5.

I understand this issue does not include an MCVE, but let me know if a test case is needed; I would be more than happy to upload and share one here.

By the way, XArray has no problem reading these Zarr stores back again.
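One hedged way to narrow this down, assuming zarr-python v2 and a placeholder store path: the "undefined filter" from ncdump usually means an array in the store uses a codec, such as zarr's default Blosc compressor, for which the netcdf-c build has no matching HDF5 filter plugin. The codecs xarray actually wrote can be listed directly:

```python
# Sketch for listing the compressor/filters of each array in the store.
# Assumes zarr-python v2; the store path is a placeholder, not the real one.
import zarr

store = zarr.open_group("path/to/zarr/store", mode="r")
for name, arr in store.arrays():
    print(name, arr.compressor, arr.filters)
```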

Issue #8120: `open_mfdataset` exits while sending a "Segmentation fault" error

  • id: 1870484988 · node_id: I_kwDOAMm_X85vfVX8 · user: kasra-keshavarz (50383939)
  • state: closed (state_reason: completed) · comments: 4 · author_association: NONE
  • created_at: 2023-08-28T20:51:23Z · updated_at: 2023-09-01T15:43:08Z · closed_at: 2023-09-01T15:43:08Z
  • repo: xarray · type: issue

What is your issue?

I am trying to open ~10 files, each 5 MB, as a test case, using xarray's open_mfdataset method with the parallel=True option; however, it throws a "Segmentation fault" error, as follows:

```python
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:              (time: 744, rlat: 140, rlon: 105)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
    lon                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
    lat                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
  * rlon                 (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
  * rlat                 (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
    rotated_pole         (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    RDRS_v2.1_P_UVC_10m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_A_PR0_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    product:      RDRS_v2.1
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...
    history:      Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1...
    NCO:          netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...

In [4]: type(ds)
Out[4]: xarray.core.dataset.Dataset

In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True)
[gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
[gra-login3:25527] *** Process received signal ***
[gra-login3:25527] Signal: Segmentation fault (11)
[gra-login3:25527] Signal code:  (128)
[gra-login3:25527] Failing at address: (nil)
Segmentation fault
```
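A hedged workaround sketch (an assumption, not a confirmed fix for this report): with parallel=True the files are opened inside dask tasks, so a libhdf5 build that is not thread-safe can crash the interpreter. Pinning dask to a single-threaded scheduler around the open is one way to test that theory:

```python
# Sketch: force a single-threaded dask scheduler around the open so the
# netCDF/HDF5 C libraries are never entered from multiple threads at once.
# This is a diagnostic assumption, not a confirmed fix for this report.
import dask
import xarray as xr

with dask.config.set(scheduler="single-threaded"):
    ds = xr.open_mfdataset("./ab_models_198001*.nc",
                           chunks={"time": 10}, parallel=True)
```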

Here is the version of xarray:

```python
In [5]: xr.show_versions()
/home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.9.0

xarray: 2023.7.0
pandas: 1.4.0
numpy: 1.21.2
scipy: 1.8.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.0
distributed: 2023.8.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 60.2.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.10.0
sphinx: None
```

I'm working on an HPC system, so here is the list of modules I have loaded, in case it helps:

```console
$ module list

Currently Loaded Modules:
  1) CCconfig                 2) gentoo/2020 (S)           3) StdEnv/2020 (S)             4) mii/1.1.2
  5) gcccore/.9.3.0 (H)       6) imkl/2020.1.217 (math)    7) gcc/9.3.0 (t)               8) ucx/1.8.0
  9) libfabric/1.10.1        10) openmpi/4.0.3 (m)        11) libffi/3.3                 12) python/3.10.2 (t)
 13) ipykernel/2023a         14) scipy-stack/2023a (math) 15) hdf5/1.10.6 (io)           16) netcdf/4.7.4 (io)
 17) sqlite/3.38.5           18) jasper/2.0.16 (vis)      19) libgeotiff-proj901/1.7.1   20) cfitsio/4.1.0 (vis)
 21) postgresql/12.4 (t)     22) freexl/1.0.5 (t)         23) librttopo-proj9/1.1.0      24) libspatialite-proj901/5.0.1
 25) gdal/3.5.1 (geo)        26) geos/3.10.2 (geo)        27) proj/9.0.1 (geo)           28) expat/2.4.1 (t)
 29) udunits/2.2.28 (t)      30) libaec/1.0.6             31) eccodes/2.25.0 (geo)       32) yaxt/0.9.0 (t)
 33) cdo/2.2.1 (geo)         34) mpi4py/3.1.3 (t)         35) netcdf-fortran/4.5.2 (io)  36) libspatialindex/1.8.5 (phys)

  Where:
   S:     Module is Sticky, requires --force to unload or purge
   m:     MPI implementations / Implémentations MPI
   math:  Mathematical libraries / Bibliothèques mathématiques
   io:    Input/output software / Logiciel d'écriture/lecture
   t:     Tools for development / Outils de développement
   vis:   Visualisation software / Logiciels de visualisation
   geo:   Geography libraries/apps / Logiciels de géographie
   phys:  Physics libraries/apps / Logiciels de physique
   H:     Hidden Module
```

Thanks.


Table schema:
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);