home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

2 rows where state = "open" and user = 50383939 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

type 1

  • issue 2

state 1

  • open · 2 ✖

repo 1

  • xarray 2
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1902108672 I_kwDOAMm_X85xX-AA 8207 Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset` kasra-keshavarz 50383939 open 0     4 2023-09-19T02:44:02Z 2023-12-01T22:29:49Z   NONE      

What is your issue?

I am simply reading 366 small (~15MBs) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow:

```python-console In [1]: import os; import dask

In [2]: import xarray as xr

In [3]: from dask.distributed import Client, LocalCluster

In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker

In [5]: client = Client(cluster)

In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)

In [8]: ds.to_netcdf('./out2.nc')

```

And below, is the error I am getting:

Error message ```python-console In [8]: ds.to_netcdf('./out2.nc') /home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures. warnings.warn( 2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed Key: ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0) Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x2b863b0e94c0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x2b86218d4ee0>, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None))) kwargs: {} Exception: "RuntimeError('NetCDF: HDF error')" --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[8], line 1 ----> 1 ds.to_netcdf('./out2.nc') File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2249 encoding = {} 2250 from xarray.backends.api import to_netcdf -> 2252 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2253 self, 2254 path, 2255 mode=mode, 2256 format=format, 2257 group=group, 2258 engine=engine, 2259 encoding=encoding, 2260 unlimited_dims=unlimited_dims, 2261 compute=compute, 2262 multifile=False, 2263 invalid_netcdf=invalid_netcdf, 2264 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1252 if multifile: 1253 return writer, store -> 1255 writes = writer.sync(compute=compute) 1257 if isinstance(target, BytesIO): 1258 store.sync() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs) 253 if chunkmanager_store_kwargs is None: 254 chunkmanager_store_kwargs = {} --> 256 delayed_store = chunkmanager.store( 257 self.sources, 258 self.targets, 259 lock=self.lock, 260 compute=compute, 261 flush=True, 262 regions=self.regions, 263 **chunkmanager_store_kwargs, 264 ) 265 self.sources = [] 266 self.targets = [] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs) 203 def store( 204 self, 205 sources: DaskArray | Sequence[DaskArray], 206 targets: Any, 207 **kwargs, 208 ): 209 from dask.array import store --> 211 return store( 212 sources=sources, 213 targets=targets, 214 **kwargs, 215 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***) 1234 elif compute: 1235 store_dsk = HighLevelGraph(layers, dependencies) -> 1236 compute_as_if_collection(Array, store_dsk, map_keys, **kwargs) 1237 return None 1239 else: File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs) 367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get) 368 dsk2 = optimization_function(cls)(dsk, keys, **kwargs) --> 369 return schedule(dsk2, keys, **kwargs) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs) 3265 should_rejoin = False 3266 try: -> 3267 results = self.gather(packed, asynchronous=asynchronous, direct=direct) 3268 finally: 3269 for f in futures.values(): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous) 2390 local_worker = None 2392 with shorten_traceback(): -> 2393 return self.sync( 2394 self._gather, 2395 futures, 2396 errors=errors, 2397 direct=direct, 2398 local_worker=local_worker, 2399 asynchronous=asynchronous, 2400 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__() 483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray: --> 484 return np.asarray(self.get_duck_array(), dtype=dtype) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array() 486 def get_duck_array(self): --> 487 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array() 663 def get_duck_array(self): --> 664 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array() 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): --> 557 array = array.get_duck_array() 558 return _wrap_numpy_scalars(array) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array() 73 def get_duck_array(self): ---> 74 return self.func(self.array.get_duck_array()) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array() 550 def get_duck_array(self): --> 551 array = self.array[self.key] 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in __getitem__() 99 def __getitem__(self, key): --> 100 return indexing.explicit_indexing_adapter( 101 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 102 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter() 836 """Support explicit indexing by delegating to a raw indexing method. 837 838 Outer and/or vectorized indexers are supported by indexing a second time (...) 855 Indexing result, in the form of a duck numpy-array. 856 """ 857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 858 result = raw_indexing_method(raw_key.tuple) 859 if numpy_indices.tuple: 860 # index the loaded np.ndarray 861 result = NumpyIndexingAdapter(result)[numpy_indices] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem() 110 try: 111 with self.datastore.lock: --> 112 original_array = self.get_array(needs_lock=False) 113 array = getitem(original_array, key) 114 except IndexError: 115 # Catch IndexError in netCDF4 and return a more informative 116 # error message. This is most often called when an unsorted 117 # indexer is used before the data is loaded from disk. File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array() 90 def get_array(self, needs_lock=True): ---> 91 ds = self.datastore._acquire(needs_lock) 92 variable = ds.variables[self.variable_name] 93 variable.set_auto_maskandscale(False) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire() 402 def _acquire(self, needs_lock=True): --> 403 with self._manager.acquire_context(needs_lock) as root: 404 ds = _nc4_require_group(root, self._group, self._mode) 405 return ds File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__() 133 del self.args, self.kwds, self.func 134 try: --> 135 return next(self.gen) 136 except StopIteration: 137 raise RuntimeError("generator didn't yield") from None File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context() 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """Context manager for acquiring a file.""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info() 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(*self._args, **kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a" File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__() File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars() File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error ```

The header of individual NetCDF ones are also in the following:

Individual NetCDF header ```console $ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc netcdf ab_models_remapped_1980-04-20-13-00-00 { dimensions: COMID = 14980 ; time = UNLIMITED ; // (24 currently) variables: int time(time) ; time:long_name = "time" ; time:units = "hours since 1980-04-20 12:00:00" ; time:calendar = "gregorian" ; time:standard_name = "time" ; time:axis = "T" ; double latitude(COMID) ; latitude:long_name = "latitude" ; latitude:units = "degrees_north" ; latitude:standard_name = "latitude" ; double longitude(COMID) ; longitude:long_name = "longitude" ; longitude:units = "degrees_east" ; longitude:standard_name = "longitude" ; double COMID(COMID) ; COMID:long_name = "shape ID" ; COMID:units = "1" ; double RDRS_v2.1_P_P0_SFC(time, COMID) ; RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ; RDRS_v2.1_P_P0_SFC:long_name = "Forecast: Surface pressure" ; RDRS_v2.1_P_P0_SFC:units = "mb" ; double RDRS_v2.1_P_HU_1.5m(time, COMID) ; RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_HU_1.5m:long_name = "Forecast: Specific humidity" ; RDRS_v2.1_P_HU_1.5m:units = "kg kg**-1" ; double RDRS_v2.1_P_TT_1.5m(time, COMID) ; RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_TT_1.5m:long_name = "Forecast: Air temperature" ; RDRS_v2.1_P_TT_1.5m:units = "deg_C" ; double RDRS_v2.1_P_UVC_10m(time, COMID) ; RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ; RDRS_v2.1_P_UVC_10m:long_name = "Forecast: Wind Modulus (derived using UU and VV)" ; RDRS_v2.1_P_UVC_10m:units = "kts" ; double RDRS_v2.1_A_PR0_SFC(time, COMID) ; RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ; RDRS_v2.1_A_PR0_SFC:long_name = "Analysis: Quantity of precipitation" ; RDRS_v2.1_A_PR0_SFC:units = "m" ; double RDRS_v2.1_P_FB_SFC(time, COMID) ; RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FB_SFC:long_name = "Forecast: Downward solar flux" ; RDRS_v2.1_P_FB_SFC:units = "W m**-2" ; double RDRS_v2.1_P_FI_SFC(time, COMID) ; RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FI_SFC:long_name = "Forecast: Surface incoming infrared flux" ; RDRS_v2.1_P_FI_SFC:units = "W m**-2" ; ```

I am running xarray and Dask on an HPC, so the "modules" I have loaded are the following: ```console module list

Currently Loaded Modules: 1) CCconfig 6) ucx/1.8.0 11) netcdf-mpi/4.9.0 (io) 16) freexl/1.0.5 (t) 21) scipy-stack/2023a (math) 2) gentoo/2020 (S) 7) libfabric/1.10.1 12) hdf5-mpi/1.12.1 (io) 17) geos/3.10.2 (geo) 22) libspatialindex/1.8.5 (phys) 3) gcccore/.9.3.0 (H) 8) openmpi/4.0.3 (m) 13) libffi/3.3 18) librttopo-proj9/1.1.0 23) ipykernel/2023a 4) imkl/2020.1.217 (math) 9) StdEnv/2020 (S) 14) python/3.10.2 (t) 19) proj/9.0.1 (geo) 24) sqlite/3.38.5 5) intel/2020.1.217 (t) 10) mii/1.1.2 15) mpi4py/3.1.3 (t) 20) libspatialite-proj901/5.0.1 ```

Any suggestion is greatly appreciated!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8207/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1878016712 I_kwDOAMm_X85v8ELI 8137 `time` variable encoding changes upon using `to_netcdf` method on a `DataSet` kasra-keshavarz 50383939 open 0     2 2023-09-01T20:34:58Z 2023-09-15T05:32:15Z   NONE      

What is your issue?

Upon trying to use the to_netcdf method of the Dataset, the encoding (local attributes) of the time variable changes. More specifically, the units has changed into another format. Here is a reproducible example:

```python-console $ ipython Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr imp
In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: np.random.seed(0) ...: temperature = 15 + 8 * np.random.randn(2, 2, 25) ...: precipitation = 10 * np.random.rand(2, 2, 25) ...: lon = [[-99.83, -99.32], [-99.79, -99.23]] ...: lat = [[42.25, 42.21], [42.63, 42.59]] ...: time = pd.date_range("2014-09-06", "2014-09-07",freq='H') ...: reference_time = pd.Timestamp("2014-09-05")

In [5]: ds = xr.Dataset( ...: data_vars=dict( ...: temperature=(["x", "y", "time"], temperature), ...: precipitation=(["x", "y", "time"], precipitation), ...: ), ...: coords=dict( ...: lon=(["x", "y"], lon), ...: lat=(["x", "y"], lat), ...: time=time, ...: reference_time=reference_time, ...: ), ...: attrs=dict(description="Weather related data."), ...: ) ...: ds Out[5]: <xarray.Dataset> Dimensions: (x: 2, y: 2, time: 25) Coordinates: lon (x, y) float64 -99.83 -99.32 -99.79 -99.23 lat (x, y) float64 42.25 42.21 42.63 42.59 * time (time) datetime64[ns] 2014-09-06 ... 2014-09-07 reference_time datetime64[ns] 2014-09-05 Dimensions without coordinates: x, y Data variables: temperature (x, y, time) float64 29.11 18.2 22.83 ... 29.29 16.02 18.22 precipitation (x, y, time) float64 4.239 6.064 0.1919 ... 8.727 2.735 7.98 Attributes: description: Weather related data.

In [6]: ds.time Out[6]: <xarray.DataArray 'time' (time: 25)> array(['2014-09-06T00:00:00.000000000', '2014-09-06T01:00:00.000000000', '2014-09-06T02:00:00.000000000', '2014-09-06T03:00:00.000000000', '2014-09-06T04:00:00.000000000', '2014-09-06T05:00:00.000000000', '2014-09-06T06:00:00.000000000', '2014-09-06T07:00:00.000000000', '2014-09-06T08:00:00.000000000', '2014-09-06T09:00:00.000000000', '2014-09-06T10:00:00.000000000', '2014-09-06T11:00:00.000000000', '2014-09-06T12:00:00.000000000', '2014-09-06T13:00:00.000000000', '2014-09-06T14:00:00.000000000', '2014-09-06T15:00:00.000000000', '2014-09-06T16:00:00.000000000', '2014-09-06T17:00:00.000000000', '2014-09-06T18:00:00.000000000', '2014-09-06T19:00:00.000000000', '2014-09-06T20:00:00.000000000', '2014-09-06T21:00:00.000000000', '2014-09-06T22:00:00.000000000', '2014-09-06T23:00:00.000000000', '2014-09-07T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2014-09-06 ... 2014-09-07 reference_time datetime64[ns] 2014-09-05

In [7]: ds.time.encoding Out[7]: {}

In [9]: ds.to_netcdf("./test.nc", encoding={'time': {'units': 'hours since 2014-09-01 12:00:00'}})

In [10]: !ncdump -h ./test.nc netcdf test { dimensions: x = 2 ; y = 2 ; time = 25 ; variables: double temperature(x, y, time) ; temperature:_FillValue = NaN ; temperature:coordinates = "lat lon reference_time" ; double precipitation(x, y, time) ; precipitation:_FillValue = NaN ; precipitation:coordinates = "lat lon reference_time" ; double lon(x, y) ; lon:_FillValue = NaN ; double lat(x, y) ; lat:_FillValue = NaN ; int64 time(time) ; time:units = "hours since 2014-09-01T12:00:00" ; <------- this is the problem time:calendar = "proleptic_gregorian" ; int64 reference_time ; reference_time:units = "days since 2014-09-05 00:00:00" ; reference_time:calendar = "proleptic_gregorian" ;

// global attributes: :description = "Weather related data." ; }

In [11]: ds.info() xarray.Dataset { dimensions: x = 2 ; y = 2 ; time = 25 ;

variables: float64 temperature(x, y, time) ; float64 precipitation(x, y, time) ; float64 lon(x, y) ; float64 lat(x, y) ; datetime64[ns] time(time) ; datetime64[ns] reference_time() ;

// global attributes: :description = Weather related data. ; } ```

The only thing that I am concerned about is the T value in the "hours since 2014-09-01T12:00:00" string in the final netCDF file. I would like to have control over it, however, even by providing an encoding dictionary for the units attribute, the T is placed in the attribute string.

The sample dataset is taken from here: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html

How may I evade this issue? Any suggestions. I did my best to Google. Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8137/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 18.116ms · About: xarray-datasette