issues
8 rows where repo = 13221727, state = "closed" and user = 40218891 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1953059418 | I_kwDOAMm_X850aVJa | 8345 | `.stack` produces large chunks | yt87 40218891 | closed | 0 | 4 | 2023-10-19T21:09:56Z | 2023-10-26T21:20:05Z | 2023-10-26T21:20:05Z | NONE |

What happened?
Xarray

What did you expect to happen?
I expect this to work. #5754 is closed.

Minimal Complete Verifiable Example
```Python
import dask.array
import numpy as np
import xarray as xr

var = xr.Variable(
    ("t", "z", "u", "x", "y"),
    dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)


def sum(ds):
    return ds.sum(dim="u")


with dask.config.set(**{"array.slicing.split_large_chunks": True}):
    da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new")
da2
```
MVCE confirmation
Relevant log output```PythonIndexError Traceback (most recent call last) Cell In[21], line 5 2 return ds.sum(dim="u") 4 with dask.config.set(**{"array.slicing.split_large_chunks": True}): ----> 5 da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new") 6 da2 File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse) 2795 def unstack( 2796 self, 2797 dim: Dims = None, 2798 fill_value: Any = dtypes.NA, 2799 sparse: bool = False, 2800 ) -> Self: 2801 """ 2802 Unstack existing dimensions corresponding to MultiIndexes into 2803 multiple new dimensions. (...) 2853 DataArray.stack 2854 """ -> 2855 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse) 2856 return self._from_temp_dataset(ds) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse) 5498 for d in dims: 5499 if needs_full_reindex: -> 5500 result = result._unstack_full_reindex( 5501 d, stacked_indexes[d], fill_value, sparse 5502 ) 5503 else: 5504 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse) 5393 if name not in index_vars: 5394 if dim in var.dims: -> 5395 variables[name] = var.unstack({dim: new_dim_sizes}) 5396 else: 5397 variables[name] = var File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs) 1928 result = self 1929 for old_dim, dims in dimensions.items(): -> 1930 result = result._unstack_once_full(dims, old_dim) 1931 return result File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim) 1817 reordered = self.transpose(*dim_order) 1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes -> 1820 new_data = reordered.data.reshape(new_shape) 1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names 1823 return type(self)( 1824 new_dims, new_data, self._attrs, self._encoding, fastpath=True 1825 ) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape) 2217 if len(shape) == 1 and not isinstance(shape[0], Number): 2218 shape = shape[0] -> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit) 283 else: 284 chunk_plan.append("auto") --> 285 outchunks = normalize_chunks( 286 chunk_plan, 287 shape=shape, 288 limit=limit, 289 dtype=x.dtype, 290 previous_chunks=inchunks, 291 ) 293 x2 = x.rechunk(inchunks) 295 # Construct graph File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks) 3092 chunks = tuple("auto" if isinstance(c, str) and c != "auto" else c for c in chunks) 3094 if any(c == "auto" for c in chunks): -> 3095 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3097 if shape is not None: 3098 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape)) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3212 largest_block = math.prod( 3213 cs if 
isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in <dictcomp>(.0) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): IndexError: tuple index out of range ``` Anything else we need to know?The most recent traceback entry point to an issue in dask code. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8345/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
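A possible workaround sketch for the `IndexError` in #8345 above, under the assumption that dask's "auto" chunk planning during the unstack reshape is the trigger: leave `array.slicing.split_large_chunks` at its default and rechunk the stacked dimension explicitly before unstacking. The chunk size of 4 is illustrative and untested against this exact report; with this many groups the `groupby().map()` graph will also be large.

```Python
import dask.array
import xarray as xr

var = xr.Variable(
    ("t", "z", "u", "x", "y"),
    dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)

# Rechunk the stacked dimension ourselves instead of relying on
# "array.slicing.split_large_chunks"; the reshape inside unstack then sees an
# explicit uniform chunking rather than an "auto" chunk plan.
stacked = da.stack(new=("z", "t")).chunk({"new": 4})
da2 = stacked.groupby("new").map(lambda g: g.sum(dim="u")).unstack("new")
```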
1956383344 | I_kwDOAMm_X850nApw | 8358 | Writing to zarr archive fails on resampled dataset | yt87 40218891 | closed | 0 | 1 | 2023-10-23T05:30:36Z | 2023-10-23T15:46:20Z | 2023-10-23T15:46:19Z | NONE |

What happened?
I am not sure where this belongs: xarray, dask or zarr. When a dataset is resampled to a semi-monthly frequency, the method

What did you expect to happen?
I think this should work without having to rechunk the result before writing to the archive.

Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output```PythonValueError Traceback (most recent call last) Cell In[63], line 4 2 ds = xr.Dataset({"foo": ("time", np.arange(1, 366)), "time": time}).chunk(time=5) 3 dsr = ds.resample(time="SM").mean() ----> 4 dsr.to_zarr('/tmp/foo', mode='w') 5 #dsr.isel(time=slice(0, -1)).to_zarr('/tmp/foo', mode='w') File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:2490, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs) 2358 """Write dataset contents to a zarr group. 2359 2360 Zarr chunks are determined in the following way: (...) 2486 The I/O user guide, with more details and examples. 2487 """ 2488 from xarray.backends.api import to_zarr -> 2490 return to_zarr( # type: ignore[call-overload,misc] 2491 self, 2492 store=store, 2493 chunk_store=chunk_store, 2494 storage_options=storage_options, 2495 mode=mode, 2496 synchronizer=synchronizer, 2497 group=group, 2498 encoding=encoding, 2499 compute=compute, 2500 consolidated=consolidated, 2501 append_dim=append_dim, 2502 region=region, 2503 safe_chunks=safe_chunks, 2504 zarr_version=zarr_version, 2505 write_empty_chunks=write_empty_chunks, 2506 chunkmanager_store_kwargs=chunkmanager_store_kwargs, 2507 ) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1708, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs) 1706 writer = ArrayWriter() 1707 # TODO: figure out how to properly handle unlimited_dims -> 1708 dump_to_store(dataset, zstore, writer, encoding=encoding) 1709 writes = writer.sync( 1710 compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs 1711 ) 1713 if compute: File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1308, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1305 if encoder: 1306 variables, attrs = encoder(variables, attrs) -> 1308 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:631, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 628 self.set_attributes(attributes) 629 self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims) --> 631 self.set_variables( 632 variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims 633 ) 634 if self._consolidate_on_close: 635 zarr.consolidate_metadata(self.zarr_group.store) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:687, in ZarrStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 684 zarr_array = self.zarr_group[name] 685 else: 686 # new variable --> 687 encoding = extract_zarr_variable_encoding( 688 v, raise_on_invalid=check, name=vn, safe_chunks=self._safe_chunks 689 ) 690 encoded_attrs = {} 691 # the magic for storing the hidden dimension data File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:281, in extract_zarr_variable_encoding(variable, raise_on_invalid, name, safe_chunks) 278 if k not in valid_encodings: 279 del encoding[k] --> 281 chunks = _determine_zarr_chunks( 282 encoding.get("chunks"), variable.chunks, variable.ndim, name, safe_chunks 283 ) 284 
encoding["chunks"] = chunks 285 return encoding File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:138, in _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name, safe_chunks)
132 raise ValueError(
133 "Zarr requires uniform chunk sizes except for final chunk. "
134 f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
135 "Consider rechunking using ValueError: Final chunk of Zarr array must be the same size or smaller than the first. Variable named 'foo' has incompatible Dask chunks ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2),).Consider either rechunking using Anything else we need to know?I can also achieve what I want without having to rechunk with
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8358/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
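The MVCE for #8358 did not survive the export; the sketch below reconstructs it from the traceback (the `time` coordinate is an assumption, since the cell that built it is not shown) and adds the rechunk-before-write step the report alludes to, because Zarr only allows the final chunk of an array to differ in size.

```Python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed time axis: one calendar year of daily values.
time = pd.date_range("2021-01-01", periods=365, name="time")
ds = xr.Dataset({"foo": ("time", np.arange(1, 366)), "time": time}).chunk(time=5)
dsr = ds.resample(time="SM").mean()

# Semi-monthly resampling leaves irregular dask chunks, which Zarr rejects;
# collapsing them into a single chunk before writing avoids the ValueError.
dsr.chunk({"time": -1}).to_zarr("/tmp/foo", mode="w")
```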
1940650207 | I_kwDOAMm_X85zq_jf | 8300 | Inconsistent behaviour of xarray.concat | yt87 40218891 | closed | 0 | 2 | 2023-10-12T19:23:32Z | 2023-10-12T19:48:01Z | 2023-10-12T19:48:00Z | NONE |

What is your issue?
I am not sure if it is a bug or a feature:
```
import numpy as np
import pandas as pd
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 2)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset(
{"temperature": (["x", "y", "time"], temp), "latitude_longitude": 0},
coords={
"lon": (["x", "y"], lon),
"lat": (["x", "y"], lat),
"time": ("time", pd.date_range("2014-09-05", periods=2)),
},
)
print(
xr.concat(
[ds.isel(time=0), ds.isel(time=1)], "time", data_vars="minimal"
).latitude_longitude
)
print(
xr.concat(
[ds.isel(time=slice(0, 1)), ds.isel(time=slice(1, 2))], "time", data_vars="minimal"
).latitude_longitude
)
``` BTW, this is xarray 2023.9.0. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8300/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
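One ingredient worth making explicit for #8300 (a general xarray indexing fact, not something stated in the surviving text): integer `isel` drops the indexed dimension, while a length-1 slice keeps it, so the two `concat` calls above receive structurally different inputs. A minimal sketch of that difference, with an illustrative toy dataset:

```Python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"v": (("time",), np.arange(2))},
    coords={"time": pd.date_range("2014-09-05", periods=2)},
)

# Integer indexing removes the "time" dimension entirely ...
print(ds.isel(time=0).dims)
# ... while a length-1 slice keeps "time" with size 1, so data_vars="minimal"
# sees different inputs in the two concat calls of the report.
print(ds.isel(time=slice(0, 1)).dims)
```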
1806386948 | I_kwDOAMm_X85rq0cE | 7990 | Random crashes in netcdf when dask client has multiple threads | yt87 40218891 | closed | 0 | 1 | 2023-07-16T01:00:55Z | 2023-08-23T00:18:18Z | 2023-08-23T00:18:17Z | NONE | What happened?The data files can be found here: https://noaadata.apps.nsidc.org/NOAA/G02202_V4/north/monthly/. The example code below crashes randomly: the file processed when the crash occurs differs between runs. This happens only when What did you expect to happen?No response Minimal Complete Verifiable Example```Python from pathlib import Path import pandas as pd from dask.distributed import Client import xarray as xr client = Client(n_workers=1, threads_per_worker=4) DATADIR = Path("/mnt/sdc1/icec/NSIDC") year = 2020 times = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS", name="time") paths = [ DATADIR / "monthly" / f"seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc" for t in times ] for n in range(10): ds = xr.open_mfdataset( paths, combine="nested", concat_dim="tdim", parallel=True, engine="netcdf4", ) del ds HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0: #000: H5G.c line 442 in H5Gopen2(): unable to synchronously open group major: Symbol table minor: Unable to create file #001: H5G.c line 399 in H5G__open_api_common(): can't set object access arguments major: Symbol table minor: Can't set value #002: H5VLint.c line 2669 in H5VL_setup_acc_args(): invalid location identifier major: Invalid arguments to routine minor: Inappropriate type #003: H5VLint.c line 1787 in H5VL_vol_object(): invalid identifier type to function major: Invalid arguments to routine minor: Inappropriate type HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0: #000: H5G.c line 887 in H5Gclose(): not a group ID major: Invalid arguments to routine minor: Inappropriate type 2023-07-16 00:35:47,833 - distributed.worker - WARNING - Compute Failed Key: open_dataset-09a155bb-5079-406a-83c4-737933c409c7 Function: execute_task args: ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202001_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]]))) kwargs: {} Exception: "OSError(-101, 'NetCDF: HDF error')" 2023-07-16 00:35:47,834 - distributed.worker - WARNING - Compute Failed Key: open_dataset-14e239f4-7e16-4891-a350-b55979d4a754 Function: execute_task args: ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]]))) kwargs: {} Exception: "OSError(-101, 'NetCDF: HDF error')" OSError Traceback (most recent call last) Cell In[1], line 19 14 paths = [ 15 DATADIR / "monthly" / f"seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc" 16 for t in times 17 ] 18 for n in range(10): ---> 19 ds = xr.open_mfdataset( 20 paths, 21 combine="nested", 22 concat_dim="tdim", 23 parallel=True, 24 engine="netcdf4", 25 ) 26 del ds File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:1050, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs) 1045 datasets = [preprocess(ds) for ds in datasets] 1047 if parallel: 1048 # calling compute here will return the datasets/file_objs lists, 1049 # the underlying datasets will still be stored as dask arrays -> 1050 datasets, closers = 
dask.compute(datasets, closers) 1052 # Combine all datasets, closing them in case of a ValueError 1053 try: File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:570, in open_dataset() 558 decoders = _resolve_decoders_kwargs( 559 decode_cf, 560 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 566 decode_coords=decode_coords, 567 ) 569 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 570 backend_ds = backend.open_dataset( 571 filename_or_obj, 572 drop_variables=drop_variables, 573 decoders, 574 kwargs, 575 ) 576 ds = _dataset_from_backend_dataset( 577 backend_ds, 578 filename_or_obj, (...) 588 **kwargs, 589 ) 590 return ds File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:590, in open_dataset() 569 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 570 self, 571 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 587 autoclose=False, 588 ) -> Dataset: 589 filename_or_obj = _normalize_path(filename_or_obj) --> 590 store = NetCDF4DataStore.open( 591 filename_or_obj, 592 mode=mode, 593 format=format, 594 group=group, 595 clobber=clobber, 596 diskless=diskless, 597 persist=persist, 598 lock=lock, 599 autoclose=autoclose, 600 ) 602 store_entrypoint = StoreBackendEntrypoint() 603 with close_on_error(store): File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:391, in open() 385 kwargs = dict( 386 clobber=clobber, diskless=diskless, persist=persist, format=format 387 ) 388 manager = CachingFileManager( 389 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs 390 ) --> 391 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose) File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:338, in init() 336 self._group = group 337 self._mode = mode --> 338 self.format = self.ds.data_model 339 self._filename = self.ds.filepath() 340 self.is_remote = is_remote_uri(self._filename) File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:400, in ds() 398 @property 399 def ds(self): --> 400 return self._acquire() File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:394, in _acquire() 393 def _acquire(self, needs_lock=True): --> 394 with self._manager.acquire_context(needs_lock) as root: 395 ds = _nc4_require_group(root, self._group, self._mode) 396 return ds File ~/mambaforge/envs/icec/lib/python3.10/contextlib.py:135, in enter() 133 del self.args, self.kwds, self.func 134 try: --> 135 return next(self.gen) 136 except StopIteration: 137 raise RuntimeError("generator didn't yield") from None File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context() 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """Context manager for acquiring a file.""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info() 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(self._args, *kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a" File src/netCDF4/_netCDF4.pyx:2464, in netCDF4._netCDF4.Dataset.init() File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success() 
OSError: [Errno -101] NetCDF: HDF error: '/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc' ``` MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.38-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.7.0
distributed: 2023.7.0
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7990/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
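HDF5 builds are frequently not thread-safe, so a hedged mitigation for #7990 (an assumption consistent with the title, not a fix confirmed in the thread) is to keep one thread per worker and scale with processes instead while still using `parallel=True`:

```Python
from pathlib import Path

import pandas as pd
import xarray as xr
from dask.distributed import Client

# Process-based parallelism: each worker opens its netCDF files in its own
# process, so concurrent HDF5 calls never share one thread pool.
client = Client(n_workers=4, threads_per_worker=1)

DATADIR = Path("/mnt/sdc1/icec/NSIDC")  # path taken from the report
year = 2020
times = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS", name="time")
paths = [
    DATADIR / "monthly" / f"seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc"
    for t in times
]
ds = xr.open_mfdataset(
    paths, combine="nested", concat_dim="tdim", parallel=True, engine="netcdf4"
)
```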
787947436 | MDU6SXNzdWU3ODc5NDc0MzY= | 4822 | h5netcdf fails to decode attribute coordinates. | yt87 40218891 | closed | 0 | 10 | 2021-01-18T06:01:40Z | 2022-03-29T13:39:46Z | 2022-03-29T13:39:45Z | NONE | What happened:
The engine What you expected to happen: It should work. Minimal Complete Verifiable Example: ```python Put your MCVE code hereimport xarray as xr ds = xr.open_dataset('/tmp/x.nc', engine='h5netcdf') ========H5 coordinates ['x y'] AttributeError Traceback (most recent call last) <ipython-input-3-481117dce7ff> in <module> 1 import xarray as xr 2 ----> 3 ds = xr.open_dataset('/tmp/x.nc', engine='h5netcdf') ~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta) 572 573 with close_on_error(store): --> 574 ds = maybe_decode_store(store, chunks) 575 576 # Ensure source filename always stored in dataset object (GH issue #2550) ~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/backends/api.py in maybe_decode_store(store, chunks) 476 drop_variables=drop_variables, 477 use_cftime=use_cftime, --> 478 decode_timedelta=decode_timedelta, 479 ) 480 ~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta) 596 drop_variables=drop_variables, 597 use_cftime=use_cftime, --> 598 decode_timedelta=decode_timedelta, 599 ) 600 ds = Dataset(vars, attrs=attrs) ~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta) 504 if "coordinates" in var_attrs: 505 coord_str = var_attrs["coordinates"] --> 506 var_coord_names = coord_str.split() 507 if all(k in variables for k in var_coord_names): 508 new_vars[k].encoding["coordinates"] = coord_str AttributeError: 'numpy.ndarray' object has no attribute 'split' ``` Anything else we need to know?: The test file was created from CDL: ``` netcdf x { dimensions: x = 1 ; y = 1 ; variables: int foo(y, x) ; string foo:coordinates = "x y" ; data: foo =
0 ;
}
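A reproduction sketch for #4822, assuming the CDL above is saved as `x.cdl` (the file name and output path are illustrative): generate a netCDF-4 file with `ncgen` so the `coordinates` attribute keeps its string type, then open it with the h5netcdf engine.

```Python
# Shell step (illustrative): ncgen -k 'netCDF-4' -o /tmp/x.nc x.cdl
import xarray as xr

# With engine="h5netcdf" the string-typed "coordinates" attribute is read back
# as a numpy array, and decoding fails with the AttributeError shown above.
ds = xr.open_dataset("/tmp/x.nc", engine="h5netcdf")
```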
Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.9.12-200.fc33.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.16.2 pandas: 1.2.0 numpy: 1.19.2 scipy: 1.5.2 netCDF4: 1.4.2 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: None cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.01.0 distributed: 2021.01.0 matplotlib: 3.2.1 cartopy: 0.17.0 seaborn: None numbagg: None pint: None setuptools: 51.1.2.post20210112 pip: 20.3.3 conda: None pytest: 5.4.3 IPython: 7.19.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4822/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
863477424 | MDU6SXNzdWU4NjM0Nzc0MjQ= | 5199 | Better error message for setting encoding["units"] | yt87 40218891 | closed | 0 | 0 | 2021-04-21T05:57:14Z | 2021-05-13T18:27:13Z | 2021-05-13T18:27:13Z | NONE | What happened: Setting invalid units for time axis encoding results in an exception What you expected to happen: It should say "invalid time units", like this (see commented out line below) Minimal Complete Verifiable Example: ```python Put your MCVE code hereimport pandas as pd import xarray as xr ds = xr.Dataset(data_vars={'v': (('t',), [0,])}, coords={'t': [pd.Timestamp(2000, 1, 1)]}) ds.t.encoding['units'] = 'days since Big Bang' ds.t.encoding['units'] = 'days after 1/3/2000'ds.to_netcdf('/tmp/x.nc', mode='w') AttributeError Traceback (most recent call last) <ipython-input-16-0d97a797be64> in <module> 5 ds.t.encoding['units'] = 'days since Big Bang' 6 #ds.t.encoding['units'] = 'days after 1/3/2000' ----> 7 ds.to_netcdf('/tmp/x.nc', mode='w') ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1752 from ..backends.api import to_netcdf 1753 -> 1754 return to_netcdf( 1755 self, 1756 path, ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1066 # TODO: allow this work (setting up the file for writing array data) 1067 # to be parallelized with dask -> 1068 dump_to_store( 1069 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1070 ) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1113 variables, attrs = encoder(variables, attrs) 1114 -> 1115 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1116 1117 ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 261 writer = ArrayWriter() 262 --> 263 variables, attributes = self.encode(variables, attributes) 264 265 self.set_attributes(attributes) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/common.py in encode(self, variables, attributes) 350 # All NetCDF files get CF encoded by default, without this attempting 351 # to write times, for example, would fail. 
--> 352 variables, attributes = cf_encoder(variables, attributes) 353 variables = {k: self.encode_variable(v) for k, v in variables.items()} 354 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()} ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in cf_encoder(variables, attributes) 841 _update_bounds_encoding(variables) 842 --> 843 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} 844 845 # Remove attrs from bounds variables (issue #2921) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in <dictcomp>(.0) 841 _update_bounds_encoding(variables) 842 --> 843 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} 844 845 # Remove attrs from bounds variables (issue #2921) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in encode_cf_variable(var, needs_copy, name) 267 variables.UnsignedIntegerCoder(), 268 ]: --> 269 var = coder.encode(var, name=name) 270 271 # TODO(shoyer): convert all of these to use coders, too: ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in encode(self, variable, name) 510 variable 511 ): --> 512 (data, units, calendar) = encode_cf_datetime( 513 data, encoding.pop("units", None), encoding.pop("calendar", None) 514 ) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in encode_cf_datetime(dates, units, calendar) 448 units = infer_datetime_units(dates) 449 else: --> 450 units = _cleanup_netcdf_time_units(units) 451 452 if calendar is None: ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _cleanup_netcdf_time_units(units) 399 400 def _cleanup_netcdf_time_units(units): --> 401 delta, ref_date = _unpack_netcdf_time_units(units) 402 try: 403 units = "{} since {}".format(delta, format_timestamp(ref_date)) ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _unpack_netcdf_time_units(units) 130 131 delta_units, ref_date = [s.strip() for s in matches.groups()] --> 132 ref_date = _ensure_padded_year(ref_date) 133 134 return delta_units, ref_date ~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _ensure_padded_year(ref_date) 105 # appropriately 106 matches_start_digits = re.match(r"(\d+)(.*)", ref_date) --> 107 ref_year, everything_else = [s for s in matches_start_digits.groups()] 108 ref_date_padded = "{:04d}{}".format(int(ref_year), everything_else) 109 AttributeError: 'NoneType' object has no attribute 'groups' ``` Anything else we need to know?: Are there detail specifications for the valid units string? Setting it to 1/3/2000 surprised me (my locale: LANG=en_CA.UTF-8). 
Environment: Latest code from github: xarray version 0.0.0 Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.11-200.fc33.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.0.0 pandas: 1.2.4 numpy: 1.20.2 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.04.0 distributed: 2021.04.0 matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.22.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5199/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
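For contrast with the failing strings in #5199, a small sketch of the CF form the encoder does accept, `"<interval> since <reference date>"` (the output path is illustrative):

```Python
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    data_vars={"v": (("t",), [0])},
    coords={"t": [pd.Timestamp(2000, 1, 1)]},
)
# A CF-style time unit: "<interval> since <reference date>".
ds.t.encoding["units"] = "days since 2000-01-01"
ds.to_netcdf("/tmp/x.nc", mode="w")
```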
849751721 | MDU6SXNzdWU4NDk3NTE3MjE= | 5106 | to_zarr() fails on time coordinate in append mode | yt87 40218891 | closed | 0 | 4 | 2021-04-03T22:26:11Z | 2021-04-20T12:04:06Z | 2021-04-20T04:41:07Z | NONE | What happened: When the append dimension coordinates are times and the dimension of the first dataset written is 1, consecutive appends forget the hour part of the coordinate. What you expected to happen: The time coordinate should be set correctly. Minimal Complete Verifiable Example: ``` import pandas as pd import xarray as xr reftime = [pd.Timestamp(2021, 2, 21, 0)] x = [0] dims = ('reftime', 'x') d = np.array([['A']]) ds1 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x}) _ = ds1.to_zarr('foo', mode='w') reftime = [pd.Timestamp(2021, 2, 21, 6)] d = np.array([['C']]) ds2 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x}) _ = ds2.to_zarr('foo', append_dim='reftime') ds = xr.open_dataset('foo', engine='zarr') ds.coords['reftime'].values array(['2021-02-21T00:00:00.000000000', '2021-02-21T00:00:00.000000000'], # should be 2021-02-21T06:00:00.000000000 dtype='datetime64[ns]') ``` Anything else we need to know?:
When the array(['2021-02-21T00:00:00.000000000', '2021-02-21T03:00:00.000000000',
'2021-02-21T06:00:00.000000000'], dtype='datetime64[ns]')
array(['2021-02-21T00:00:00.000000000', '2021-02-22T00:00:00.000000000'], dtype='datetime64[ns]') ``` Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.11-200.fc33.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: 0.10.0 h5py: 3.1.0 Nio: None zarr: 2.7.0 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.5 iris: None bottleneck: 1.3.2 dask: 2021.04.0 distributed: 2021.04.0 matplotlib: 3.4.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.22.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5106/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
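A hedged workaround sketch for #5106 (an assumption, not something confirmed in the visible thread): when the first write contains a single timestamp, a coarse "days since …" unit gets inferred for the coordinate, so pinning an hour-resolution unit before the first `to_zarr` keeps later 06Z appends representable. The store path is illustrative.

```Python
import numpy as np
import pandas as pd
import xarray as xr

dims = ("reftime", "x")

ds1 = xr.Dataset(
    data_vars={"v": (dims, np.array([["A"]]))},
    coords={"reftime": [pd.Timestamp(2021, 2, 21, 0)], "x": [0]},
)
# Pin an hour-resolution time encoding before the first write.
ds1.reftime.encoding["units"] = "hours since 2021-02-21"
ds1.to_zarr("foo", mode="w")

ds2 = xr.Dataset(
    data_vars={"v": (dims, np.array([["C"]]))},
    coords={"reftime": [pd.Timestamp(2021, 2, 21, 6)], "x": [0]},
)
ds2.to_zarr("foo", append_dim="reftime")
```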
429914958 | MDU6SXNzdWU0Mjk5MTQ5NTg= | 2871 | xr.open_dataset(f1).to_netcdf(file2) is not idempotent | yt87 40218891 | closed | 0 | 5 | 2019-04-05T20:06:35Z | 2019-06-12T15:32:27Z | 2019-06-12T15:32:27Z | NONE | Here is the original (much truncated) file. ```
// global attributes:
		:source = "ak-obs" ;
data:

 tmpk =
26915, 27755, -9999, 27705,
25595, -9999, 28315, -9999 ;
}
<xarray.Dataset> Dimensions: (npts: 2, ntimes: 4) Dimensions without coordinates: npts, ntimes Data variables: tmpk (npts, ntimes) float32 ... Attributes: source: ak-obs <xarray.DataArray 'tmpk' (npts: 2, ntimes: 4)>
array([[269.15, 277.55, nan, 277.05],
[255.95, nan, 283.15, nan]], dtype=float32)
Dimensions without coordinates: npts, ntimes
Attributes:
description: 2 m Temperature - closest to top of hour
units: K
level: 2 m
period_variable: ntimes1
// global attributes:
		:source = "ak-obs" ;
data:

 tmpk =
26915, 27755, 0, 27705,
25595, 0, 28315, 0 ;
}
xr.show_versions() INSTALLED VERSIONS commit: None python: 3.7.1 | packaged by conda-forge | (default, Feb 18 2019, 01:42:00) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.5.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.0 pandas: 0.24.2 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.5.0 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 1.1.4 distributed: 1.26.0 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: 4.6.10 pytest: 4.3.1 IPython: 7.4.0 sphinx: 1.8.5 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2871/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
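The truncated CDL in #2871 omits the variable declaration, so the sketch below is an assumption reconstructed from the printed values (26915 decoding to 269.15 suggests `scale_factor = 0.01`, with -9999 as the missing value); it only reproduces the round trip being described, with illustrative file names.

```Python
import numpy as np
import xarray as xr

data = np.array(
    [[269.15, 277.55, np.nan, 277.05],
     [255.95, np.nan, 283.15, np.nan]], dtype="float32"
)
ds = xr.Dataset({"tmpk": (("npts", "ntimes"), data)}, attrs={"source": "ak-obs"})
# Assumed on-disk encoding: packed integers with scale_factor 0.01 and a
# missing_value of -9999, matching the numbers shown in the CDL above.
ds["tmpk"].encoding.update(dtype="int32", scale_factor=0.01, missing_value=-9999)
ds.to_netcdf("file1.nc")

# The reported problem: reading file1 and writing it straight back out turns
# the -9999 missing values into 0 in file2 (xarray 0.12-era behaviour).
xr.open_dataset("file1.nc").to_netcdf("file2.nc")
```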