id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2123107474,PR_kwDOAMm_X85mRC1-,8717,Add lru_cache to named_array.utils.module_available and core.utils.module_available,32731672,closed,0,,,1,2024-02-07T14:01:35Z,2024-02-26T11:23:04Z,2024-02-07T16:26:12Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8717,"Our application creates many small netcdf3 files: https://github.com/equinor/ert/blob/9c2b60099a54eeb5bb40013acef721e30558a86c/src/ert/storage/local_ensemble.py#L593 . A significant amount of time in xarray.backends.common.py:AbstractWritableDataStore.set_variables is spent in common.py:is_dask_collection, because it checks for the presence of the dask module, which takes about 0.3 ms. This time becomes significant when writing many small files. This PR uses lru_cache to avoid rechecking for the presence of dask, since it should not change for the lifetime of the application. In one stress test we called dataset.py:2201(to_netcdf) 13634 times, which took 82.27 seconds, of which 46.8 seconds was spent in utils.py:1162(module_available). With the change in this PR, the same test spends only 50 s in to_netcdf. Generally, under normal load, a session in our application calls to_netcdf ~1000 times, but 10 000 calls can happen.
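The caching itself is tiny; a minimal sketch of the idea, using a hypothetical stand-in for xarray's module_available helper (not the actual diff in this PR), looks like this. Caching is safe here because the set of installed modules does not change for the lifetime of the process:

```python
from functools import lru_cache
from importlib.util import find_spec


@lru_cache(maxsize=None)
def module_available(module: str) -> bool:
    # The first call per module name pays the ~0.3 ms find_spec cost;
    # every repeated call is a cached dictionary lookup.
    return find_spec(module) is not None
```

After the first call per module name the check is effectively free, which is what removes module_available from the hot path of repeated to_netcdf calls.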
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8717/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2121571453,PR_kwDOAMm_X85mL5Gy,8716,Add lru_cache to module_available,32731672,closed,0,,,9,2024-02-06T20:00:19Z,2024-02-07T14:50:19Z,2024-02-07T14:50:19Z,CONTRIBUTOR,,0,pydata/xarray/pulls/8716,"Our application creates many small netcdf3 files: https://github.com/equinor/ert/blob/9c2b60099a54eeb5bb40013acef721e30558a86c/src/ert/storage/local_ensemble.py#L593 . A significant amount of time in xarray.backends.common.py:AbstractWritableDataStore.set_variables is spent in common.py:is_dask_collection, because it checks for the presence of the dask module, which takes about 0.3 ms. This time becomes significant when writing many small files. This PR uses lru_cache to avoid rechecking for the presence of dask, since it should not change for the lifetime of the application.
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8716/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2112742578,I_kwDOAMm_X8597eSy,8693,Reading netcdf with engine=scipy fails with a TypeError under certain conditions,32731672,open,0,,,4,2024-02-01T15:03:23Z,2024-02-05T09:35:51Z,,CONTRIBUTOR,,,,"### What happened?

Saving and loading from netcdf with engine=scipy produces an unexpected TypeError on read. The file seems to be corrupted.

### What did you expect to happen?

Reading works just fine.
### Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        ""values"": (
            [""name"", ""time""],
            np.array([[]], dtype=np.float32).T,
        )
    },
    coords={""time"": [1], ""name"": []},
).expand_dims({""index"": [0]})
ds.to_netcdf(""file.nc"", engine=""scipy"")
_ = xr.open_dataset(""file.nc"", engine=""scipy"")
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

```Python
KeyError                                  Traceback (most recent call last)
File .../python3.11/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    210 try:
--> 211     file = self._cache[self._key]
    212 except KeyError:

File .../python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [, ('/home/eivind/Projects/ert/file.nc',), 'r', (('mmap', None), ('version', 2)), '264ec6b3-78b3-4766-bb41-7656d6a51962']

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[1], line 18
      4 ds = (
      5     xr.Dataset(
      6         {
   (...)
     15     .expand_dims({""index"": [0]})
     16 )
     17 ds.to_netcdf(""file.nc"", engine=""scipy"")
---> 18 _ = xr.open_dataset(""file.nc"", engine=""scipy"")

File .../python3.11/site-packages/xarray/backends/api.py:572, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    560 decoders = _resolve_decoders_kwargs(
    561     decode_cf,
    562     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    568     decode_coords=decode_coords,
    569 )
    571 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 572 backend_ds = backend.open_dataset(
    573     filename_or_obj,
    574     drop_variables=drop_variables,
    575     **decoders,
    576     **kwargs,
    577 )
    578 ds = _dataset_from_backend_dataset(
    579     backend_ds,
    580     filename_or_obj,
   (...)
    590     **kwargs,
    591 )
    592 return ds

File .../python3.11/site-packages/xarray/backends/scipy_.py:315, in ScipyBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, mode, format, group, mmap, lock)
    313 store_entrypoint = StoreBackendEntrypoint()
    314 with close_on_error(store):
--> 315     ds = store_entrypoint.open_dataset(
    316         store,
    317         mask_and_scale=mask_and_scale,
    318         decode_times=decode_times,
    319         concat_characters=concat_characters,
    320         decode_coords=decode_coords,
    321         drop_variables=drop_variables,
    322         use_cftime=use_cftime,
    323         decode_timedelta=decode_timedelta,
    324     )
    325 return ds

File .../python3.11/site-packages/xarray/backends/store.py:43, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     29 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
     30     self,
     31     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
     39     decode_timedelta=None,
     40 ) -> Dataset:
     41     assert isinstance(filename_or_obj, AbstractDataStore)
---> 43     vars, attrs = filename_or_obj.load()
     44     encoding = filename_or_obj.get_encoding()
     46     vars, attrs, coord_names = conventions.decode_cf_variables(
     47         vars,
     48         attrs,
   (...)
     55         decode_timedelta=decode_timedelta,
     56     )

File .../python3.11/site-packages/xarray/backends/common.py:210, in AbstractDataStore.load(self)
    188 def load(self):
    189     """"""
    190     This loads the variables and attributes simultaneously.
    191     A centralized loading function makes it easier to create
   (...)
    207     are requested, so care should be taken to make sure its fast.
    208     """"""
    209     variables = FrozenDict(
--> 210         (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    211     )
    212     attributes = FrozenDict(self.get_attrs())
    213     return variables, attributes

File .../python3.11/site-packages/xarray/backends/scipy_.py:181, in ScipyDataStore.get_variables(self)
    179 def get_variables(self):
    180     return FrozenDict(
--> 181         (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
    182     )

File .../python3.11/site-packages/xarray/backends/scipy_.py:170, in ScipyDataStore.ds(self)
    168 @property
    169 def ds(self):
--> 170     return self._manager.acquire()

File .../python3.11/site-packages/xarray/backends/file_manager.py:193, in CachingFileManager.acquire(self, needs_lock)
    178 def acquire(self, needs_lock=True):
    179     """"""Acquire a file object from the manager.
    180
    181     A new file is only opened if it has expired from the
   (...)
    191     An open file object, as returned by ``opener(*args, **kwargs)``.
    192     """"""
--> 193     file, _ = self._acquire_with_cache_info(needs_lock)
    194     return file

File .../python3.11/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    215     kwargs = kwargs.copy()
    216     kwargs[""mode""] = self._mode
--> 217     file = self._opener(*self._args, **kwargs)
    218     if self._mode == ""w"":
    219         # ensure file doesn't get overridden when opened again
    220         self._mode = ""a""

File .../python3.11/site-packages/xarray/backends/scipy_.py:109, in _open_scipy_netcdf(filename, mode, mmap, version)
    106     filename = io.BytesIO(filename)
    108 try:
--> 109     return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
    110 except TypeError as e:  # netcdf3 message is obscure in this case
    111     errmsg = e.args[0]

File .../python3.11/site-packages/scipy/io/_netcdf.py:278, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
    275 self._attributes = {}
    277 if mode in 'ra':
--> 278     self._read()

File .../python3.11/site-packages/scipy/io/_netcdf.py:607, in netcdf_file._read(self)
    605 self._read_dim_array()
    606 self._read_gatt_array()
--> 607 self._read_var_array()

File .../python3.11/site-packages/scipy/io/_netcdf.py:688, in netcdf_file._read_var_array(self)
    685     data = None
    686 else:  # not a record variable
    687     # Calculate size to avoid problems with vsize (above)
--> 688     a_size = reduce(mul, shape, 1) * size
    689 if self.use_mmap:
    690     data = self._mm_buf[begin_:begin_+a_size].view(dtype=dtype_)

TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
```

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: 1.8.0
IPython: 8.17.2
sphinx: 7.2.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8693/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2093790208,I_kwDOAMm_X858zLQA,8641,Using netcdf3 with datetime64[ns] quickly overflows int32,32731672,closed,0,,,5,2024-01-22T12:18:50Z,2024-02-05T08:54:11Z,2024-02-05T08:54:11Z,CONTRIBUTOR,,,,"### What happened?

While trying to store datetimes into netcdf, I ran into the problem of overflowing int32 when the datetimes include sub-second precision.

### What did you expect to happen?

I was first surprised that my data did not store successfully, but after investigating I have come to understand that the netcdf3 format is quite limited. It would probably make sense to emit a warning when storing datetime64 data to netcdf3.

### Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import datetime

dataset = xr.combine_by_coords(
    [
        xr.Dataset(
            {""value"": ([""step""], [0.0])},
            coords={
                ""step"": np.array(
                    [datetime.datetime(2000, 1, 1, 0, 0)], dtype=""datetime64[ns]""
                ),
            },
        ),
        xr.Dataset(
            {""value"": ([""step""], [0.0])},
            coords={
                ""step"": np.array(
                    [datetime.datetime(2000, 1, 1, 1, 0, 0, 1)],
                    dtype=""datetime64[ns]"",
                ),
            },
        ),
    ]
)
dataset.to_netcdf(""./out.nc"", engine=""scipy"")
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
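The overflow can be shown with numpy alone. This is a sketch of what goes wrong, assuming the CF time encoder picks microseconds as the smallest unit that represents both timestamps exactly:

```python
import numpy as np

t0 = np.datetime64('2000-01-01T00:00:00', 'ns')
t1 = np.datetime64('2000-01-01T01:00:00.000001', 'ns')

# Offsets from the reference time in microseconds, as int64.
offsets_us = (np.array([t0, t1]) - t0) // np.timedelta64(1, 'us')

# One hour plus one microsecond is 3_600_000_001 us, which does not fit
# in an int32 (max 2_147_483_647), so the netcdf3 downcast must fail.
print(offsets_us.max() > np.iinfo(np.int32).max)
```

With whole-second timestamps the encoder can use seconds as the unit and the offsets stay comfortably inside int32, which is why the problem only appears once sub-second precision is present.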
### Relevant log output

```
File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2300     encoding = {}
   2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2304     self,
   2305     path,
   2306     mode=mode,
   2307     format=format,
   2308     group=group,
   2309     engine=engine,
   2310     encoding=encoding,
   2311     unlimited_dims=unlimited_dims,
   2312     compute=compute,
   2313     multifile=False,
   2314     invalid_netcdf=invalid_netcdf,
   2315 )

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1311 # to avoid this mess of conditionals
   1312 try:
   1313     # TODO: allow this work (setting up the file for writing array data)
   1314     # to be parallelized with dask
-> 1315     dump_to_store(
   1316         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1317     )
   1318     if autoclose:
   1319         store.close()

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1359 if encoder:
   1360     variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    349 if writer is None:
    350     writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:442, in WritableCFDataStore.encode(self, variables, attributes)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
    441     variables, attributes = cf_encoder(variables, attributes)
--> 442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    444     return variables, attributes

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:442, in (.0)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
    441     variables, attributes = cf_encoder(variables, attributes)
--> 442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    444     return variables, attributes

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/scipy_.py:213, in ScipyDataStore.encode_variable(self, variable)
    212 def encode_variable(self, variable):
--> 213     variable = encode_nc3_variable(variable)
    214     return variable

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/netcdf3.py:114, in encode_nc3_variable(var)
    112 var = coder.encode(var)
    113 data = _maybe_prepare_times(var)
--> 114 data = coerce_nc3_dtype(data)
    115 attrs = encode_nc3_attrs(var.attrs)
    116 return Variable(var.dims, data, attrs, var.encoding)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/netcdf3.py:68, in coerce_nc3_dtype(arr)
     66 cast_arr = arr.astype(new_dtype)
     67 if not (cast_arr == arr).all():
---> 68     raise ValueError(
     69         f""could not safely cast array from dtype {dtype} to {new_dtype}""
     70     )
     71 arr = cast_arr
     72 return arr

ValueError: could not safely cast array from dtype int64 to int32
```

### Anything else we need to know?

_No response_

### Environment
/home/eivind/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn(""Setuptools is replacing distutils."")

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: 1.6.1
IPython: 8.17.2
sphinx: 7.1.2
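The raise at netcdf3.py:68 in the traceback boils down to a lossy int64 to int32 cast. A simplified numpy-only stand-in for that coercion step (hypothetical helper name, not xarray's actual code) illustrates it:

```python
import numpy as np

def coerce_to_int32(arr: np.ndarray) -> np.ndarray:
    # Simplified stand-in for the netcdf3 coercion step: downcast to
    # int32, then verify the round trip is lossless before accepting it.
    cast = arr.astype(np.int32)
    if not (cast == arr).all():
        raise ValueError('could not safely cast array from int64 to int32')
    return cast

coerce_to_int32(np.array([0, 3_600], dtype=np.int64))  # second offsets: fit
try:
    # Microsecond offsets from the MVCE above: one hour + 1 us overflows.
    coerce_to_int32(np.array([0, 3_600_000_001], dtype=np.int64))
except ValueError as err:
    print(err)
```

The downcast silently wraps the overflowing value, the equality check catches the mismatch, and that is the point where a friendlier warning about datetime64 and netcdf3 could be raised.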
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8641/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue