id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 955043280,MDU6SXNzdWU5NTUwNDMyODA=,5644,`polyfit` with weights alters the DataArray in place,8291800,closed,0,,,6,2021-07-28T16:43:17Z,2023-06-09T15:38:01Z,2023-06-09T15:38:01Z,CONTRIBUTOR,,,,"**What happened**: After running `da.polyfit` on a DataArray with weights, the data has been overwritten. **What you expected to happen**: I didn't see this documented anywhere, but I did not expect that creating a polyfit dataset would clobber the original data that I'm fitting to. The data isn't altered in the case of unweighted fitting, only weighted. **Minimal Complete Verifiable Example**: ```python In [2]: import xarray as xr; import numpy as np In [3]: nz, ny, nx = (10, 20, 30) In [4]: da = xr.DataArray(np.random.rand(nz, ny, nx), dims=['z','y','x']) In [6]: da.mean(), da.max() Out[6]: ( array(0.4963857), array(0.99996494)) In [7]: pf = da.polyfit(""z"", deg=2) # This will not alter the data In [9]: da.mean(), da.max() Out[9]: ( array(0.4963857), array(0.99996494)) # Non-zero `w` argument alters the data In [11]: pf = da.polyfit(""z"", deg=2, w=np.arange(nz)) In [12]: da.mean(), da.max() Out[12]: ( array(2.24317611), array(8.95963569)) ``` **Anything else we need to know?**: I assume it's happening here https://github.com/pydata/xarray/blob/da99a5664df4f5013c2f6b0e758394bec5e0bc80/xarray/core/dataset.py#L6805 My question is whether this is supposed to be the case to avoid copies? Or if it's accidental? **Environment**:
Output of xr.show_versions() xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56) [Clang 10.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.2 numpy: 1.20.2 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.2.1 Nio: None zarr: 2.6.1 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.1 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.14.0 distributed: 2.20.0 matplotlib: 3.3.0 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None pint: 0.16.1 setuptools: 49.6.0.post20200814 pip: 21.1.2 conda: 4.8.4 pytest: 6.2.4 IPython: 7.18.1 sphinx: 3.5.1
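
For illustration, here is a minimal numpy-only sketch of how this kind of clobbering can arise. It assumes the weighted path scales the right-hand side in place through a view of the original buffer; that is a guess at the mechanism, not a confirmed reading of the linked line. The helper names `scale_inplace` and `scale_copy` are hypothetical and are not xarray functions.

```python
import numpy as np


def scale_inplace(rhs, w):
    # Hypothetical helper: in-place multiply writes through to whatever
    # buffer `rhs` is a view of.
    rhs *= w[:, np.newaxis]
    return rhs


def scale_copy(rhs, w):
    # Hypothetical helper: out-of-place multiply allocates a new array
    # and leaves the caller's data untouched.
    return rhs * w[:, np.newaxis]


data = np.ones((4, 3))
w = np.arange(4.0)

# Reshaping a contiguous array returns a view, not a copy.
view = data.reshape(4, -1)

scale_copy(view, w)
sum_after_copy = data.sum()      # original data is unchanged

scale_inplace(view, w)
sum_after_inplace = data.sum()   # original data has been rescaled

print(sum_after_copy, sum_after_inplace)
```

If the cause is along these lines, fitting on a deep copy, e.g. `da.copy(deep=True).polyfit('z', deg=2, w=np.arange(nz))`, should leave `da` intact at the cost of one extra copy; I have not checked this against the version listed above.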
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5644/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 970619131,MDU6SXNzdWU5NzA2MTkxMzE=,5706,Loading datasets of numpy string arrays leads to error and/or segfault,8291800,closed,0,,,8,2021-08-13T18:17:33Z,2023-05-12T08:06:07Z,2023-05-12T08:06:06Z,CONTRIBUTOR,,,," **What happened**: Numpy arrays of strings that are saved with h5py cause errors and segfaults, not always the same result. **What you expected to happen**: This works fine with `engine='h5netcdf'`: ```python In [3]: ds = xr.load_dataset(""test_str_list.h5"", engine='h5netcdf', phony_dims='sort') ``` but will consistently have a segfault with `engine='netcdf4'`. I'm assuming this is a netcdf backend issue, but thought I'd raise it here since xarray was how I discovered it. **Minimal Complete Verifiable Example**: ```python import h5py import xarray as xr with h5py.File(""test_str_list.h5"", ""w"") as hf: hf[""pairs""] = np.array([[""20200101"", ""20200201""], [""20200101"", ""20200301""]]).astype(""S"") ds = xr.load_dataset(""test_str_list.h5"") *** Error in `/home/scott/miniconda3/envs/mapping/bin/python': munmap_chunk(): invalid pointer: 0x0000559c40956070 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x7f7c4)[0x7f4a9a6bb7c4] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f4a7a93c3ef] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__untag_entry+0xc6)[0x7f4a7a854836] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__flush_single_entry+0x275)[0x7f4a7a846085] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(+0x80de3)[0x7f4a7a846de3] ... 
(few thousand line backtrace) ``` **Anything else we need to know?**: Even stranger, it doesn't seem to be deterministic. After the crash, I tried the same load_dataset: ```python In [2]: ds = xr.load_dataset(""test_str_list.h5"") --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) in ----> 1 ds = xr.load_dataset(""test_str_list.h5"") ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/api.py in load_dataset(filename_or_obj, **kwargs) 242 243 with open_dataset(filename_or_obj, **kwargs) as ds: --> 244 return ds.load() 245 246 ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs) 871 for k, v in self.variables.items(): 872 if k not in lazy_data: --> 873 v.load() 874 875 return self ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/variable.py in load(self, **kwargs) 449 self._data = as_compatible_data(self._data.compute(**kwargs)) 450 elif not is_duck_array(self._data): --> 451 self._data = np.asarray(self._data) 452 return self 453 ~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order) 81 82 """""" ---> 83 return array(a, dtype, copy=False, order=order) 84 85 ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype) 546 547 def __array__(self, dtype=None): --> 548 self._ensure_cached() 549 return np.asarray(self.array, dtype=dtype) 550 ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self) 543 def _ensure_cached(self): 544 if not isinstance(self.array, NumpyIndexingAdapter): --> 545 self.array = NumpyIndexingAdapter(np.asarray(self.array)) 546 547 def __array__(self, dtype=None): ~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order) 81 82 """""" ---> 83 return array(a, dtype, copy=False, order=order) 84 85 
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype) 516 517 def __array__(self, dtype=None): --> 518 return np.asarray(self.array, dtype=dtype) 519 520 def __getitem__(self, key): ~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order) 81 82 """""" ---> 83 return array(a, dtype, copy=False, order=order) 84 85 ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype) 417 def __array__(self, dtype=None): 418 array = as_indexable(self.array) --> 419 return np.asarray(array[self.key], dtype=None) 420 421 def transpose(self, order): ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key) 89 90 def __getitem__(self, key): ---> 91 return indexing.explicit_indexing_adapter( 92 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 93 ) ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 708 """""" 709 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 710 result = raw_indexing_method(raw_key.tuple) 711 if numpy_indices.tuple: 712 # index the loaded np.ndarray ~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key) 102 with self.datastore.lock: 103 original_array = self.get_array(needs_lock=False) --> 104 array = getitem(original_array, key) 105 except IndexError: 106 # Catch IndexError in netCDF4 and return a more informative netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__() netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get() UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position 0: invalid continuation byte ``` But then, immediately after, another segfault: ```python In [4]: ds = xr.load_dataset(""test_str_list.h5"", engine='netcdf4') *** Error in 
`/home/scott/miniconda3/envs/mapping/bin/python': corrupted size vs. prev_size: 0x000055f97e7194a0 *** ======= Backtrace: ========= ```
Beginning of segfault stack trace, but goes on ``` ======= Backtrace: ========= /lib64/libc.so.6(+0x7f7c4)[0x7f1ba11a87c4] /lib64/libc.so.6(+0x818bb)[0x7f1ba11aa8bb] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f1b8142d3ef] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5S_close+0x84)[0x7f1b814a69a4] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_ref+0x77)[0x7f1b8141a407] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_app_ref+0x29)[0x7f1b8141a4d9] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5Sclose+0x73)[0x7f1b814a7023] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vars+0x5ad)[0x7f1b7bbc46ad] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vara+0x12)[0x7f1b7bbc4e62] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC_get_vara+0x6f)[0x7f1b7bb6b5df] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(nc_get_vara+0x8b)[0x7f1b7bb6c35b] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0xccf21)[0x7f1b4d0daf21] /home/scott/miniconda3/envs/mapping/bin/python(+0x13a77e)[0x55f97aeca77e] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x224fd)[0x7f1b4d0304fd] /home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x559d9)[0x7f1b4d0639d9] /home/scott/miniconda3/envs/mapping/bin/python(PyObject_GetItem+0x48)[0x55f97af10aa8] /home/scott/miniconda3/envs/mapping/bin/python(+0x139acd)[0x55f97aec9acd] ```
**Environment**:
Output of xr.show_versions() In [1]: xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.4.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.0 numpy: 1.19.2 scipy: 1.5.3 netCDF4: 1.5.4 pydap: None h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.8.3 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.5 cfgrib: 0.9.8.5 iris: None bottleneck: 1.3.2 dask: 2021.01.0 distributed: 2.20.0 matplotlib: 3.3.1 cartopy: 0.17.0 seaborn: None numbagg: None pint: 0.17 setuptools: 50.3.2 pip: 21.1.3 conda: 4.8.4 pytest: None IPython: 7.18.1 sphinx: 4.0.2
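
For reference, a small numpy-only sketch of what the MCVE actually writes: `.astype('S')` produces fixed-width byte strings (dtype `S8` here) rather than variable-length text. Decoding with `np.char.decode` is shown as one way to get text back from such an array once it has been read. Whether the fixed-width layout is what trips the `netcdf4` engine is an assumption on my part, not something established above.

```python
import numpy as np

# Same values as in the MCVE, as ordinary unicode strings.
pairs = np.array([['20200101', '20200201'], ['20200101', '20200301']])

# .astype('S') converts to fixed-width byte strings; the width is the
# length of the longest element (8 bytes here).
raw = pairs.astype('S')

# Decode the byte strings back to unicode, element by element.
decoded = np.char.decode(raw, 'utf-8')

print(raw.dtype, decoded.dtype)
```

This only describes the data being stored; it does not touch HDF5 or netCDF and so cannot reproduce the crash itself.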
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5706/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1056881922,I_kwDOAMm_X84-_r0C,6000,Parallel access to DataArray within `with` statement causes `BlockingIOError`,8291800,open,0,,,2,2021-11-18T03:06:26Z,2022-01-13T03:08:02Z,,CONTRIBUTOR,,,," **What happened**: My general usage is 1. Read one DataArray from an existing dataset within a `with` statement, so the file closes at the end 2. Run it through some functions, sometimes in parallel 3. After closing the dataset, append to the dataset in a new DataArray with `.to_netcdf(existing_file, engine=""h5netcdf"")`. With the setup below, I get `BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')` It's entirely possible (and likely) that it's an issue with some other library that I'm using/that xarray is using... but I thought someone might have an idea why a very similar version of the same script succeeds, while the first one fails. **What you expected to happen**: No error, which happens for the 2nd version **Minimal Complete Verifiable Example**: For this version, I'm using [pymp](https://github.com/classner/pymp), which I'd rather not include in the MCVE, but I've had similar issues just using Python's multiprocessing. I just wanted to post this one first. 
```python import xarray as xr import numpy as np import pymp def dummy_function_parallel(stack): out = np.zeros(stack.shape, dtype=np.float32) # Also fails: # out = pymp.shared.array(stack.shape, dtype=np.float32) with pymp.Parallel(4) as p: for i in p.range(3): out[:, i] = stack[:, i] * 3 return out # Example of a function that *doesn't* cause a failure ever def dummy_function2(stack): return 2 * stack if __name__ == ""__main__"": x, y, z = np.arange(3), np.arange(3), np.arange(3) data = np.random.rand(3, 3, 3) da = xr.DataArray(data, dims=[""z"", ""y"", ""x""], coords={""x"": x, ""y"": y, ""z"": z}) da.to_dataset(name=""testdata"").to_netcdf(""testdata.nc"", engine=""h5netcdf"") with xr.open_dataset(""testdata.nc"") as ds: da = ds[""testdata""] newstack = dummy_function_parallel(da.values) # This function does work without the parallel stuff # newstack = dummy_function2(da.values) da_new = xr.DataArray(newstack, coords=da.coords, dims=da.dims) da_new.to_dataset(name=""new_testdata"").to_netcdf(""testdata.nc"", engine=""h5netcdf"") ``` Running this causes the following traceback ```python-traceback Traceback (most recent call last): File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/file_manager.py"", line 199, in _acquire_with_cache_info file = self._cache[self._key] File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/lru_cache.py"", line 53, in __getitem__ value = self._cache[key] KeyError: [, ('/data4/scott/path85/stitched/top_strip/igrams/testdata.nc',), 'a', (('decode_vlen_strings', True), ('invalid_netcdf', None))] During handling of the above exception, another exception occurred: Traceback (most recent call last): File ""/home/scott/repos/insar/insar/testxr.py"", line 46, in da_new.to_dataset(name=""new_testdata"").to_netcdf(""testdata.nc"", engine=""h5netcdf"") File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/dataset.py"", line 1900, in to_netcdf 
return to_netcdf( File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/api.py"", line 1060, in to_netcdf store = store_open(target, mode, format, group, **kwargs) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 178, in open return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 123, in __init__ self._filename = find_root_and_group(self.ds)[0].filename File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 189, in ds return self._acquire() File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 181, in _acquire with self._manager.acquire_context(needs_lock) as root: File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/contextlib.py"", line 113, in __enter__ return next(self.gen) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/file_manager.py"", line 187, in acquire_context file, cached = self._acquire_with_cache_info(needs_lock) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/file_manager.py"", line 205, in _acquire_with_cache_info file = self._opener(*self._args, **kwargs) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5netcdf/core.py"", line 712, in __init__ self._h5file = h5py.File(path, mode, **kwargs) File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/_hl/files.py"", line 442, in __init__ fid = make_fid(name, mode, userblock_size, File ""/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/_hl/files.py"", line 201, in make_fid fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl) File ""h5py/_objects.pyx"", line 54, in h5py._objects.with_phil.wrapper File ""h5py/_objects.pyx"", line 55, 
in h5py._objects.with_phil.wrapper File ""h5py/h5f.pyx"", line 116, in h5py.h5f.create BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable') ``` **Anything else we need to know?**: The weird part to me: If I change the end of the script so that the function runs **after** the `with` statement exits (so I'm passing the DataArray reference from a closed dataset), there's never an error. Indeed, that's how I fixed this for my real, longer script. ```python with xr.open_dataset(""testdata.nc"") as ds: da = ds[""testdata""] # Now after it closed, this parallel function doesn't cause the hangup newstack = dummy_function_parallel(da.values) da_new = xr.DataArray(newstack, coords=da.coords, dims=da.dims) da_new.to_dataset(name=""new_testdata"").to_netcdf(""testdata.nc"", engine=""h5netcdf"") ``` **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.4.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.0 numpy: 1.19.2 scipy: 1.5.3 netCDF4: 1.5.4 pydap: None h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.8.3 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.6 cfgrib: 0.9.8.5 iris: None bottleneck: 1.3.2 dask: 2021.01.0 distributed: 2.20.0 matplotlib: 3.3.1 cartopy: 0.19.0.post1 seaborn: None numbagg: None pint: 0.17 setuptools: 50.3.2 pip: 21.2.4 conda: 4.8.4 pytest: 6.2.4 IPython: 7.18.1 sphinx: 4.0.2
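
One possible reading of the `errno = 11` message is an advisory file lock that is still held by a lingering handle when the file is reopened for writing. The stdlib-only sketch below shows that behaviour in isolation, assuming `flock`-style locking on a Unix system. It does not use HDF5, h5netcdf, or pymp, and the helper `try_exclusive_lock` is hypothetical. Whether a handle inherited by forked workers is what keeps the lock alive in the script above is a guess, not a diagnosis.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    # Hypothetical helper: open an independent handle and attempt a
    # non-blocking exclusive lock, roughly what a writer would do.
    with open(path, 'a') as fh:
        try:
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another open handle still holds a conflicting lock.
            return False
        fcntl.flock(fh, fcntl.LOCK_UN)
        return True


fd, path = tempfile.mkstemp(suffix='.nc')
os.close(fd)

# A reader takes a shared lock and keeps its handle open.
reader = open(path, 'rb')
fcntl.flock(reader, fcntl.LOCK_SH)
locked_while_open = try_exclusive_lock(path)

# Closing the reader releases the lock, so the writer now succeeds.
reader.close()
locked_after_close = try_exclusive_lock(path)

os.remove(path)
print(locked_while_open, locked_after_close)
```

A `flock` lock is released only once every descriptor referring to that open file is closed, which would be consistent with the observation that running the parallel function after the `with` block exits avoids the error.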
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6000/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue