id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2276352251,I_kwDOAMm_X86HrmD7,8994,Improving performance of open_datatree,35968931,open,0,,,4,2024-05-02T19:43:17Z,2024-05-03T15:25:33Z,,MEMBER,,,,"### What is your issue? The implementation of `open_datatree` works, but is inefficient, because it calls `open_dataset` once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330. We discussed this in the [datatree meeting](https://github.com/pydata/xarray/issues/8747), and my understanding is that concretely we need to: - [ ] Create an asv benchmark for `open_datatree`, probably involving first writing then benchmarking the opening of a special netCDF file that has no data but lots of groups. - [ ] Refactor the [`NetCDFDatastore`](https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L319) class to only create one `CachingFileManager` object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406. - [ ] Refactor `NetCDF4BackendEntrypoint.open_datatree` to use an implementation that goes through `NetCDFDatastore` without calling the top-level `xr.open_dataset` again. - [ ] Check the performance of calling `xr.open_datatree` on a netCDF file has actually improved. It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8994/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2163608564,I_kwDOAMm_X86A9gv0,8802,Error when using `apply_ufunc` with `datetime64` as output dtype,44147817,open,0,,,4,2024-03-01T15:09:57Z,2024-05-03T12:19:14Z,,CONTRIBUTOR,,,,"### What happened? When using `apply_ufunc` with `datetime64[ns]` as output dtype, code throws error about converting from specific units to generic datetime units. ### What did you expect to happen? _No response_ ### Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray: return time[:10] def fn(da: xr.DataArray) -> xr.DataArray: dim_out = ""time_cp"" return xr.apply_ufunc( _fn, da, da.time, input_core_dims=[[""time""], [""time""]], output_core_dims=[[dim_out]], vectorize=True, dask=""parallelized"", output_dtypes=[""datetime64[ns]""], dask_gufunc_kwargs={""allow_rechunk"": True, ""output_sizes"": {dim_out: 10},}, exclude_dims=set((""time"",)), ) da_fake = xr.DataArray(np.random.rand(5,5,5), coords=dict(x=range(5), y=range(5), time=np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'], dtype='datetime64[ns]') )).chunk(dict(x=2,y=2)) fn(da_fake.compute()).compute() # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas fn(da_fake).compute() # same errors as above ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. 
- [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output ```Python --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[211], line 1 ----> 1 fn(da_fake).compute() File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs) 1144 """"""Manually trigger loading of this array's data from disk or a 1145 remote source into memory and return a new array. The original is 1146 left unaltered. (...) 1160 dask.compute 1161 """""" 1162 new = self.copy(deep=False) -> 1163 return new.load(**kwargs) File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs) 1119 def load(self, **kwargs) -> Self: 1120 """"""Manually trigger loading of this array's data from disk or a 1121 remote source into memory and return this array. 1122 (...) 1135 dask.compute 1136 """""" -> 1137 ds = self._to_temp_dataset().load(**kwargs) 1138 new = self._from_temp_dataset(ds) 1139 self._variable = new._variable File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs) 850 chunkmanager = get_chunked_array_type(*lazy_data.values()) 852 # evaluate all the chunked arrays simultaneously --> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs) 855 for k, data in zip(lazy_data, evaluated_data): 856 self.variables[k].data = data File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs) 67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]: 68 from dask.array import compute ---> 70 return compute(*data, **kwargs) File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs) 625 postcomputes.append(x.__dask_postcompute__()) 627 with shorten_traceback(): --> 628 results = schedule(dsk, keys, **kwargs) 630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs) 2369 self._init_stage_2(*args, **kwargs) 2370 return self -> 2372 return self._call_as_normal(*args, **kwargs) File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs) 2362 vargs = [args[_i] for _i in inds] 2363 vargs.extend([kwargs[_n] for _n in names]) -> 2365 return self._vectorize_call(func=func, args=vargs) File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args) 2444 """"""Vectorized call to `func` over positional `args`."""""" 2445 if self.signature is not None: -> 2446 res = self._vectorize_call_with_signature(func, args) 2447 elif not args: 2448 res = func() File 
/srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args) 2502 outputs = _create_arrays(broadcast_shape, dim_sizes, 2503 output_core_dims, otypes, results) 2505 for output, result in zip(outputs, results): -> 2506 output[index] = result 2508 if outputs is None: 2509 # did not call the function even once 2510 if otypes is None: ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas ``` ### Anything else we need to know? _No response_ ### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8802/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2270275688,I_kwDOAMm_X86HUaho,8985,update `to_netcdf` docstring to list support for explicit CDF5 writes,9221710,open,0,,,4,2024-04-30T00:41:13Z,2024-04-30T20:48:46Z,,NONE,,,,"### Is your feature request related to a problem? I cannot get to_netcdf() to write files in CDF5 format as identifed by the 'ncdump -k' command. ### Describe the solution you'd like When I write a netcdf file using: D.to_netcdf( filename ) then ask ncdump to tell me the kind of file I have, ncdump -k filename it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmpshere Model (CAM), as an initial condition for example. CAM will bomb when it tries to read it. After converting the file with this command: nccopy -k cdf5 filename cdf5_filename the file now works in CAM. Also, the command ncdump -k cdf5_filename returns 'cdf5'. I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command. ### Describe alternatives you've considered Writing netcdf-4 files from xarray and converting via nccopy -k cdf5 filename cdf5_filename ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8985/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1389295853,I_kwDOAMm_X85Szvjt,7099,Pass arbitrary options to sel(),4160723,open,0,,,4,2022-09-28T12:44:52Z,2024-04-30T00:44:18Z,,MEMBER,,,,"### Is your feature request related to a problem? Currently `.sel()` accepts two options `method` and `tolerance`. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes. It would be also useful for custom indexes to expose their own selection options, e.g., - index query optimization like the `dualtree` flag of [sklearn.neighbors.KDTree.query](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree.query) - k-nearest neighbors selection with the creation of a new ""k"" dimension (+ coordinate / index) with user-defined name and size. From #3223, it would be nice if we could also pass distinct options values per index. What would be a good API for that? ### Describe the solution you'd like Some ideas: A. Allow passing a tuple `(labels, options_dict)` as indexer value ```python ds.sel(x=([0, 2], {""method"": ""nearest""}), y=3) ``` B. Expose an `options` kwarg that would accept a nested dict ```python ds.sel(x=[0, 2], y=3, options={""x"": {""method"": ""nearest""}}) ``` Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great. Any other ideas? Some sort of context manager? Some `Index` specific API? ### Describe alternatives you've considered The API proposed in #3223 would look great if `method` and `tolerance` were the only accepted options, but less so for arbitrary options. 
### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7099/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 481761508,MDU6SXNzdWU0ODE3NjE1MDg=,3223,Feature request for multiple tolerance values when using nearest method and sel(),1117224,open,0,,,4,2019-08-16T19:53:31Z,2024-04-29T23:21:04Z,,NONE,,,," ```python import xarray as xr import numpy as np import pandas as pd # Create test data ds = xr.Dataset() ds.coords['lon'] = np.arange(-120,-60) ds.coords['lat'] = np.arange(30,50) ds.coords['time'] = pd.date_range('2018-01-01','2018-01-30') ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size,ds.lon.size,ds.time.size)), dims=['lat','lon','time']) target_lat = [36.83] target_lon = [-110] target_time = [np.datetime64('2019-06-01')] # Nearest pulls a date too far away ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest') # Adding tolerance for lat long, but also applied to time ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5) # Ideally tolerance could accept a dictionary but currently fails ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance={'lat':0.5, 'lon':0.5, 'time':np.timedelta64(1,'D')}) ``` #### Expected Output A dataset with nearest values to tolerances on each dim. #### Problem Description I would like to add the ability of tolerance to accept a dictionary for multiple tolerance values for different dimensions. Before I try implementing it, I wanted to 1) check it doesn't already exist or someone isn't working on it, and 2) get suggestions for how to proceed. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.11.3 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: 1.5.5 zarr: 2.2.0 cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.1.2 distributed: 1.26.0 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: None IPython: 7.3.0 sphinx: None
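In the meantime, a workaround with the current API is to chain one `sel` call per dimension so that each call gets its own tolerance (sketch, reusing the variables from the example above; each call still raises if a label has no match within its tolerance):
```python
result = (
    ds.sel(lat=target_lat, method='nearest', tolerance=0.5)
      .sel(lon=target_lon, method='nearest', tolerance=0.5)
      .sel(time=target_time, method='nearest', tolerance=np.timedelta64(1, 'D'))
)
```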
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3223/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2259316341,I_kwDOAMm_X86Gqm51,8965,Support concurrent loading of variables,2448579,open,0,,,4,2024-04-23T16:41:24Z,2024-04-29T22:21:51Z,,MEMBER,,,,"### Is your feature request related to a problem? Today if users have to concurrently load multiple variables in a DataArray or Dataset, they *have* to use dask. It struck me that it'd be pretty easy for `.load` to gain an `executor` kwarg that accepts anything that follows the [`concurrent.futures` executor](https://docs.python.org/3/library/concurrent.futures.html) interface, and parallelize this loop. https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8965/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1250939008,I_kwDOAMm_X85Kj9CA,6646,`dim` vs `dims`,5635139,closed,0,,,4,2022-05-27T16:15:02Z,2024-04-29T18:24:56Z,2024-04-29T18:24:56Z,MEMBER,,,,"### What is your issue? I've recently been hit with this when experimenting with `xr.dot` and `xr.corr` — `xr.dot` takes `dims`, and `xr.cov` takes `dim`. Because they each take multiple arrays as positional args, kwargs are more conventional. Should we standardize on one of these?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6646/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1024011835,I_kwDOAMm_X849CS47,5857,"Incorrect results when using xarray.ufuncs.angle(..., deg=True)",1119116,closed,0,,,4,2021-10-12T16:24:11Z,2024-04-28T20:58:55Z,2024-04-28T20:58:54Z,NONE,,,," **What happened**: The `xarray.ufuncs.angle` is broken. From the help docstring one may use option `deg=True` to have the result in degrees instead of radians (which is consistent with `numpy.angle` function). Yet results show that this is not the case. Moreover specifying `deg=True` or `deg=False` leads to the same result with the values in radians. **What you expected to happen**: To have the result of `xarray.ufuncs.angle` converted to degrees when option `deg=True` is specified. 
**Minimal Complete Verifiable Example**: ```python # Put your MCVE code here import numpy as np import xarray as xr ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))}) Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd)) D = xr.ufuncs.angle(Z, deg=True) # YIELDS INCORRECT RESULTS if not np.allclose(ds.wd, (D % 360)): print(f""Issue with angle operation: {D.values%360} instead of {ds.wd.values}"" \ + f""\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!"") D = xr.ufuncs.degrees(xr.ufuncs.angle(Z)) # Works OK if not np.allclose(ds.wd, (D % 360)): print(f""Issue with angle operation: {D%360} instead of {ds.wd}"" \ + f""\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!"") D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True}) # Works OK if not np.allclose(ds.wd, (D % 360)): print(f""Issue with angle operation: {D%360} instead of {ds.wd}"" \ + f""\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!"") ``` **Anything else we need to know?**: Though `xarray.ufuncs` has a deprecated warning stating that the numpy equivalent may be used, this is not true for `numpy.angle`. Example: ```python import numpy as np import xarray as xr ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))}) Z = np.exp(1j * np.radians(ds.wd)) print(Z) print(f""Is Z an XArray? {isinstance(Z, xr.DataArray)}"") D = np.angle(ds.wd, deg=True) print(D) print(f""Is D an XArray? {isinstance(D, xr.DataArray)}"") ``` If this code is run, the result of `numpy.angle(xarray.DataArray)` is not a DataArray object, contrary to other numpy operations (for all versions of xarray I've used). Hence the `xarray.ufuncs.angle` is a great option, if it was not for the current problem. **Environment**: No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (recentmost).
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.19.0-18-amd64 machine: x86_64 processor: byteorder: little LC_ALL: en_US.utf8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 0.19.0 pandas: 1.2.3 numpy: 1.20.2 scipy: 1.5.3 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.2.0 pip: 21.3 conda: 4.10.3 pytest: None IPython: None sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5857/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2224036575,I_kwDOAMm_X86EkBrf,8905,Variable doesn't have an .expand_dims method,35968931,closed,0,,,4,2024-04-03T22:19:10Z,2024-04-28T19:54:08Z,2024-04-28T19:54:08Z,MEMBER,,,,"### Is your feature request related to a problem? `DataArray` and `Dataset` have an `.expand_dims` method, but it looks like `Variable` doesn't. ### Describe the solution you'd like Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes. ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8905/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 590630281,MDU6SXNzdWU1OTA2MzAyODE=,3921,issues discovered by the all-but-dask CI,14808389,closed,0,,,4,2020-03-30T22:08:46Z,2024-04-25T14:48:15Z,2024-02-10T02:57:34Z,MEMBER,,,,"After adding the `py38-all-but-dask` CI in #3919, it discovered a few backend issues: - `zarr`: - [x] `open_zarr` with `chunks=""auto""` always tries to chunk, even if `dask` is not available (fixed in #3919) - [x] `ZarrArrayWrapper.__getitem__` incorrectly passes the indexer's `tuple` attribute to `_arrayize_vectorized_indexer` (this only happens if `dask` is not available) (fixed in #3919) - [x] slice indexers with negative steps get transformed incorrectly if `dask` is not available https://github.com/pydata/xarray/pull/8674 - `rasterio`: - ~calling `pickle.dumps` on a `Dataset` object returned by `open_rasterio` fails because a non-serializable lock was used (if `dask` is installed, a serializable lock is used instead)~","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3921/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2243685081,I_kwDOAMm_X86Fu-rZ,8945,netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory,11130776,closed,0,,,4,2024-04-15T13:26:08Z,2024-04-23T21:49:28Z,2024-04-23T15:33:36Z,NONE,,,,"### What is your issue? Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory). 
Here is a minimum working example:
```
times = 100
nlat = 200
nlon = 300
fp = xr.Dataset({""fp"": ([""time"", ""lat"", ""lon""], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
                coords={""time"": pd.date_range(start=""2019-01-01T02:00:00"", periods=times, freq=""1H""),
                        ""lat"": np.arange(nlat),
                        ""lon"": np.arange(nlon)})
flux = xr.Dataset({""flux"": ([""time"", ""lat"", ""lon""], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
                  coords={""time"": [pd.to_datetime(""2019-01-01"")],
                          ""lat"": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
                          ""lon"": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon)})
fp.to_netcdf(""combine_datasets_tests/fp.nc"")
flux.to_netcdf(""combine_datasets_tests/flux.nc"")
fp1 = xr.open_dataset(""combine_datasets_tests/fp.nc"")
flux1 = xr.open_dataset(""combine_datasets_tests/flux.nc"")
```
Then
```
flux1 = flux1.reindex_like(fp1, method=""ffill"", tolerance=None)
```
takes over a minute, while
```
flux1 = flux1.load().reindex_like(fp1, method=""ffill"", tolerance=None)
```
is almost instantaneous (timeit says 91ms, including opening the dataset... I'm not sure if caching is influencing this). Profiling the ""reindex without load"" cell:
```
804936 function calls (804622 primitive calls) in 93.285 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1   92.211   92.211   93.191   93.191 {built-in method _operator.getitem}
     1    0.289    0.289    0.980    0.980 utils.py:81(_StartCountStride)
     6    0.239    0.040    0.613    0.102 shape_base.py:267(apply_along_axis)
 72656    0.109    0.000    0.109    0.000 utils.py:429()
 72656    0.085    0.000    0.136    0.000 utils.py:430()
 72661    0.051    0.000    0.051    0.000 {built-in method numpy.arange}
145318    0.048    0.000    0.115    0.000 shape_base.py:370()
     2    0.045    0.023    0.046    0.023 indexing.py:1334(__getitem__)
     6    0.044    0.007    0.044    0.007 numeric.py:136(ones)
145318    0.044    0.000    0.067    0.000 index_tricks.py:690(__next__)
    14    0.033    0.002    0.033    0.002 {built-in method numpy.empty}
145333/145325    0.023    0.000    0.023    0.000 {built-in method builtins.next}
     1    0.020    0.020   93.275   93.275 duck_array_ops.py:317(where)
    21    0.018    0.001    0.018    0.001 {method 'astype' of 'numpy.ndarray' objects}
145330    0.013    0.000    0.013    0.000 {built-in method numpy.asanyarray}
     1    0.002    0.002    0.002    0.002 {built-in method _functools.reduce}
     1    0.002    0.002   93.279   93.279 variable.py:821(_getitem_with_mask)
    18    0.001    0.000    0.001    0.000 {built-in method numpy.zeros}
     1    0.000    0.000    0.000    0.000 file_manager.py:226(close)
```
The `getitem` call at the top is from `xarray.backends.netCDF4_.py`, line 114. Because of the jittered coordinates in `flux`, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680. In my venv, netCDF4 was installed from a wheel with the following versions:
```
netcdf4-python version: 1.6.5
HDF5 lib version: 1.12.2
netcdf lib version: 4.9.3-development
```
This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3. I will try to investigate more and hopefully simplify the example.
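To see why this falls off the fast path, one can inspect the integer indexer that the reindex produces. A small sketch, mirroring the jittered latitude coordinate above: the positions are close to, but not exactly, a contiguous range, so the backend cannot turn them into a simple slice.
```
import numpy as np
import pandas as pd

nlat = 200
flux_lat = pd.Index(np.arange(nlat) + np.random.normal(0.0, 0.01, nlat))
fp_lat = pd.Index(np.arange(nlat))
indexer = flux_lat.get_indexer(fp_lat, method='ffill')
print(indexer[:10])  # nearly, but not exactly, consecutive positions
print(bool((np.diff(indexer) == 1).all()))  # typically False, so netCDF4 gets a fancy index
```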
(Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8945/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1664193419,I_kwDOAMm_X85jMZOL,7748,diff('non existing dimension') does not raise exception,4441338,open,0,,,4,2023-04-12T09:29:58Z,2024-04-21T22:31:37Z,,NONE,,,,"### What happened? Calling xr.DataArray.diff with a non-existing dimension does not raise an exception. ### What did you expect to happen? An exception to be raised. ### Minimal Complete Verifiable Example ```Python import xarray as xr; import numpy as np; xr.DataArray(np.arange(10),dims=('a',)).diff('b') ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.10.0-21-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2023.3.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.2 pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2023.3.1 distributed: 2023.3.1 matplotlib: 3.7.1 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: 0.14.0 flox: 0.6.9 numpy_groupies: 0.9.20 setuptools: 67.6.0 pip: 23.0.1 conda: 23.1.0 pytest: 7.2.2 mypy: 1.1.1 IPython: 8.11.0 sphinx: 6.1.3
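Until xarray raises here, a thin user-side guard is easy to add (sketch):
```Python
import numpy as np
import xarray as xr

def checked_diff(da, dim, **kwargs):
    # Guard sketch: fail loudly on a dimension that diff() would silently ignore
    if dim not in da.dims:
        raise ValueError(f'dimension {dim!r} not found in {da.dims!r}')
    return da.diff(dim, **kwargs)

da = xr.DataArray(np.arange(10), dims=('a',))
checked_diff(da, 'b')  # raises ValueError instead of returning the input unchanged
```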
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7748/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2237228079,I_kwDOAMm_X86FWWQv,8927,"Use a neutral format to have lossless interface with JSON, scipp, Astropy, pandas",92333742,open,0,,,4,2024-04-11T08:50:34Z,2024-04-12T14:25:35Z,,NONE,,,,"### Is your feature request related to a problem? Each tool has a specific structure for processing multidimensional data with the following consequences: - interfaces dedicated to each tool, - partially processed data, - no unified representation of data structures ### Describe the solution you'd like The proposed format (see [jupyter notebook](https://nbviewer.org/github/loco-philippe/ntv-numpy/blob/main/example/example_ntv_numpy.ipynb), [github repository](https://github.com/loco-philippe/ntv-numpy/blob/main/README.md), [PyPI package](https://pypi.org/project/ntv-numpy/) ) is based on the following principles: - neutral format available for tabular or multidimensional tools (e.g. Numpy, pandas, xarray, scipp, astropy), - taking into account a wide variety of data types as defined in [NTV](https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html) format, - high interoperability: reversible (lossless round-trip) interface with tabular or multidimensional tools, - reversible and compact JSON format, - Ease of sharing and exchanging multidimensional and tabular data, ### Describe alternatives you've considered _No response_ ### Additional context https://github.com/numpy/numpy/issues/12481#issuecomment-2049179803 https://github.com/astropy/astropy/issues/16286 https://github.com/scipp/scipp/issues/3422","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8927/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1959816045,I_kwDOAMm_X8500Gtt,8368,"to_netcdf: Unexpected drop of ""units"" attribute of attached ""bounds""",15173535,open,0,,,4,2023-10-24T18:15:05Z,2024-04-09T11:11:20Z,,NONE,,,,"### What happened? When writing a Dataset to netcdf, any DataArrays that are linked as bounds through another variables attrs['bounds'] entry, have their (specifically) 'units' attribute dropped inside the written netcdf file. See example ### What did you expect to happen? Units attribute to be written to the netcdf file. ### Minimal Complete Verifiable Example ```Python import numpy as np import xarray as xr # Create a new Dataset ds = xr.Dataset() # Add the x variable, Specify 'x_bnds' as bounds, defined later. ds['x'] = xr.DataArray(np.arange(10), dims='x', attrs={'units':'m', 'bounds':'x_bnds'}) # Bounds require an extra dimension equal to number of vertices. ds['nv'] = xr.DataArray(np.r_[0, 1], dims='nv') # Add the actual bounding values for variable x. ds['x_bnds'] = xr.DataArray(np.squeeze(np.dstack([np.arange(10)-0.5, np.arange(10)+0.5])), dims=['x', 'nv'], attrs={'test':4, 'units':'m', }) print('Units is attached to the bounds in the dataset before writing', 'units' in ds['x_bnds'].attrs) # Write to netcdf file ds.to_netcdf('tmp.nc', format='netcdf4', engine='netcdf4') # Open the dataset and check x_bnds attrs. units is dropped. new = xr.open_dataset('tmp.nc') print(new['x_bnds'].attrs) # Confirm that units were never written to the file. 
!h5dump -d /x_bnds tmp.nc ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2023.10.1 pandas: 2.1.1 numpy: 1.26.1 scipy: 1.11.3 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.0 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.0.0 pip: 23.3 conda: None pytest: None mypy: None IPython: 8.16.1 sphinx: 7.2.6
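If it helps triage: the drop appears tied to the bounds linkage itself, since xarray's CF encoding expects bounds variables to inherit units from their parent coordinate. Assuming that is the mechanism, a workaround sketch is to write a copy with the linkage detached (at the cost of the file no longer advertising x_bnds as bounds of x):
```Python
# Workaround sketch: without the bounds attribute, x_bnds is treated as a
# plain variable and its units survive the write.
ds2 = ds.copy()
del ds2['x'].attrs['bounds']
ds2.to_netcdf('tmp_nobounds.nc', format='netcdf4', engine='netcdf4')
```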
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8368/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2230680765,I_kwDOAMm_X86E9Xy9,8919,Using the xarray.Dataset.where() function takes up a lot of memory,69391863,closed,0,,,4,2024-04-08T09:15:49Z,2024-04-09T02:45:09Z,2024-04-09T02:45:08Z,NONE,,,,"### What is your issue? My python script was killed because it took up too much memory. After checking, I found that the problem is the ds.where() function. The original netcdf file opened from the hard disk takes up about 10 Mb of storage, but when I mask the data that doesn't match according to the latitude and longitude location, the variable **ds** takes up a dozen GB of memory. When I deleted this variable using del ds, the memory occupied by the script immediately returned to normal. ``` # Open this netcdf file. ds = xr.open_dataset(track) # If longitude range is [-180, 180], then convert to [0, 360]. if np.any(ds[var_lon] < 0): ds[var_lon] = ds[var_lon] % 360 # Extract data by longitude and latitude. ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1]) & (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3])) # Select data by range and value of some variables. for key, value in range_select.items(): ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1])) for key, value in value_select.items(): ds = ds.where(ds[key].isin(value)) ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8919/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2228373305,I_kwDOAMm_X86E0kc5,8915,"Weird behavior of DataSet.where(... , drop=True)",22961670,closed,0,,,4,2024-04-05T16:03:05Z,2024-04-08T09:32:48Z,2024-04-08T09:32:48Z,NONE,,,,"### What happened? I work with an aircraft emission dataset that is freely available online: [emission dataset](https://zenodo.org/records/10818082) During my calculations I eventually convert the `DataSet` to a `DataFrame`. My motivation is to avoid unnecessary rows in the DataFrame. Doing some calculations my code returned unexpected results. Eventually I could narrow it down to a `DataSet.where(... , drop=True)` argument I added along the way, which introduces differences in the data. Here are two examples: **Example 1:** Along some dimensions data points vanished if `drop=True` ![grafik](https://github.com/pydata/xarray/assets/22961670/a57f6c63-4927-442f-a7a0-71e1d524a706) **Example 2:** For other dimensions (these?) data points appeared elsewhere if `drop=True` ![grafik](https://github.com/pydata/xarray/assets/22961670/6e80986a-f81f-4524-bc56-56d81ddf2bd9) ### What did you expect to happen? I expect for my calculations to return the same results, regardless of whether drop=True is active or not. 
### Minimal Complete Verifiable Example ```Python !wget ""https://zenodo.org/records/10818082/files/Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc"" import matplotlib.pyplot as plt import xarray as xr nc_file = xr.open_dataset('Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc') fig, axs = plt.subplots(1,2,figsize=(10,4)) nc_file.H2O.where(nc_file.H2O!=0, drop=True).sum(('lon','time')).plot.contour(x='lat',ax=axs[0]) axs[0].set_xlim(-50,90) axs[0].set_title('With drop=True') nc_file.H2O.where(nc_file.H2O!=0, drop=False).sum(('lon','time')).plot.contour(x='lat',ax=axs[1]) axs[1].set_xlim(-50,90) axs[1].set_title('With drop=False') plt.tight_layout() plt.show() fig, axs = plt.subplots(1,2,figsize=(10,4)) nc_file.H2O.where(nc_file.H2O!=0, drop=True).sum(('lat','time')).plot.contour(x='lon',ax=axs[0]) axs[0].set_title('With drop=True') nc_file.H2O.where(nc_file.H2O!=0, drop=False).sum(('lat','time')).plot.contour(x='lon',ax=axs[1]) axs[1].set_title('With drop=False') plt.tight_layout() plt.show() ``` ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('en_US', 'ISO8859-1') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2022.11.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.13.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: None distributed: None matplotlib: 3.7.0 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.6.3 pip: 22.3.1 conda: None pytest: None IPython: 8.10.0 sphinx: None
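A quick way to narrow this down is to check whether the two results differ in their values or only in their coordinates, since subtraction aligns both operands on the coordinates they share. A sketch against the example above; if this prints 0.0, the summed values agree wherever the coordinates overlap and the visual difference comes from contouring over the smaller, dropped grid:
```Python
a = nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lon', 'time'))
b = nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lon', 'time'))
print(float(abs(a - b).max()))  # 0.0 would mean values match on shared coordinates
```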
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8915/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2206243581,I_kwDOAMm_X86DgJr9,8876,Possible race condition when appending to an existing zarr,157591329,closed,0,,,4,2024-03-25T16:59:52Z,2024-04-03T15:23:14Z,2024-03-29T14:35:52Z,NONE,,,,"### What happened? When appending to an existing zarr along a dimension (`to_zarr(..., mode='a', append_dim=""x"" ,..)`), if the dask chunking of the dataset to append does not align with the chunking of the existing zarr, the resulting _consolidated_ zarr store may have `NaN`s instead of the actual values it is supposed to have. ### What did you expect to happen? We would expected that zarr append to have the same behaviour as if we concatenate dataset _in memory_ (using `concat`) and write the whole result on a new zarr store in one go ### Minimal Complete Verifiable Example ```Python from distributed import Client, LocalCluster import xarray as xr import tempfile ds1 = xr.Dataset({""a"": (""x"", [1., 1.])}, coords={'x': [1, 2]}).chunk({""x"": 3}) ds2 = xr.Dataset({""a"": (""x"", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({""x"": 3}) with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)): # The issue happens only when: threads_per_worker > 1 for i in range(0, 100): with tempfile.TemporaryDirectory() as store: print(store) ds1.to_zarr(store, mode=""w"") # write first dataset ds2.to_zarr(store, mode=""a"", append_dim=""x"") # append first dataset rez = xr.open_zarr(store).compute() # open consolidated dataset nb_values = rez.a.count().item(0) # count non NaN values if nb_values != 6: print(""found NaNs:"") print(rez.to_dataframe()) break ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output ```Python /tmp/tmptg_pe6ox /tmp/tmpm7ncmuxd /tmp/tmpiqcgoiw2 /tmp/tmppma1ieo7 /tmp/tmpw5vi4cf0 /tmp/tmp1rmgwju0 /tmp/tmpm6tfswzi found NaNs: a x 1 1.0 2 1.0 3 1.0 4 1.0 5 1.0 6 NaN ``` ### Anything else we need to know? The example code snippet provided here, reproduces the issue. Since the issue occurs randomly, we loop in the example for a few times and stop when the issue occurs. In the example, when `ds1` is first written, since it only contains 2 values along the `x` dimension, the resulting .zarr store have the chunking: `{'x': 2}`, even though we called `.chunk({""x"": 3})`. Side note: This behaviour in itself is not problematic in this case, but the fact that the chunking is _silently_ changed made this issue harder to spot. However, when we try to append the second dataset `ds2`, that contains 4 values, the `.chunk({""x"": 3})` in the begining splits the dask array into 2 **dask chunks**, but in a way that does not align with **zarr chunks**. 
Zarr chunks:
+ chunk1 : `x: [1; 2]`
+ chunk2 : `x: [3; 4]`
+ chunk3 : `x: [5; 6]`

Dask chunks for `ds2`:
+ chunk A: `x: [3; 4; 5]`
+ chunk B: `x: [6]`

Both **dask** chunks A and B are supposed to write to **zarr** chunk3, and depending on which one writes first, we can end up with NaN at `x = 5` or `x = 6` instead of the actual values. The issue only happens when dask tasks run in parallel. Using `safe_chunks = True` when calling `to_zarr` does not seem to help. We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer?). ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 5.15.133.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.2.0 pandas: 2.2.1 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.17.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.8 dask: 2024.3.1 distributed: 2024.3.1 matplotlib: 3.8.3 cartopy: None seaborn: 0.13.2 numbagg: 0.8.1 fsspec: 2024.3.1 cupy: None pint: None sparse: None flox: 0.9.5 numpy_groupies: 0.10.2 setuptools: 69.2.0 pip: 24.0 conda: None pytest: 8.1.1 mypy: None IPython: 8.22.2 sphinx: None
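As a user-side mitigation until this is fixed, one can make sure no two dask chunks of the appended data share a zarr chunk. A sketch, reusing the names from the example above and assuming the existing length along the append dimension is a multiple of the zarr chunk size (true here, since `ds1` wrote 2 values and the zarr chunks are `(2,)`):
```Python
import xarray as xr

# Mitigation sketch, not a fix for the underlying race: rechunk the appended
# dataset so its dask chunk boundaries coincide with the zarr chunk boundaries.
store_chunk = xr.open_zarr(store)['a'].encoding['chunks'][0]
ds2.chunk({'x': store_chunk}).to_zarr(store, mode='a', append_dim='x')
```
Passing `synchronizer=zarr.ThreadSynchronizer()` to `to_zarr` may also serialize the conflicting writes within a process, but we have not verified that here.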
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8876/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2211106929,I_kwDOAMm_X86DytBx,8882,"to_zarr silently loses data when using append_dim, if chunks are different to zarr store",140395181,closed,0,,,4,2024-03-27T15:27:02Z,2024-03-29T14:35:51Z,2024-03-29T14:35:51Z,NONE,,,,"### What happened? When writing a chunked DataArray to an existing zarr store, appending along an existing dimension of the store, I have found that some data are not written if there are multiple array chunks to one zarr chunk. I appreciate it is probably bad practice to have different chunksizes in my DataArray and zarr_store, but I think its a realistic scenario that needs to be caught. This may be related to / the same underlying issue as #8371. Perhaps the checks mentioned in https://github.com/pydata/xarray/issues/8371#issuecomment-1814589157 are somehow getting bypassed? Using zarr's ThreadSynchronizer is the only way I have found to ensure that all the data gets written. ### What did you expect to happen? I expected that either - to_zarr would recognise the different chunk sizes, and re-chunk or wait for all the chunks to be written - or an error would be raised, given that the results result in loss of data in an unpredictable way ### Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np from matplotlib import pyplot as plt x_coords = np.arange(10) y_coords = np.arange(10) t_coords = np.array([np.datetime64('2020-01-01').astype('datetime64[ns]')]) data = np.ones((10,10)) for i in range(4): plt.subplot(1,4,i+1) da = xr.DataArray(data.reshape((-1,10,10)), dims = ['time','x','y'], coords = {'x':x_coords, 'y':y_coords, 'time':t_coords}, ).chunk({'x':5, 'y':5,'time':1}).rename('foo') da.to_zarr('foo.zarr', mode='w') new_time = np.array([np.datetime64('2021-01-01').astype('datetime64[ns]')]) da2 = xr.DataArray(data.reshape((-1,10,10)), dims = ['time','x','y'], coords = {'x':x_coords, 'y':y_coords, 'time':new_time}, ).chunk({'x':1, 'y':1,'time':1}).rename('foo') da2.to_zarr('foo.zarr',append_dim='time', mode='a') plt.imshow(xr.open_zarr('foo.zarr').isel(time=-1).foo.values) ``` ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [ ] New issue — a search of GitHub Issues suggests this is not a duplicate. - [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? Output from the plots above: ![image](https://github.com/pydata/xarray/assets/140395181/1982344f-7db8-4e80-a3f3-b031747cacad) ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0] python-bits: 64 OS: Linux OS-release: 5.15.0-1041-azure machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2 xarray: 2024.2.0 pandas: 2.2.1 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.5 pydap: installed h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.17.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.8 dask: 2024.3.1 distributed: 2024.3.1 matplotlib: 3.8.3 cartopy: 0.22.0 seaborn: 0.13.2 numbagg: None fsspec: 2024.3.1 cupy: None pint: 0.23 sparse: 0.15.1 flox: 0.9.5 numpy_groupies: 0.10.2 setuptools: 69.2.0 pip: 24.0 conda: 24.1.2 pytest: 8.1.1 mypy: None IPython: 8.22.2 sphinx: None
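For reference, the ThreadSynchronizer workaround mentioned above, spelled out as a sketch against the example code:
```Python
import zarr

# Serializes concurrent writes to the same zarr chunk within this process
da2.to_zarr('foo.zarr', append_dim='time', mode='a',
            synchronizer=zarr.ThreadSynchronizer())
```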
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8882/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 935607748,MDU6SXNzdWU5MzU2MDc3NDg=,5563,Decoding non-utf-8 encoded strings with the h5netcdf engine,11391714,closed,0,,,4,2021-07-02T09:49:58Z,2024-03-26T15:08:41Z,2024-03-26T15:08:41Z,NONE,,,,"**What happened**: Trying to load a netCDF file-like (`io.BytesIO` object) with attribute strings in non-utf-8 encoding with the `h5netcdf` engine leads to `UnicodeDecodeError`. **What you expected to happen**: Loading the same file, albeit persisted to disk, with the `netcdf4` engine works fine, however, since the `netcdf4` engine doesnt support the file-like objects I ran into this issue. **Traceback**: Traceback (most recent call last): File """", line 1, in File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py"", line 242, in load_dataset with open_dataset(filename_or_obj, **kwargs) as ds: File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py"", line 496, in open_dataset backend_ds = backend.open_dataset( File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 384, in open_dataset ds = store_entrypoint.open_dataset( File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py"", line 22, in open_dataset vars, attrs = store.load() File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py"", line 126, in load attributes = FrozenDict(self.get_attrs()) File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 234, in get_attrs return FrozenDict(_read_attributes(self.ds)) File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 75, in _read_attributes v = maybe_decode_bytes(v) File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 63, in maybe_decode_bytes return txt.decode(""utf-8"") **Minimal Complete Verifiable Example**: ```python import xarray as xr import netCDF4 title = b'\xc3' f = netCDF4.Dataset('test.nc', 'w') f.title = title f.close() xr.load_dataset(""test.nc"", engine=""h5netcdf"") ``` **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.0 (default, Feb 25 2021, 22:10:10) [GCC 8.4.0] python-bits: 64 OS: Linux OS-release: 4.15.0-136-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.18.1 pandas: 1.2.4 numpy: 1.20.3 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 57.0.0 pip: 21.1.3 conda: None pytest: 6.2.4 IPython: 7.25.0 sphinx: None
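Until this is fixed, a workaround sketch for the file-like case described above, with `buf` standing in for the `io.BytesIO` object (hypothetical name): spill the buffer to a temporary file and read it with the netcdf4 engine, which tolerates the non-utf-8 attribute.
```python
import tempfile

import xarray as xr

# Workaround sketch: the netcdf4 engine reads the attribute fine from disk.
# load_dataset (rather than open_dataset) pulls everything into memory before
# the temporary file is deleted.
with tempfile.NamedTemporaryFile(suffix='.nc') as f:
    f.write(buf.getvalue())  # buf: the io.BytesIO object (hypothetical name)
    f.flush()
    ds = xr.load_dataset(f.name, engine='netcdf4')
```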
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5563/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2117248281,I_kwDOAMm_X85-MqUZ,8704,Currently no way to create a Coordinates object without indexes for 1D variables,35968931,closed,0,,,4,2024-02-04T18:30:18Z,2024-03-26T13:50:16Z,2024-03-26T13:50:15Z,MEMBER,,,,"### What happened? The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on `main`, meaning that I think there is currently no way to create an `xr.Coordinates` object without 1D variables being coerced to indexes. This means there is no way to create a `Dataset` object without 1D variables becoming `IndexVariables` being coerced to indexes. ### What did you expect to happen? I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e. ```python xr.Coordinates({'x': ('x', uarr)}, indexes={}) ``` where `uarr` is an un-indexable array-like. ### Minimal Complete Verifiable Example ```Python class UnindexableArrayAPI: ... class UnindexableArray: """""" Presents like an N-dimensional array but doesn't support changes of any kind, nor can it be coerced into a np.ndarray or pd.Index. """""" _shape: tuple[int, ...] _dtype: np.dtype def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None: self._shape = shape self._dtype = dtype self.__array_namespace__ = UnindexableArrayAPI @property def dtype(self) -> np.dtype: return self._dtype @property def shape(self) -> tuple[int, ...]: return self._shape @property def ndim(self) -> int: return len(self.shape) @property def size(self) -> int: return np.prod(self.shape) @property def T(self) -> Self: raise NotImplementedError() def __repr__(self) -> str: return f""UnindexableArray(shape={self.shape}, dtype={self.dtype})"" def _repr_inline_(self, max_width): """""" Format to a single line with at most max_width characters. Used by xarray. """""" return self.__repr__() def __getitem__(self, key, /) -> Self: """""" Only supports extremely limited indexing. I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway. 
"""""" from xarray.core.indexing import BasicIndexer if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim): # no-op return self else: raise NotImplementedError() def __array__(self) -> np.ndarray: raise NotImplementedError(""UnindexableArrays can't be converted into numpy arrays or pandas Index objects"") ``` ```python uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32')) xr.Variable(data=uarr, dims=['x']) # works fine xr.Coordinates({'x': ('x', uarr)}, indexes={}) # works in xarray v2023.08.0 ``` but in versions after that it triggers the NotImplementedError in `__array__`: ```python --------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) Cell In[59], line 1 ----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={}) File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes) 299 variables = {} 300 for name, data in coords.items(): --> 301 var = as_variable(data, name=name) 302 if var.dims == (name,) and indexes is None: 303 index, index_vars = create_default_index_implicit(var, list(coords)) File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name) 152 raise TypeError( 153 f""Variable {name!r}: unable to convert object into a variable without an "" 154 f""explicit list of dimensions: {obj!r}"" 155 ) 157 if name is not None and name in obj.dims and obj.ndim == 1: 158 # automatically convert the Variable into an Index --> 159 obj = obj.to_index_variable() 161 return obj File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self) 570 def to_index_variable(self) -> IndexVariable: 571 """"""Return this variable as an xarray.IndexVariable"""""" --> 572 return IndexVariable( 573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True 574 ) File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath) 2640 # Unlike in Variable, always eagerly load values into memory 2641 if not isinstance(self._data, PandasIndexingAdapter): -> 2642 self._data = PandasIndexingAdapter(self._data) File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype) 1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None): 1479 from xarray.core.indexes import safe_cast_to_index -> 1481 self.array = safe_cast_to_index(array) 1483 if dtype is None: 1484 self._dtype = get_valid_numpy_dtype(array) File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array) 459 emit_user_level_warning( 460 ( 461 ""`pandas.Index` does not support the `float16` dtype."" (...) 465 category=DeprecationWarning, 466 ) 467 kwargs[""dtype""] = ""float64"" --> 469 index = pd.Index(np.asarray(array), **kwargs) 471 return _maybe_cast_to_cftimeindex(index) Cell In[55], line 63, in UnindexableArray.__array__(self) 62 def __array__(self) -> np.ndarray: ---> 63 raise NotImplementedError(""UnindexableArrays can't be converted into numpy arrays or pandas Index objects"") NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects ``` ### MVCE confirmation - [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [x] Complete example — the example is self-contained, including all data and the text of any traceback. 
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [x] New issue — a search of GitHub Issues suggests this is not a duplicate. - [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? Context is #8699 ### Environment Versions described above ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8704/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 957918751,MDU6SXNzdWU5NTc5MTg3NTE=,5664,Interpolation behaviour inconsistent with numpy? ,7017525,open,0,,,4,2021-08-02T08:56:28Z,2024-03-12T01:15:46Z,,NONE,,,,"Hey all, When running `dataset.interp(time=dataset.time)` fills with `np.nan` if one of the neighbor is a `np.nan` **even when interpolation is not actually needed**. Here is the sample code to reproduce the issue : ```python def test_crop_times_nan() : ds = xr.Dataset( data_vars = { ""some_variable"" : (['x', 'time'], np.array([[np.nan, 0, 1]])) }, coords = { ""time"" : np.array([0,1,2]) } ) result = ds.interp(time=ds.time) # result[""some_variable""].value == [nan, nan, 1.0] # whereas [nan, 0, 1.0] is EXPECTED xr.testing.assert_allclose(ds, result) ``` Please note that numpy does not have the same behavior : ```python >>> import numpy as np >>> np.interp([0,1,2], xp=[0,1,2], fp=[np.nan,0,1]) array([nan, 0., 1.]) ``` Is that an intended behaviour for xarray? If so, does this mean that I first have to check if an interpolation is needed instead of doing it no matter what (and use `reindex` instead of `interp` if it is not needed) ? (this will be kind of tricky if interpolation is needed for certain values and some not...) Thanks for your help ;) **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.8.0-7642-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.19.4 scipy: 1.6.0 netCDF4: 1.5.6 pydap: None h5netcdf: 0.8.1 h5py: 3.1.0 Nio: None zarr: None cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.01.0 distributed: 2021.01.0 matplotlib: 3.4.2 cartopy: None seaborn: None numbagg: None pint: None setuptools: 57.4.0 pip: 20.2.4 conda: None pytest: None IPython: 7.19.0 sphinx: None
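For reference, here is a minimal sketch of the `reindex` fallback suggested above. It assumes every requested time already exists in the dataset; the mixed case, where only some values need interpolation, is still open:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={"some_variable": (["x", "time"], np.array([[np.nan, 0, 1]]))},
    coords={"time": np.array([0, 1, 2])},
)

new_times = np.array([0, 1, 2])
if np.isin(new_times, ds.time).all():
    # Every target time already exists: look the values up instead of
    # interpolating, so valid points next to a NaN neighbor stay unchanged.
    result = ds.reindex(time=new_times)
else:
    result = ds.interp(time=new_times)
```

With the example above this returns `[nan, 0.0, 1.0]`, matching `np.interp`.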
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5664/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2140090923,I_kwDOAMm_X85_jzIr,8759,Passing datasets with different group hierarchy to open_mfdataset ,111437410,closed,0,,,4,2024-02-17T13:31:18Z,2024-03-03T18:43:09Z,2024-03-03T10:53:34Z,NONE,,,,"### Is your feature request related to a problem? When you want to open multiple datasets located at different nodes of group hierarchy in HDF file, you can't pass a list of group keys ( save_mfdataset offers 'groups' keyword; emphasis on the s). Add to that, the 'files' keyword argument does not accept 'datastore' as a valid input. ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered One, of course, can open_dataset each one in a loop and combine afterwards. One possible fix is to Modify the 'group' argument to accept a list the same length as paths list. Another could be changing ""paths"" keyword to accept datastore or h5py objects. Both are trivial in my opinion. Most of the code is already there in other functions (open_dataset, save_mfdataset). ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8759/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2141899767,I_kwDOAMm_X85_qsv3,8769,Errors started appearing after release v2024.02.0,7112768,closed,0,,,4,2024-02-19T09:23:16Z,2024-02-22T04:54:06Z,2024-02-22T04:54:06Z,NONE,,,,"### What happened? I started seeing errors in my CI after [latest xarray release](https://github.com/pydata/xarray/releases/tag/v2024.02.0). See, e.g., https://github.com/COSIMA/regional-mom6/actions/runs/7957078139/job/21719091616#step:7:226 After I added a [compat for xarray](https://github.com/COSIMA/regional-mom6/pull/98/commits/46ca91d2ac91ab57371f94108f18549aaa7040cf) to preclude the latest release the error went away. See: https://github.com/COSIMA/regional-mom6/actions/runs/7957192738 ### What did you expect to happen? _No response_ ### Minimal Complete Verifiable Example _No response_ ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [x] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8769/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2142982259,I_kwDOAMm_X85_u1Bz,8771,Unable to use Xarray to work on RCM Dataset with xsar and safe_rcm by umr-lops,34626942,closed,0,,,4,2024-02-19T18:58:50Z,2024-02-20T05:29:33Z,2024-02-20T05:29:33Z,NONE,,,,"### What happened? UMR-LOPS has introduced XSAR a library to work with RCM dataset. when working with the following code ``` import xsar import geoviews as gv import holoviews as hv import geoviews.feature as gf hv.extension('bokeh') path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010') meta = xsar.RcmMeta(name=path) meta.dt ``` I am encountering the following error ``` ValueError Traceback (most recent call last) in () 1 #rs2meta = xsar.RadarSat2Meta(name=path) ----> 2 meta = xsar.RcmMeta(name=path) 14 frames /usr/local/lib/python3.10/dist-packages/xsar/utils.py in wrapper(*args, **kwargs) 93 startrss = process.memory_info().rss 94 starttime = time.time() ---> 95 result = f(*args, **kwargs) 96 endtime = time.time() 97 if mem_monitor: /usr/local/lib/python3.10/dist-packages/xsar/rcm_meta.py in __init__(self, name) 32 self.dt = api.open_rcm(name.split(':')[1]) 33 else: ---> 34 self.dt = api.open_rcm(name) 35 if not name.startswith('RCM_DS:'): 36 name = 'RCM_DS:%s:' % name /usr/local/lib/python3.10/dist-packages/safe_rcm/api.py in open_rcm(url, backend_kwargs, manifest_ignores, **dataset_kwargs) 95 ) 96 ---> 97 tree = read_product(mapper, ""metadata/product.xml"") 98 99 calibration_root = ""metadata/calibration"" /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in read_product(mapper, product_path) 272 } 273 --> 274 converted = valmap( 275 lambda x: execute(**x)(decoded), 276 layout, /usr/local/lib/python3.10/dist-packages/toolz/dicttoolz.py in valmap(func, d, factory) 83 """""" 84 rv = factory() ---> 85 rv.update(zip(d.keys(), map(func, d.values()))) 86 return rv 87 /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in (x) 273 274 converted = valmap( --> 275 lambda x: execute(**x)(decoded), 276 layout, 277 ) /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs) 302 def __call__(self, *args, **kwargs): 303 try: --> 304 return self._partial(*args, **kwargs) 305 except TypeError as exc: 306 if self._should_curry(args, kwargs, exc): /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in execute(mapping, f, path) 29 subset = query(path, mapping) 30 ---> 31 return compose_left(f, attach_path(path=path))(subset) 32 33 /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs) 485 486 def __call__(self, *args, **kwargs): --> 487 ret = self.first(*args, **kwargs) 488 for f in self.funcs: 489 ret = f(ret) /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs) 487 ret = self.first(*args, **kwargs) 488 for f in self.funcs: --> 489 ret = f(ret) 490 return ret 491 /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in (obj) 126 ), 127 lambda obj: obj.set_index({""stacked"": [""pole"", ""pulse""]}), --> 128 lambda obj: obj.unstack(""stacked""), 129 ), 130 }, /usr/local/lib/python3.10/dist-packages/xarray/util/deprecation_helpers.py in inner(*args, **kwargs) 113 return func(*args[:-n_extra_args], **kwargs) 114 --> 115 return func(*args, **kwargs) 116 117 
return inner /usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse) 5576 ) 5577 else: -> 5578 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) 5579 return result 5580 /usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in _unstack_once(self, dim, index_and_vars, fill_value, sparse) 5395 indexes = {k: v for k, v in self._indexes.items() if k != dim} 5396 -> 5397 new_indexes, clean_index = index.unstack() 5398 indexes.update(new_indexes) 5399 /usr/local/lib/python3.10/dist-packages/xarray/core/indexes.py in unstack(self) 1019 1020 if not clean_index.is_unique: -> 1021 raise ValueError( 1022 ""Cannot unstack MultiIndex containing duplicates. Make sure entries "" 1023 f""are unique, e.g., by calling ``.drop_duplicates('{self.dim}')``, "" ValueError: Cannot unstack MultiIndex containing duplicates. Make sure entries are unique, e.g., by calling ``.drop_duplicates('stacked')``, before unstacking. ``` As you can see from the last sections in the trace, the issue is with xarray/dataset.py when we unstack the dataset. Any ideas why this is happening? The issue doesn't occur with RADARSAT-2 or any other dataset. So is this an xarray problem, or should I raise the issue at umr-lops? ### What did you expect to happen? The error shouldn't be there, and I should be able to view the dataframe, as shown in the link below: https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/examples/rcm.html ### Minimal Complete Verifiable Example ```Python import xsar import geoviews as gv import holoviews as hv import geoviews.feature as gf hv.extension('bokeh') path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010') meta = xsar.RcmMeta(name=path) meta.dt ``` ### MVCE confirmation - [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
commit: None python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] python-bits: 64 OS: Linux OS-release: 6.1.58+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None xarray: 2023.7.0 pandas: 1.5.3 numpy: 1.25.2 scipy: 1.11.4 netCDF4: None pydap: None h5netcdf: 1.3.0 h5py: 3.9.0 Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.1 distributed: 2023.8.1 matplotlib: 3.7.1 cartopy: None seaborn: 0.13.1 numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.7.2 pip: 23.1.2 conda: None pytest: 7.4.4 mypy: None IPython: 7.34.0 sphinx: 5.0.2 /usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn(""Setuptools is replacing distutils."")
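The `ValueError` itself points at a possible workaround. Below is a self-contained sketch of the failure mode and of the suggested fix; the `pole`/`pulse` values are made up for illustration and are not real RCM metadata. If the duplicate entries really come from the product file, this is probably something to raise with umr-lops rather than with xarray:

```python
import xarray as xr

# A stacked MultiIndex with a duplicated (pole, pulse) entry, as in the traceback.
ds = xr.Dataset(
    {"v": ("stacked", [1.0, 2.0, 3.0])},
    coords={
        "pole": ("stacked", ["H", "H", "V"]),
        "pulse": ("stacked", [0, 0, 1]),
    },
).set_index(stacked=["pole", "pulse"])

# ds.unstack("stacked")  # ValueError: Cannot unstack MultiIndex containing duplicates

# Keep the first of each duplicated entry, as the error message suggests.
deduped = ds.drop_duplicates("stacked").unstack("stacked")
```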
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8771/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1912094632,I_kwDOAMm_X85x-D-o,8231,xr.concat concatenates along dimensions that it wasn't asked to,35968931,open,0,,,4,2023-09-25T18:50:29Z,2024-02-14T20:30:26Z,,MEMBER,,,,"### What happened? Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists). ```python import xarray as xr ds1 = xr.Dataset( coords={ 'x_center': ('x_center', [1, 2, 3]), 'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]), }, ) ds2 = xr.Dataset( coords={ 'x_center': ('x_center', [4, 5, 6]), 'x_outer': ('x_outer', [4.5, 5.5, 6.5]), }, ) ``` Calling `xr.concat` on these with `dim='x_center'` happily concatenates them ```python xr.concat([ds1, ds2], dim='x_center') ``` ``` Dimensions: (x_outer: 7, x_center: 6) Coordinates: * x_outer (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 * x_center (x_center) int64 1 2 3 4 5 6 Data variables: *empty* ``` but notice that the returned result has been concatenated along *both* `x_center` and `x_outer`. ### What did you expect to happen? I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. `x_outer`). What I expected to happen was that (as by default `coords='different'`) both variables would be attempted to be concatenated along the `x_center` dimension, which would have succeeded for the `x_center` variable but failed for the `x_outer` variable. Indeed, if I name the variables differently so that they are no longer coordinate variables then that is what happens: ```python import xarray as xr ds1 = xr.Dataset( data_vars={ 'a': ('x_center', [1, 2, 3]), 'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]), }, ) ds2 = xr.Dataset( data_vars={ 'a': ('x_center', [4, 5, 6]), 'b': ('x_outer', [4.5, 5.5, 6.5]), }, ) ``` ```python xr.concat([ds1, ds2], dim='x_center', data_vars='different') ``` ``` ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4} ``` ### Minimal Complete Verifiable Example _No response_ ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? I was trying to create an example for which you would need the automatic combined concat/merge that happens within `xr.combine_by_coords`. ### Environment xarray `2023.8.0`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8231/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1390228572,I_kwDOAMm_X85S3TRc,7104,Duplicate values on unstack,114576287,closed,0,,,4,2022-09-29T04:16:26Z,2024-02-13T09:48:37Z,2024-02-13T09:48:37Z,NONE,,,,"### What happened? 
I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed. ### What did you expect to happen? A warning or error would be raised to say, ""this isn't going to work"". ### Minimal Complete Verifiable Example ```Python import datetime as dt import xarray as xr ds = xr.DataArray( [[1, 2, 3], [4, 5, 6]], dims=(""lat"", ""time""), coords={""lat"": [-60, 60], ""time"": [dt.datetime(2010, 1, d) for d in range(1, 4)]}, name=""test"", ).to_dataset() ds = ( ds.assign_coords( { ""month"": ds[""time""].dt.month, ""year"": ds[""time""].dt.year, } ) .set_index(time=[""month"", ""year""]) ) ds = ds.unstack(""time"") # the output only has 2 values, which isn't what I expected ds[""test""].data ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that... ### Environment
INSTALLED VERSIONS ------------------ commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 0.1.dev4312+ge678a1d.d20220928 pandas: 1.5.0 numpy: 1.22.4 scipy: 1.9.1 netCDF4: 1.6.1 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.1 cfgrib: 0.9.10.1 iris: 3.3.0 bottleneck: 1.3.5 dask: 2022.9.1 distributed: 2022.9.1 matplotlib: 3.6.0 cartopy: 0.21.0 seaborn: 0.12.0 numbagg: 0.2.1 fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 65.4.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: 8.5.0 sphinx: None
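Until xarray guards against this itself, a check along these lines would catch the example above (`safe_unstack` is a hypothetical helper, not xarray API; newer xarray releases do appear to raise a similar error, see the `Cannot unstack MultiIndex containing duplicates` traceback in the RCM issue earlier in this list):

```python
def safe_unstack(ds, dim):
    # Refuse to unstack silently when the stacked index has duplicate entries.
    index = ds.get_index(dim)
    if not index.is_unique:
        raise ValueError(f"index {dim!r} has duplicate entries; unstacking would drop data")
    return ds.unstack(dim)
```

In the MVCE, the three January days all map to `(month=1, year=2010)`, so `ds.get_index('time').is_unique` is `False` and `safe_unstack(ds, 'time')` raises instead of silently returning 2 values.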
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7104/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2126375172,I_kwDOAMm_X85-vekE,8726,PRs requiring approval & merging main?,5635139,closed,0,,,4,2024-02-09T02:35:58Z,2024-02-09T18:23:52Z,2024-02-09T18:21:59Z,MEMBER,,,,"### What is your issue? Sorry I haven't been on the calls at all recently (unfortunately the schedule is difficult for me). Maybe this was discussed there?  PRs now seem to require a separate approval prior to merging. Is there an upside to this? Is there any difference between those who can approve and those who can merge? Otherwise it just seems like more clicking. PRs also now seem to require merging the latest main prior to merging? I get there's some theoretical value to this, because changes can semantically conflict with each other. But it's extremely rare that this actually happens (can we point to cases?), and it limits the immediacy & throughput of PRs. If the bad outcome does ever happen, we find out quickly when main tests fail and can revert. (fwiw I wrote a few principles around this down a while ago [here](https://prql-lang.org/book/project/contributing/development.html#merges); those are much stronger than what I'm suggesting in this issue though)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8726/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2115049090,I_kwDOAMm_X85-ERaC,8694,Error while saving an altered dataset to NetCDF when loaded from a file,12544636,open,0,,,4,2024-02-02T14:18:03Z,2024-02-07T13:38:40Z,,NONE,,,,"### What happened? When attempting to save an altered Xarray dataset to a NetCDF file using the `to_netcdf` method, an error occurs if the original dataset is loaded from a file. Specifically, this error does not occur when the dataset is created directly but only when it is loaded from a file. ### What did you expect to happen? The altered Xarray dataset is saved as a NetCDF file using the `to_netcdf` method. ### Minimal Complete Verifiable Example ```Python import xarray as xr ds = xr.Dataset( data_vars=dict( win_1=(""attempt"", [True, False, True, False, False, True]), win_2=(""attempt"", [False, True, False, True, False, False]), ), coords=dict( attempt=[1, 2, 3, 4, 5, 6], player_1=(""attempt"", [""paper"", ""paper"", ""scissors"", ""scissors"", ""paper"", ""paper""]), player_2=(""attempt"", [""rock"", ""scissors"", ""paper"", ""rock"", ""paper"", ""rock""]), ) ) ds.to_netcdf(""dataset.nc"") ds_from_file = xr.load_dataset(""dataset.nc"") ds_altered = ds_from_file.where(ds_from_file[""player_1""] == ""paper"", drop=True) ds_altered.to_netcdf(""dataset_altered.nc"") ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. 
### Relevant log output ```Python Traceback (most recent call last): File ""example.py"", line 20, in ds_altered.to_netcdf(""dataset_altered.nc"") File "".../python3.9/site-packages/xarray/core/dataset.py"", line 2303, in to_netcdf return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( File "".../python3.9/site-packages/xarray/backends/api.py"", line 1315, in to_netcdf dump_to_store( File "".../python3.9/site-packages/xarray/backends/api.py"", line 1362, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File "".../python3.9/site-packages/xarray/backends/common.py"", line 356, in store self.set_variables( File "".../python3.9/site-packages/xarray/backends/common.py"", line 398, in set_variables writer.add(source, target) File "".../python3.9/site-packages/xarray/backends/common.py"", line 243, in add target[...] = source File "".../python3.9/site-packages/xarray/backends/scipy_.py"", line 78, in __setitem__ data[key] = value File "".../python3.9/site-packages/scipy/io/_netcdf.py"", line 1019, in __setitem__ self.data[index] = data ValueError: could not broadcast input array from shape (4,5) into shape (4,8) ``` ### Anything else we need to know? **Findings:** The issue is related to the encoding information of the dataset becoming invalid after filtering data with the `where` method. The `to_netcdf` method takes the available encoding information instead of considering the actual shape of the data. In the provided examples, the maximum length of strings stored in ""player_1"" and ""player_2"" is originally set to 8 characters. However, after filtering with the `where` method, the maximum length of the string becomes 5 in ""player_1"" and remains 8 in ""player_2."". But the encoding information of the variables still shows a length of 8, particularly the attribute `char_dim_name`. **Workaround:** A workaround to resolve this issue is to call the `drop_encoding` method on the dataset before saving it with `to_netcdf`. This action ensures that the encoding information is not available, and the `to_netcdf` method is forced to take the actual shapes of the data, preventing the broadcasting error. ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.9.14 (main, Aug 24 2023, 14:01:46) [GCC 11.4.0] python-bits: 64 OS: Linux OS-release: 6.3.1-060301-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.3 scipy: 1.12.0 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 69.0.3 pip: 23.3.2 conda: None pytest: None mypy: None IPython: None sphinx: None
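A sketch of the workaround described in the findings, continuing the example above:

```python
# Drop the stale encoding carried over from the original file (including the
# char_dim_name / string-length information), so to_netcdf derives string
# lengths from the filtered data instead.
ds_altered.drop_encoding().to_netcdf("dataset_altered.nc")
```

If only the string coordinates are affected, clearing just their `.encoding` dictionaries may also be enough.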
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8694/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 782440858,MDU6SXNzdWU3ODI0NDA4NTg=,4784,Opening a tiff with scale_factor/add_offset attrs then saving as zarr and opening causes a UFuncTypeError,53100696,closed,0,,,4,2021-01-08T22:45:21Z,2024-02-06T10:40:15Z,2024-02-06T10:40:14Z,NONE,,,," **What happened**: When opening a geotiff that has `scale_factor` and `add_offset` metadata and then saving it as a zarr the `scale_factor` and `add_offset` attributes are [loaded](https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/rasterio_.py#L280) and then saved as strings. When the resulting zarr is opened xarray attempts to [apply](https://github.com/pydata/xarray/blob/569a4da18229aed391886ef768132f3d6d64fb30/xarray/coding/variables.py#L245) the `scale_factor` and `add_offset` attributes, but raises an exception because they are of type ` 220 data *= scale_factor 221 if add_offset is not None: 222 data += add_offset UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-1034-azure machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.0 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: 2.6.1 cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.8 cfgrib: None iris: None bottleneck: None dask: 2020.12.0 distributed: 2020.12.0 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.3.3 conda: None pytest: 6.2.1 IPython: 7.19.0 sphinx: None ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2112742578,I_kwDOAMm_X8597eSy,8693,reading netcdf with engine=scipy fails with a typeerror under certain conditions,32731672,open,0,,,4,2024-02-01T15:03:23Z,2024-02-05T09:35:51Z,,CONTRIBUTOR,,,,"### What happened? Saving and loading from netcdf with engine=scipy produces an unexpected valueerror on read. The file seems to be corrupted. ### What did you expect to happen? reading works just fine. ### Minimal Complete Verifiable Example ```Python import numpy as np import xarray as xr ds = xr.Dataset( { ""values"": ( [""name"", ""time""], np.array([[]], dtype=np.float32).T, ) }, coords={""time"": [1], ""name"": []}, ).expand_dims({""index"": [0]}) ds.to_netcdf(""file.nc"", engine=""scipy"") _ = xr.open_dataset(""file.nc"", engine=""scipy"") ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. 
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output ```Python KeyError Traceback (most recent call last) File .../python3.11/site-packages/xarray/backends/file_manag er.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 210 try: --> 211 file = self._cache[self._key] 212 except KeyError: File .../python3.11/site-packages/xarray/backends/lru_cache. py:56, in LRUCache.__getitem__(self, key) 55 with self._lock: ---> 56 value = self._cache[key] 57 self._cache.move_to_end(key) KeyError: [, ('/home/eivind/Projects/ert/file.nc',), 'r', (('mmap', None), ('version', 2)), '264ec6b3-78b3-4766-bb41-7656d6a51962'] During handling of the above exception, another exception occurred: TypeError Traceback (most recent call last) Cell In[1], line 18 4 ds = ( 5 xr.Dataset( 6 { (...) 15 .expand_dims({""index"": [0]}) 16 ) 17 ds.to_netcdf(""file.nc"", engine=""scipy"") ---> 18 _ = xr.open_dataset(""file.nc"", engine=""scipy"") File .../python3.11/site-packages/xarray/backends/api.py:572 , in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, d ecode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked _array_type, from_array_kwargs, backend_kwargs, **kwargs) 560 decoders = _resolve_decoders_kwargs( 561 decode_cf, 562 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 568 decode_coords=decode_coords, 569 ) 571 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None) --> 572 backend_ds = backend.open_dataset( 573 filename_or_obj, 574 drop_variables=drop_variables, 575 **decoders, 576 **kwargs, 577 ) 578 ds = _dataset_from_backend_dataset( 579 backend_ds, 580 filename_or_obj, (...) 590 **kwargs, 591 ) 592 return ds File .../python3.11/site-packages/xarray/backends/scipy_.py: 315, in ScipyBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, con cat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, mode, format, group, mm ap, lock) 313 store_entrypoint = StoreBackendEntrypoint() 314 with close_on_error(store): --> 315 ds = store_entrypoint.open_dataset( 316 store, 317 mask_and_scale=mask_and_scale, 318 decode_times=decode_times, 319 concat_characters=concat_characters, 320 decode_coords=decode_coords, 321 drop_variables=drop_variables, 322 use_cftime=use_cftime, 323 decode_timedelta=decode_timedelta, 324 ) 325 return ds File .../python3.11/site-packages/xarray/backends/store.py:4 3, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, conca t_characters, decode_coords, drop_variables, use_cftime, decode_timedelta) 29 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 30 self, 31 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 39 decode_timedelta=None, 40 ) -> Dataset: 41 assert isinstance(filename_or_obj, AbstractDataStore) ---> 43 vars, attrs = filename_or_obj.load() 44 encoding = filename_or_obj.get_encoding() 46 vars, attrs, coord_names = conventions.decode_cf_variables( 47 vars, 48 attrs, (...) 55 decode_timedelta=decode_timedelta, 56 ) File .../python3.11/site-packages/xarray/backends/common.py: 210, in AbstractDataStore.load(self) 188 def load(self): 189 """""" 190 This loads the variables and attributes simultaneously. 
191 A centralized loading function makes it easier to create (...) 207 are requested, so care should be taken to make sure its fast. 208 """""" 209 variables = FrozenDict( --> 210 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 211 ) 212 attributes = FrozenDict(self.get_attrs()) 213 return variables, attributes File .../python3.11/site-packages/xarray/backends/scipy_.py: 181, in ScipyDataStore.get_variables(self) 179 def get_variables(self): 180 return FrozenDict( --> 181 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 182 ) File .../python3.11/site-packages/xarray/backends/scipy_.py: 170, in ScipyDataStore.ds(self) 168 @property 169 def ds(self): --> 170 return self._manager.acquire() File .../python3.11/site-packages/xarray/backends/file_manag er.py:193, in CachingFileManager.acquire(self, needs_lock) 178 def acquire(self, needs_lock=True): 179 """"""Acquire a file object from the manager. 180 181 A new file is only opened if it has expired from the (...) 191 An open file object, as returned by ``opener(*args, **kwargs)``. 192 """""" --> 193 file, _ = self._acquire_with_cache_info(needs_lock) 194 return file File .../python3.11/site-packages/xarray/backends/file_manag er.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 215 kwargs = kwargs.copy() 216 kwargs[""mode""] = self._mode --> 217 file = self._opener(*self._args, **kwargs) 218 if self._mode == ""w"": 219 # ensure file doesn't get overridden when opened again 220 self._mode = ""a"" File .../python3.11/site-packages/xarray/backends/scipy_.py: 109, in _open_scipy_netcdf(filename, mode, mmap, version) 106 filename = io.BytesIO(filename) 108 try: --> 109 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 110 except TypeError as e: # netcdf3 message is obscure in this case 111 errmsg = e.args[0] File .../python3.11/site-packages/scipy/io/_netcdf.py:278, i n netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale) 275 self._attributes = {} 277 if mode in 'ra': --> 278 self._read() File .../python3.11/site-packages/scipy/io/_netcdf.py:607, i n netcdf_file._read(self) 605 self._read_dim_array() 606 self._read_gatt_array() --> 607 self._read_var_array() File .../python3.11/site-packages/scipy/io/_netcdf.py:688, i n netcdf_file._read_var_array(self) 685 data = None 686 else: # not a record variable 687 # Calculate size to avoid problems with vsize (above) --> 688 a_size = reduce(mul, shape, 1) * size 689 if self.use_mmap: 690 data = self._mm_buf[begin_:begin_+a_size].view(dtype=dtype_) TypeError: unsupported operand type(s) for *: 'int' and 'NoneType' ``` ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.2.0-39-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.1.1 pandas: 2.1.1 numpy: 1.26.1 scipy: 1.11.3 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.0 cartopy: None seaborn: 0.13.1 numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 63.4.3 pip: 23.3.1 conda: None pytest: 7.4.4 mypy: 1.8.0 IPython: 8.17.2 sphinx: 7.2.6
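For context, the MVCE writes a variable with a zero-length `name` dimension, which scipy's netCDF3 record-variable path appears to mishandle. One possible workaround, assuming the netCDF4 engine is installed (an untested assumption, not a confirmed fix):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"values": (["name", "time"], np.array([[]], dtype=np.float32).T)},
    coords={"time": [1], "name": []},
).expand_dims({"index": [0]})

# Write and read through netCDF4/HDF5 instead of scipy's netCDF3 writer.
ds.to_netcdf("file.nc", engine="netcdf4")
_ = xr.open_dataset("file.nc", engine="netcdf4")
```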
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8693/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2111051033,I_kwDOAMm_X8591BUZ,8691,xarray.open_dataset with chunks={} returns a single chunk and not engine (h5netcdf) preferred chunks,15016780,closed,0,,,4,2024-01-31T22:04:02Z,2024-01-31T22:56:17Z,2024-01-31T22:56:17Z,NONE,,,,"### What happened? When opening MUR SST netcdfs from S3, xarray.open_dataset(file, engine=""h5netcdf"", chunks={}) returns a single chunk (whereas the h5netcdf library returns a chunk shape of (1, 1023, 2047). A notebook version of the code below includes the output: https://gist.github.com/abarciauskas-bgse/9366e04d2af09b79c9de466f6c1d3b90 ### What did you expect to happen? I thought the chunks={} option would return the same chunks (1, 1023, 2047) exposed by the h5netcdf engine. ### Minimal Complete Verifiable Example ```Python #!/usr/bin/env python # coding: utf-8 # This notebook looks at how xarray and h5netcdf return different chunks. import pandas as pd import h5netcdf import s3fs import xarray as xr dates = [ d.to_pydatetime().strftime('%Y%m%d') for d in pd.date_range('2023-02-01', '2023-03-01', freq='D') ] SHORT_NAME = 'MUR-JPL-L4-GLOB-v4.1' s3_fs = s3fs.S3FileSystem(anon=False) var = 'analysed_sst' def make_filename(time): base_url = f's3://podaac-ops-cumulus-protected/{SHORT_NAME}/' # example file: ""/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"" return f'{base_url}{time}090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc' s3_urls = [make_filename(d) for d in dates] def print_chunk_shape(s3_url): try: # Open the dataset using xarray file = s3_fs.open(s3_url) dataset = xr.open_dataset(file, engine='h5netcdf', chunks={}) # Print chunk shapes for each variable in the dataset print(f""\nChunk shapes for {s3_url}:"") if dataset[var].chunks is not None: print(f""xarray open_dataset chunks for {var}: {dataset[var].chunks}"") else: print(f""xarray open_dataset chunks for {var}: Not chunked"") with h5netcdf.File(file, 'r') as file: dataset = file[var] # Check if the dataset is chunked if dataset.chunks: print(f""h5netcdf chunks for {var}:"", dataset.chunks) else: print(f""h5netcdf dataset is not chunked."") except Exception as e: print(f""Failed to process {s3_url}: {e}"") [print_chunk_shape(s3_url) for s3_url in s3_urls] ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [x] Complete example — the example is self-contained, including all data and the text of any traceback. - [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [x] New issue — a search of GitHub Issues suggests this is not a duplicate. - [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.10.198-187.748.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.1 libnetcdf: 4.9.2 xarray: 2023.6.0 pandas: 2.0.3 numpy: 1.24.4 scipy: 1.11.1 netCDF4: 1.6.4 pydap: installed h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: 2.15.0 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.6.1 distributed: 2023.6.1 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: None fsspec: 2023.6.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.7.2 numpy_groupies: 0.9.22 setuptools: 68.0.0 pip: 23.1.2 conda: None pytest: 7.4.0 mypy: None IPython: 8.14.0 sphinx: None
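A workaround sketch in the meantime: read the on-disk chunk shape from the variable's encoding and pass it back explicitly. This reuses `s3_fs`, `make_filename`, `dates`, and `var` from the example above, and assumes the h5netcdf backend reports the HDF5 chunking via `encoding['chunksizes']`:

```python
url = make_filename(dates[0])

with xr.open_dataset(s3_fs.open(url), engine="h5netcdf") as meta_ds:
    # On-disk HDF5 chunk shape, e.g. (1, 1023, 2047).
    disk_chunks = dict(zip(meta_ds[var].dims, meta_ds[var].encoding["chunksizes"]))

dataset = xr.open_dataset(s3_fs.open(url), engine="h5netcdf", chunks=disk_chunks)
print(dataset[var].chunks)
```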
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8691/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2104267494,I_kwDOAMm_X859bJLm,8677,Add rolling.rank() same as pandas,39230130,open,0,,,4,2024-01-28T17:27:21Z,2024-01-29T19:50:20Z,,NONE,,,,"### Is your feature request related to a problem? Dear xarray maintainers, I would like to express my heartfelt gratitude for the significant optimizations your xarray library has brought to my project. Xarray combines the speed of numpy with the highly customizable parameters of pandas. The extensive parameters in the ``rolling`` module have allowed me to achieve functionality similar to pandas more efficiently. I am wondering if it would be possible to incorporate a ranking method for rolling windows, including the ability to specify parameters such as ``pct``, similar to the pandas ``rolling.rank`` function. Your consideration of this feature would be greatly appreciated. Once again, thank you for your contributions! ![rolling](https://github.com/pydata/xarray/assets/39230130/544352ca-0da1-4b6e-9829-e0b50e229cc4) ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8677/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1716228662,I_kwDOAMm_X85mS5I2,7848,Compatibility with the Array API standard ,35968931,open,0,,,4,2023-05-18T20:34:43Z,2024-01-25T04:03:42Z,,MEMBER,,,,"### What is your issue? **Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other.** We've already had - #6804 - #7067 - #7847 and there will likely be many others. --- I suspect this might require changes to the standard as well as to xarray - in particular see [this list](https://github.com/data-apis/array-api/issues/187) of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ): - `np.clip` - `np.diff` - `np.pad` - `np.repeat` - ~`np.take`~ - ~`np.tile`~","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7848/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2079089277,I_kwDOAMm_X8577GJ9,8607,allow computing just a small number of variables,14808389,open,0,,,4,2024-01-12T15:21:27Z,2024-01-12T20:20:29Z,,MEMBER,,,,"### Is your feature request related to a problem? I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that. ### Describe the solution you'd like I'd imagine something like ```python ds.compute(variables=variable_names) ``` but I'm undecided on whether that's a good idea (it might make `.compute` more complex?) ### Describe alternatives you've considered So far I've been using something like ```python ds.assign_coords({k: lambda ds: ds[k].compute() for k in variable_names}) ds.pipe(lambda ds: ds.merge(ds[variable_names].compute())) ``` but both are not easy to type / understand (though having `.merge` take a callable would make this much easier). 
Also, the first option computes variables separately, which may not be ideal? ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2073024461,I_kwDOAMm_X857j9fN,8602,`DataArray.mean()` and `Dataset.mean()` fail with `sparse==0.15.0`,46072231,closed,0,,,4,2024-01-09T19:27:47Z,2024-01-10T14:44:57Z,2024-01-10T14:44:57Z,NONE,,,,"### What happened? The following script leads to an error: ``` import numpy as np import xarray as xr from sparse import GCXS x = np.random.negative_binomial(1, 0.5, size=(100, 100)) array = xr.DataArray(GCXS.from_numpy(x)) array.mean() ``` ``` --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[16], line 1 ----> 1 array.mean() File ~/.../python3.11/site-packages/xarray/core/_aggregations.py:1663, in DataArrayAggregations.mean(self, dim, skipna, keep_attrs, **kwargs) 1588 def mean( 1589 self, 1590 dim: Dims = None, (...) 1594 **kwargs: Any, 1595 ) -> Self: 1596 """""" 1597 Reduce this DataArray's data by applying ``mean`` along some dimension(s). 1598 (...) 1661 array(nan) 1662 """""" -> 1663 return self.reduce( 1664 duck_array_ops.mean, 1665 dim=dim, 1666 skipna=skipna, 1667 keep_attrs=keep_attrs, 1668 **kwargs, 1669 ) File ~/.../python3.11/site-packages/xarray/core/dataarray.py:3776, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs) 3732 def reduce( 3733 self, 3734 func: Callable[..., Any], (...) 3740 **kwargs: Any, 3741 ) -> Self: 3742 """"""Reduce this array by applying `func` along some dimension(s). 3743 3744 Parameters (...) 3773 summarized data and the indicated dimension(s) removed. 
3774 """""" -> 3776 var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs) 3777 return self._replace_maybe_drop_dims(var) File ~/.../python3.11/site-packages/xarray/core/variable.py:1756, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs) 1749 keep_attrs_ = ( 1750 _get_keep_attrs(default=False) if keep_attrs is None else keep_attrs 1751 ) 1753 # Noe that the call order for Variable.mean is 1754 # Variable.mean -> NamedArray.mean -> Variable.reduce 1755 # -> NamedArray.reduce -> 1756 result = super().reduce( 1757 func=func, dim=dim, axis=axis, keepdims=keepdims, **kwargs 1758 ) 1760 # return Variable always to support IndexVariable 1761 return Variable( 1762 result.dims, result._data, attrs=result._attrs if keep_attrs_ else None 1763 ) File ~/.../python3.11/site-packages/xarray/namedarray/core.py:772, in NamedArray.reduce(self, func, dim, axis, keepdims, **kwargs) 770 data = func(self.data, axis=axis, **kwargs) 771 else: --> 772 data = func(self.data, **kwargs) 774 if getattr(data, ""shape"", ()) == self.shape: 775 dims = self.dims File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:637, in mean(array, axis, skipna, **kwargs) 635 return _to_pytimedelta(mean_timedeltas, unit=""us"") + offset 636 else: --> 637 return _mean(array, axis=axis, skipna=skipna, **kwargs) File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:399, in _create_nan_agg_method..f(values, axis, skipna, **kwargs) 396 kwargs.pop(""min_count"", None) 398 xp = get_array_namespace(values) --> 399 func = getattr(xp, name) 401 try: 402 with warnings.catch_warnings(): AttributeError: module 'sparse' has no attribute 'mean' ``` ### What did you expect to happen? Reproducible script runs without error with `sparse==0.14.0`. ### Minimal Complete Verifiable Example _No response_ ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.2.0-34-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: None xarray: 2023.12.0 pandas: 1.5.3 numpy: 1.24.4 scipy: 1.11.4 netCDF4: None pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: None nc_time_axis: None iris: None bottleneck: None dask: 2023.12.0 distributed: 2023.12.0 matplotlib: 3.8.2 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.12.0 cupy: None pint: None sparse: 0.15.0 flox: None numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.3 mypy: None IPython: 8.18.1 sphinx: None
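Two stopgaps, given the report that `sparse==0.14.0` still works: pin the dependency, or densify before reducing (only viable when the array fits in memory):

```python
# Option 1: pin the dependency, e.g.
#   pip install "sparse==0.14.0"

# Option 2: densify first, memory permitting.
import numpy as np
import xarray as xr
from sparse import GCXS

x = np.random.negative_binomial(1, 0.5, size=(100, 100))
array = xr.DataArray(GCXS.from_numpy(x))
dense_mean = xr.DataArray(array.data.todense(), dims=array.dims).mean()
```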
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8602/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2041076267,I_kwDOAMm_X855qFor,8551,Make _obj_repr public,12115839,closed,0,,,4,2023-12-14T07:19:16Z,2023-12-21T16:00:52Z,2023-12-21T16:00:52Z,NONE,,,,"### What is your issue? We are using https://github.com/pydata/xarray/blob/2971994ef1dd67f44fe59e846c62b47e1e5b240b/xarray/core/formatting_html.py#L278 in the html representation of `AreaDefinitions` in https://github.com/pytroll/pyresample and don't like to import private functions. Would it be OK to make `_obj_repr` public?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8551/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 2027147099,I_kwDOAMm_X854089b,8523,"tree-reduce the combine for `open_mfdataset(..., parallel=True, combine=""nested"")`",2448579,open,0,,,4,2023-12-05T21:24:51Z,2023-12-18T19:32:39Z,,MEMBER,,,,"### Is your feature request related to a problem? When `parallel=True` and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the ""head node"" for the combine. Instead we can tree-reduce the combine ([example](https://gist.github.com/dcherian/345c81c69c3587873a89b49c949d1561)) by switching to `dask.bag` instead of `dask.delayed` and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node. 1. The downside is the dask graph is ""worse"" but perhaps that shouldn't stop us. 2. I think this is only feasible for `combine=""nested""` cc @TomNicholas ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8523/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1223031600,I_kwDOAMm_X85I5fsw,6561,Excessive memory consumption by to_dataframe(),8419421,closed,0,,,4,2022-05-02T15:33:33Z,2023-12-15T20:47:32Z,2023-12-15T20:47:32Z,NONE,,,,"### What happened? This is a reincarnation of #2534 with a reproduceable example. A 51 MB netCDF file leads to to_dataframe() requesting 23 GB. ### What did you expect to happen? I expect to_dataframe() to require much less than 23 GB of memory for this operation. ### Minimal Complete Verifiable Example ```Python import urllib.request import xarray as xr url = 'http://people.envsci.rutgers.edu/decker/Surface_METAR_20220501_0000.nc' fname = 'metar.nc' urllib.request.urlretrieve(url, filename=fname) ncdata = xr.open_dataset(fname) df = ncdata.to_dataframe() ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. 
### Relevant log output ```Python Traceback (most recent call last): File ""/chariton/decker/test/bug/xarraymem.py"", line 8, in df = ncdata.to_dataframe() File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5399, in to_dataframe return self._to_dataframe(ordered_dims=ordered_dims) File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5363, in _to_dataframe data = [ File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5364, in self._variables[k].set_dims(ordered_dims).values.reshape(-1) numpy.core._exceptions._ArrayMemoryError: Unable to allocate 23.3 GiB for an array with shape (5021, 127626) and data type |S39 ``` ### Anything else we need to know? _No response_ ### Environment
/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. warnings.warn(""Setuptools is replacing distutils."") INSTALLED VERSIONS ------------------ commit: None python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.62.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.4.2 numpy: 1.22.3 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: None IPython: None sphinx: None
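The allocation comes from `to_dataframe()` broadcasting every variable to the full cross-product of dimensions, `(5021, 127626)` here. A mitigation sketch: group variables by their dimensions and convert each group separately, so nothing is broadcast against dimensions it does not use:

```python
import xarray as xr

ncdata = xr.open_dataset("metar.nc")

groups = {}
for name, var in ncdata.data_vars.items():
    groups.setdefault(var.dims, []).append(name)

# One much smaller DataFrame per distinct dimension tuple.
frames = {dims: ncdata[names].to_dataframe() for dims, names in groups.items()}
```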
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6561/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 384002323,MDU6SXNzdWUzODQwMDIzMjM=,2570,np.clip() executes eagerly,1200058,closed,0,,,4,2018-11-24T16:25:03Z,2023-12-03T05:29:17Z,2023-12-03T05:29:17Z,NONE,,,,"#### Example: ```python x = xr.DataArray(np.random.uniform(size=[100, 100])).chunk(10) x ``` > > dask.array > Dimensions without coordinates: dim_0, dim_1 > ```python np.clip(x, 0, 0.5) ``` > > array([[0.264276, 0.32227 , 0.336396, ..., 0.110182, 0.28255 , 0.399041], > [0.5 , 0.030289, 0.5 , ..., 0.428923, 0.262249, 0.5 ], > [0.5 , 0.5 , 0.280971, ..., 0.427334, 0.026649, 0.5 ], > ..., > [0.5 , 0.5 , 0.294943, ..., 0.053143, 0.5 , 0.488239], > [0.5 , 0.341485, 0.5 , ..., 0.5 , 0.250441, 0.5 ], > [0.5 , 0.156285, 0.179123, ..., 0.5 , 0.076242, 0.319699]]) > Dimensions without coordinates: dim_0, dim_1 ```python x.clip(0, 0.5) ``` > > dask.array > Dimensions without coordinates: dim_0, dim_1 #### Problem description Using np.clip() directly calculates the result, while xr.DataArray.clip() does not.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2570/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1902108672,I_kwDOAMm_X85xX-AA,8207,Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset`,50383939,open,0,,,4,2023-09-19T02:44:02Z,2023-12-01T22:29:49Z,,NONE,,,,"### What is your issue? I am simply reading 366 small (~15MBs) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow: ```python-console In [1]: import os; import dask In [2]: import xarray as xr In [3]: from dask.distributed import Client, LocalCluster In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker In [5]: client = Client(cluster) In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE' In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True) In [8]: ds.to_netcdf('./out2.nc') ``` And below, is the error I am getting:
Error message ```python-console In [8]: ds.to_netcdf('./out2.nc') /home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures. warnings.warn( 2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed Key: ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0) Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None))) kwargs: {} Exception: ""RuntimeError('NetCDF: HDF error')"" --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[8], line 1 ----> 1 ds.to_netcdf('./out2.nc') File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2249 encoding = {} 2250 from xarray.backends.api import to_netcdf -> 2252 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2253 self, 2254 path, 2255 mode=mode, 2256 format=format, 2257 group=group, 2258 engine=engine, 2259 encoding=encoding, 2260 unlimited_dims=unlimited_dims, 2261 compute=compute, 2262 multifile=False, 2263 invalid_netcdf=invalid_netcdf, 2264 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1252 if multifile: 1253 return writer, store -> 1255 writes = writer.sync(compute=compute) 1257 if isinstance(target, BytesIO): 1258 store.sync() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs) 253 if chunkmanager_store_kwargs is None: 254 chunkmanager_store_kwargs = {} --> 256 delayed_store = chunkmanager.store( 257 self.sources, 258 self.targets, 259 lock=self.lock, 260 compute=compute, 261 flush=True, 262 regions=self.regions, 263 **chunkmanager_store_kwargs, 264 ) 265 self.sources = [] 266 self.targets = [] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs) 203 def store( 204 self, 205 sources: DaskArray | Sequence[DaskArray], 206 targets: Any, 207 **kwargs, 208 ): 209 from dask.array import store --> 211 return store( 212 sources=sources, 213 targets=targets, 214 **kwargs, 215 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***) 1234 elif compute: 1235 store_dsk = HighLevelGraph(layers, dependencies) -> 1236 compute_as_if_collection(Array, store_dsk, map_keys, **kwargs) 1237 return None 1239 else: File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs) 367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get) 368 dsk2 = optimization_function(cls)(dsk, keys, 
**kwargs) --> 369 return schedule(dsk2, keys, **kwargs) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs) 3265 should_rejoin = False 3266 try: -> 3267 results = self.gather(packed, asynchronous=asynchronous, direct=direct) 3268 finally: 3269 for f in futures.values(): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous) 2390 local_worker = None 2392 with shorten_traceback(): -> 2393 return self.sync( 2394 self._gather, 2395 futures, 2396 errors=errors, 2397 direct=direct, 2398 local_worker=local_worker, 2399 asynchronous=asynchronous, 2400 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__() 483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray: --> 484 return np.asarray(self.get_duck_array(), dtype=dtype) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array() 486 def get_duck_array(self): --> 487 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array() 663 def get_duck_array(self): --> 664 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array() 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): --> 557 array = array.get_duck_array() 558 return _wrap_numpy_scalars(array) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array() 73 def get_duck_array(self): ---> 74 return self.func(self.array.get_duck_array()) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array() 550 def get_duck_array(self): --> 551 array = self.array[self.key] 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in __getitem__() 99 def __getitem__(self, key): --> 100 return indexing.explicit_indexing_adapter( 101 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 102 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter() 836 """"""Support explicit indexing by delegating to a raw indexing method. 837 838 Outer and/or vectorized indexers are supported by indexing a second time (...) 855 Indexing result, in the form of a duck numpy-array. 
856 """""" 857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 858 result = raw_indexing_method(raw_key.tuple) 859 if numpy_indices.tuple: 860 # index the loaded np.ndarray 861 result = NumpyIndexingAdapter(result)[numpy_indices] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem() 110 try: 111 with self.datastore.lock: --> 112 original_array = self.get_array(needs_lock=False) 113 array = getitem(original_array, key) 114 except IndexError: 115 # Catch IndexError in netCDF4 and return a more informative 116 # error message. This is most often called when an unsorted 117 # indexer is used before the data is loaded from disk. File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array() 90 def get_array(self, needs_lock=True): ---> 91 ds = self.datastore._acquire(needs_lock) 92 variable = ds.variables[self.variable_name] 93 variable.set_auto_maskandscale(False) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire() 402 def _acquire(self, needs_lock=True): --> 403 with self._manager.acquire_context(needs_lock) as root: 404 ds = _nc4_require_group(root, self._group, self._mode) 405 return ds File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__() 133 del self.args, self.kwds, self.func 134 try: --> 135 return next(self.gen) 136 except StopIteration: 137 raise RuntimeError(""generator didn't yield"") from None File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context() 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """"""Context manager for acquiring a file."""""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info() 215 kwargs = kwargs.copy() 216 kwargs[""mode""] = self._mode --> 217 file = self._opener(*self._args, **kwargs) 218 if self._mode == ""w"": 219 # ensure file doesn't get overridden when opened again 220 self._mode = ""a"" File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__() File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars() File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error ```
The header of one of the individual NetCDF files is shown below:
Individual NetCDF header ```console $ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc netcdf ab_models_remapped_1980-04-20-13-00-00 { dimensions: COMID = 14980 ; time = UNLIMITED ; // (24 currently) variables: int time(time) ; time:long_name = ""time"" ; time:units = ""hours since 1980-04-20 12:00:00"" ; time:calendar = ""gregorian"" ; time:standard_name = ""time"" ; time:axis = ""T"" ; double latitude(COMID) ; latitude:long_name = ""latitude"" ; latitude:units = ""degrees_north"" ; latitude:standard_name = ""latitude"" ; double longitude(COMID) ; longitude:long_name = ""longitude"" ; longitude:units = ""degrees_east"" ; longitude:standard_name = ""longitude"" ; double COMID(COMID) ; COMID:long_name = ""shape ID"" ; COMID:units = ""1"" ; double RDRS_v2.1_P_P0_SFC(time, COMID) ; RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ; RDRS_v2.1_P_P0_SFC:long_name = ""Forecast: Surface pressure"" ; RDRS_v2.1_P_P0_SFC:units = ""mb"" ; double RDRS_v2.1_P_HU_1.5m(time, COMID) ; RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_HU_1.5m:long_name = ""Forecast: Specific humidity"" ; RDRS_v2.1_P_HU_1.5m:units = ""kg kg**-1"" ; double RDRS_v2.1_P_TT_1.5m(time, COMID) ; RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_TT_1.5m:long_name = ""Forecast: Air temperature"" ; RDRS_v2.1_P_TT_1.5m:units = ""deg_C"" ; double RDRS_v2.1_P_UVC_10m(time, COMID) ; RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ; RDRS_v2.1_P_UVC_10m:long_name = ""Forecast: Wind Modulus (derived using UU and VV)"" ; RDRS_v2.1_P_UVC_10m:units = ""kts"" ; double RDRS_v2.1_A_PR0_SFC(time, COMID) ; RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ; RDRS_v2.1_A_PR0_SFC:long_name = ""Analysis: Quantity of precipitation"" ; RDRS_v2.1_A_PR0_SFC:units = ""m"" ; double RDRS_v2.1_P_FB_SFC(time, COMID) ; RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FB_SFC:long_name = ""Forecast: Downward solar flux"" ; RDRS_v2.1_P_FB_SFC:units = ""W m**-2"" ; double RDRS_v2.1_P_FI_SFC(time, COMID) ; RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FI_SFC:long_name = ""Forecast: Surface incoming infrared flux"" ; RDRS_v2.1_P_FI_SFC:units = ""W m**-2"" ; ```
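For what it's worth, one pattern sometimes suggested for avoiding concurrent HDF5 writes — offered as a sketch only, not a confirmed fix for this error — is to build the output graph lazily and then force the final write through a single thread:

```python
import xarray as xr

# Open lazily as before, but serialize the write step so that only one
# thread touches the output HDF5 file at a time.
ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)
delayed = ds.to_netcdf('./out2.nc', compute=False)  # returns a dask Delayed object
delayed.compute(scheduler='single-threaded')  # bypass the distributed workers for the write
```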
I am running `xarray` and `Dask` on an HPC, so the ""modules"" I have loaded are the following: ```console module list Currently Loaded Modules: 1) CCconfig 6) ucx/1.8.0 11) netcdf-mpi/4.9.0 (io) 16) freexl/1.0.5 (t) 21) scipy-stack/2023a (math) 2) gentoo/2020 (S) 7) libfabric/1.10.1 12) hdf5-mpi/1.12.1 (io) 17) geos/3.10.2 (geo) 22) libspatialindex/1.8.5 (phys) 3) gcccore/.9.3.0 (H) 8) openmpi/4.0.3 (m) 13) libffi/3.3 18) librttopo-proj9/1.1.0 23) ipykernel/2023a 4) imkl/2020.1.217 (math) 9) StdEnv/2020 (S) 14) python/3.10.2 (t) 19) proj/9.0.1 (geo) 24) sqlite/3.38.5 5) intel/2020.1.217 (t) 10) mii/1.1.2 15) mpi4py/3.1.3 (t) 20) libspatialite-proj901/5.0.1 ``` Any suggestion is greatly appreciated!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8207/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 2019789753,I_kwDOAMm_X854Y4u5,8499,'drop_duplicates' behaves differently when using 1 vs many coordinates for an index,6654709,open,0,,,4,2023-12-01T00:36:42Z,2023-12-01T09:55:39Z,,NONE,,,,"### What happened? I am trying to `drop_duplicates` from a DataArray based on the values of some of the coordinates, starting from a DataArray with coordinates, but no indexes. To accomplish this, I call 'DataArray.set_xindex' with the appropriate coordinate names, and then call 'drop_duplicates' on the resulting DataArray, like so:   ```python from xarray import DataArray import numpy as np test_array = DataArray( np.random.rand(5), coords=dict(x=(""sample"", [1, 2, 1, 2, 1]), y=(""sample"", [-1] * 5)), dims=""sample"", ) # output DataArray's 'sample' dimension has length 2, as expected good = test_array.set_xindex([""x"", ""y""]).drop_duplicates(""sample"") assert len(good) == 2 ``` The above functions as expected; 'good' has had its duplicates dropped, and we are left with a DataArray of length 2. However, the following does _not_ function as I would expect: ```python # All the 'y's are '-1', so we expect the same duplicates as before to be dropped, # even if we don't include the 'y' values in the index. bad = test_array.set_xindex(""x"").drop_duplicates(""sample"") # But this assert fails! 'drop_duplicates' does not drop anything assert not bad.equals(test_array) ``` ### What did you expect to happen? I expected `drop_duplicates` to drop the duplicates when I was using only a single coordinate for the index. ### Minimal Complete Verifiable Example ```Python from xarray import DataArray import numpy as np test_array = DataArray( range(5), coords=dict(x=(""sample"", [1, 2, 1, 2, 1]), y=(""sample"", [-1] * 5)), dims=""sample"", ) # output DataArray's 'sample' dimension has length 2, as expected good = test_array.set_xindex([""x"", ""y""]).drop_duplicates(""sample"") # And indeed there are only 2 elements left after dropping duplicates. assert len(good) == 2 # All the 'y's are '-1', so we expect the same duplicates as before to be dropped, bad = test_array.drop_vars(""y"").set_xindex(""x"").drop_duplicates(""sample"") # But this assert fails! 'drop_duplicates' does not drop anything assert not bad.equals(test_array.drop_vars(""y"")) ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. 
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.15.133.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: 2023.11.0 pandas: 2.1.0 numpy: 1.24.4 scipy: 1.11.2 netCDF4: 1.6.3 pydap: None h5netcdf: 1.2.0 h5py: 3.8.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None iris: None bottleneck: None dask: 2023.9.1 distributed: 2023.9.1 matplotlib: 3.7.2 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.9.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.1.2 pip: 23.2.1 conda: 23.7.3 pytest: 7.4.2 mypy: None IPython: 8.15.0 sphinx: None
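In the meantime, a possible workaround that avoids indexes entirely — a sketch against the example above, not a fix for the underlying behavior — is to select the first occurrence of each coordinate value directly:

```python
import numpy as np
from xarray import DataArray

test_array = DataArray(
    np.random.rand(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# np.unique returns the index of the first occurrence of each distinct
# x value, which matches what drop_duplicates is expected to keep.
_, first_idx = np.unique(test_array.x.values, return_index=True)
deduped = test_array.isel(sample=np.sort(first_idx))
assert len(deduped) == 2
```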
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8499/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1983891070,I_kwDOAMm_X852P8Z-,8427,Ambiguous behavior with coordinates when appending to Zarr store with append_dim,1197350,closed,0,,,4,2023-11-08T15:40:19Z,2023-12-01T03:58:56Z,2023-12-01T03:58:55Z,MEMBER,,,,"### What happened? There are two quite different scenarios covered by ""append"" with Zarr - Adding new variables to a dataset - Extending arrays along a dimensions (via `append_dim`) This issue is about what should happen when using `append_dim` with variables that _do not contain `append_dim`_. Here's the current behavior. ```python import xarray as xr import zarr ds1 = xr.DataArray( np.array([1, 2, 3]).reshape(3, 1, 1), dims=('time', 'y', 'x'), coords={'x': [1], 'y': [2]}, name=""foo"" ).to_dataset() ds2 = xr.DataArray( np.array([4, 5]).reshape(2, 1, 1), dims=('time', 'y', 'x'), coords={'x':[-1], 'y': [-2]}, name=""foo"" ).to_dataset() # how concat works: data are aligned ds_concat = xr.concat([ds1, ds2], dim=""time"") assert ds_concat.dims == {""time"": 5, ""y"": 2, ""x"": 2} # now do a Zarr append store = zarr.storage.MemoryStore() ds1.to_zarr(store, consolidated=False) # we do not check that the coordinates are aligned--just that they have the same shape and dtype ds2.to_zarr(store, append_dim=""time"", consolidated=False) ds_append = xr.open_zarr(store, consolidated=False) # coordinates data have been overwritten assert ds_append.dims == {""time"": 5, ""y"": 1, ""x"": 1} # ...with the latest values assert ds_append.x.data[0] == -1 ``` Currently, we _always write all data variables in this scenario_. That includes overwriting the coordinates every time we append. That makes appending more expensive than it needs to be. I don't think that is the behavior most users want or expect. ### What did you expect to happen? There are a couple of different options we could consider for how to handle this ""extending"" situation (with `append_dim`) 1. [current behavior] Do not attempt to align coordinates a. [current behavior] Overwrite coordinates with new data b. Keep original coordinates c. Force the user to explicitly drop the coordinates, as we do for `region` operations. 2. Attempt to align coordinates a. Fail if coordinates don't match b. Extend the arrays to replicate the behavior of `concat` We currently do 1a. **I propose to switch to 1b**. I think it is closer to what users want, and it requires less I/O. ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.10.176-157.645.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2 xarray: 2023.10.1 pandas: 2.1.2 numpy: 1.24.4 scipy: 1.11.3 netCDF4: 1.6.5 pydap: installed h5netcdf: 1.2.0 h5py: 3.10.0 Nio: None zarr: 2.16.0 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.10.1 distributed: 2023.10.1 matplotlib: 3.8.0 cartopy: 0.22.0 seaborn: 0.13.0 numbagg: 0.6.0 fsspec: 2023.10.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.8.1 numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.3 mypy: None IPython: 8.16.1 sphinx: None
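As a user-side illustration of option 1b (a sketch of a workaround, not a decision about what `to_zarr` itself should do), the coordinates that do not contain `append_dim` can be dropped before appending, so the values written by `ds1` stay untouched on disk:

```python
# Continuing the example above: append only the variables that actually
# grow along "time", leaving the original x/y coordinates alone.
ds2.drop_vars(["x", "y"]).to_zarr(store, append_dim="time", consolidated=False)
```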
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8427/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1044693438,I_kwDOAMm_X84-RMG-,5937,DataArray.dt.seconds returns incorrect value for negative `timedelta64[ns]`,2405019,closed,0,,,4,2021-11-04T12:05:24Z,2023-11-10T00:39:17Z,2023-11-10T00:39:17Z,CONTRIBUTOR,,,,"**What happened**: For a negative `timedelta64[ns]` of 42 nanoseconds `DataArray.dt.seconds` returned a non-zero value (the returned value was `86399`). When I pass in a positive 42 nanosecond `timedelta64[ns]` with the the TimeDeltaAccessor correctly returns zero. I would have expected both assertions in the example below to have passed, but the second fails. This seems to be a general issue with negative `timedelta64[ns]`. ```bash array([0]) Dimensions without coordinates: dim_0 array([86399]) Dimensions without coordinates: dim_0 Traceback (most recent call last): File ""bug_dt_seconds.py"", line 15, in assert da.dt.seconds == 0 AssertionError ``` **What you expected to happen**: ```bash array([0]) Dimensions without coordinates: dim_0 array([0]) Dimensions without coordinates: dim_0 ``` **Minimal Complete Verifiable Example**: ```python # coding: utf-8 import xarray as xr import numpy as np # number of nanoseconds value = 42 da = xr.DataArray([np.timedelta64(value, ""ns"")]) print(da.dt.seconds) assert da.dt.seconds == 0 da = xr.DataArray([np.timedelta64(-value, ""ns"")]) print(da.dt.seconds) assert da.dt.seconds == 0 ``` **Anything else we need to know?**: I've narrowed this down to the call to `pd.Series(values.ravel())` in `xarray.core.accessor_dt._access_through_series`: ```python ipdb> pd.Series(values.ravel()) 0 -1 days +23:59:59.999999958 dtype: timedelta64[ns] ``` I think the issue arises because pandas turns the numpy timedelta64 into a ""minus one day plus a time"". This actually does have a number of ""seconds"" in it, but the ""total_seconds"" has the expected value: ```python ipdb> pd.Series(values.ravel()).dt.total_seconds() 0 -4.200000e-08 dtype: float64 ``` Which would correctly round to zero. I don't think the issue is in pandas, although the output from pandas is counter-intuitive: ```python ipdb> pd.Series(values.ravel()).dt.seconds 0 86399 dtype: int64 ``` Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)? **Environment**:
Output of xr.show_versions() ``` INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, May 6 2020, 04:59:01) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 19.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_GB.UTF-8 LANG: None LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.18.2 pandas: 1.3.4 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.4.2 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.10.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.09.1 distributed: 2021.09.1 matplotlib: 3.2.2 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None fsspec: 2021.06.1 cupy: None pint: 0.18 sparse: None setuptools: 46.4.0.post20200518 pip: 21.1.2 conda: None pytest: 6.0.1 IPython: 7.16.1 sphinx: None ```
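Until this is resolved, a user-side workaround consistent with the analysis above (a sketch using only plain numpy arithmetic) is to derive seconds from a division that keeps the sign, then truncate toward zero:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([np.timedelta64(-42, "ns")])

# Dividing by a one-second timedelta gives -4.2e-08 here; truncating
# toward zero yields 0, unlike dt.seconds, which reports 86399.
seconds = np.trunc(da / np.timedelta64(1, "s"))
assert (seconds == 0).all()
```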
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5937/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1981799811,I_kwDOAMm_X852H92D,8423,Support remote string paths for `h5netcdf` engine,11656932,open,0,,,4,2023-11-07T16:52:18Z,2023-11-09T07:24:45Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? Currently the `h5netcdf` engine supports opening remote files, but only already open file-like objects (e.g. `s3fs.open(...)`), not string paths like `s3://...`. There are situations where I'd like to use string paths instead of open file-like objets - Opening files can sometimes be slow (xref https://github.com/fsspec/s3fs/issues/816) - When using `parallel=True` for opening lots of files, serializing open file-like objects back and forth from a remote cluster can be slow - Some systems (e.g. NASA Earthdata) only hand out credentials that are valid when run in the same region as the data. Being able to use `parallel=True` + `storage_options` would be convenient/performant in that case. ### Describe the solution you'd like It would be nice if I could do something like the following: ```python ds = xr.open_mfdataset( files, # A bunch of files like `s3://bucket/file` engine=""h5netcdf"", ... parallel=True, storage_options={...}, # fsspec-compatible options ) ``` and have my files opened prior to handing off to `h5netcdf`. `storage_options` is already supported for Zarr, so hopefully extending to `h5netcdf` feels natural. ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8423/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1975845455,I_kwDOAMm_X851xQJP,8410,Segmentation fault 139 (SIGSEGV),39524075,closed,0,,,4,2023-11-03T10:14:03Z,2023-11-06T20:34:46Z,2023-11-06T20:34:45Z,NONE,,,,"### What happened? While opening a set of netCDF files in a for loop, using xr.open_dataset().load(), I get a segmentation error (nr. 139). Please see code example below: ``` for region in region_list: [some code to read data associated to each region...] region_pred = xr.open_dataset(io.BytesIO(data)).load() [other code working on region_pred...] ``` The error is shown in Linux/Mac after running my Python code, whereas Windows seems to be masking it. I was able to catch that on Windows by launching my code as: ``` python3 my_code.py && echo ok || echo KO ``` In this way, KO gets printed and the segmentation fault is now noticeable. I managed to fix the issue by using a second variable (called reg_pred) in addition to region_pred: ``` for region in region_list: [some code to read data associated to each region...] region_pred = xr.open_dataset(io.BytesIO(data)) reg_pred = region_pred.load() [other code working on reg_pred...] ``` ### What did you expect to happen? I don't know if the issue I described is something that the developers made on purpose. Personally, I think it is an issue and that's why I am reporting it. If it is not an issue, I would like to get a clarification in order to understand what am I missing. Thank you in advance. 
### Minimal Complete Verifiable Example ```Python for region in region_list: with storage_client.open(region, ""rb"") as f: data = f.read() region_pred = xr.open_dataset(io.BytesIO(data)).load() # some code working on region_pred to compute weather indices... ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel byteorder: little LC_ALL: None LANG: it_IT.UTF-8 LOCALE: ('Italian_Italy', '1252') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2023.8.0 pandas: 2.1.0 numpy: 1.26.0 scipy: 1.11.2 netCDF4: 1.6.4 pydap: None h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.10.0 distributed: 2023.10.0 matplotlib: 3.8.0 cartopy: None seaborn: None numbagg: None fsspec: 2023.9.1 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.15.0 sphinx: None
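For reference, a more defensive variant of the loop (an assumption on my part, not a verified fix) is to scope both the buffer and the dataset with context managers, so each file handle is closed deterministically before the next iteration; `region_list` and `storage_client` are from the report above:

```python
import io
import xarray as xr

for region in region_list:
    with storage_client.open(region, "rb") as f:
        data = f.read()
    with xr.open_dataset(io.BytesIO(data)) as ds:
        region_pred = ds.load()  # pull everything into memory before the handle closes
    # ... work on region_pred ...
```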
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8410/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1977485456,I_kwDOAMm_X8513giQ,8413,Add a perception of a __xarray__ magic method ,6273919,open,0,,,4,2023-11-04T19:55:14Z,2023-11-05T18:50:14Z,,NONE,,,,"### Is your feature request related to a problem? I am often moving data from external objects (of all sorts!) into xarray. This is a common use case Much of this code would be greatly simplified if there was a way of giving non-xarray classes a way of declaring to xarray how these objects can be marshaled into ### Describe the solution you'd like So here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best. # Magic Methods It would be great to see these magic method signatures become integrated throughout the library: ``` ___xarray__ -> xr.Dataset | xr.DataArray ___xarray_array__ -> xr.DatArray ___xarray_dataset__ -> xr.Dataset ___xarray_datatree__ -> xr.DataTree # when DataTree is finally integrated into xarray ``` # Conversion Registry And these extension functions to register converters: ``` def register_xarray_converter(class, name: str, func : Callable[[class, ...] | None) -> xr.Dataset | xr.DataArray]: ... def register_dataarray_converter(class, name: str, func : Callable[[class, ...] | None) -> xr.DataArray: ... def register_dataset_converter(class, name: str, func : Callable[[class, ...] | None) -> xr.Dataset: ... def register_datatree_converter(class, name: str, func : Callable[[class, ...], xr.DataArray] | None) -> DataTree # when DataTree is finally integrated into xarray ... ``` Registering a converter if if cls implements a corresponding __xarray_*__ method or another converter already registered for cls. Perhaps add an argument that specifies if the converter should or should not be added if their is a clash. Perhaps these functions return the replaced converter so it can be added back in if needed? Ideally, also, ""deregister"" versions (.e.g deregister would also be available. So context managers that change marshaling behavior could easily be constructed. # User API Along with the following new user API functions: ``` def as_xarray(x, *args, **kwargs) -> xr.Dataset | xr.DataArray: ... def as_dataarray(x,*args, **kwargs) -> xr.DataArray: ... def as_dataset(x,*args, **kwargs) -> xr.DataSet: ... def as_dataset(x,*args, **kwargs) -> xr.DataSet: # when DataTree is finally integrated into xarray ... ``` ""as_xarray"" returns (in order of precedence: - x unaltered if it is an xarray objects - registered_xarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception - registered_dataarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception - registered_dataarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception - x.__xarray__(*args, **kwargs), if it exits, is callable, and does not throw an exception - x.__xarray_dataset__(*args, **kwargs), if it exists, is callable, and does not throw an exception - x.__xarray_dataarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception - well known aliases of __xarray_dataarray__, such as x.to_xarray(*args, **kwargs) (see pandas) - [DESIGN DECISION] convert and return tuple[dims, data, [attr, encoding] to DataArray? 
- [DESIGN DECISION] convert and return a tuple encoding of a Dataset? - [DESIGN DECISION] return a DataArray-wrapped duck-typed array as a DataArray? The rationale for putting the registered functions first is that this would enable callers to override a class's own conversion behavior. ""as_dataarray"" would be similar, but it would only call x.__xarray_dataarray__ and well known aliases. ""as_dataset"" would be similar, but it would only call x.__xarray_dataset__, well known aliases, and perhaps fall back to calling x.__xarray_dataarray__ and converting the return value to a dataset if it has a name attribute. ""as_datatree"" would be similar, but it would only call x.__xarray_datatree__, and perhaps fall back to calling x.__xarray_dataarray__ and wrapping it in a single node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray) The design decisions are flexible from my point of view, and might be decided in a way that makes the code base simplest or most usable. There is also a question of whether or not this method should default to the backup methods. These decisions can also be deferred entirely by delegating to the converter registry. # Across the Xarray Library Finally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library. Doing this would give another pathway for third party libraries to integrate with xarray, in a far easier way than the converter registry or explicit calls to as_* functions. ### Describe alternatives you've considered This can be done with a private library. But it seems to be a lot of code that is pretty useful to other use cases. Most of this (but not all) can be accomplished in a 3rd party library, but it wouldn't allow the seamless sort of integration with (for example) xarray's use of _repr_html_ to integrate with pandas. The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seamless for reading files, but cannot marshal objects. For example, this is possible: ``` x = xr.open_dataset(""slide.tiff"") ``` But this doesn't work: ``` t = tiffslide.TiffSlide(""slide.tiff"") x = xr.open_dataset(t) # won't work x = xr.DataArray(t) # won't work either ``` This is an important use case because there are cases where we want to create an xarray object like this from objects that are never stored on the filesystem. ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8413/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 887711474,MDU6SXNzdWU4ODc3MTE0NzQ=,5290,Inconclusive error messages using to_zarr with regions,5802846,closed,0,,,4,2021-05-11T15:54:39Z,2023-11-05T06:28:39Z,2023-11-05T06:28:39Z,CONTRIBUTOR,,,," **What happened**: The idea is to use an xarray dataset (stored as a dummy zarr file), which is subsequently filled with the `region` argument, as explained in the documentation. Ideally, almost nothing is stored to disk upfront. It seems the current implementation is only designed to either store coordinates for the whole dataset and write them to disk or to write without coordinates.
I failed to understand this from the documentation and tried to create a dataset without coordinates and fill it with a dataset subset with coordinates. It gave some inconclusive errors depending on the actual code example (see below). `ValueError: parameter 'value': expected array with shape (0,), got (10,)` or `ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'` It might also be a bug, and it should in fact be possible to add a dataset with coordinates to a dummy dataset without coordinates. In that case, there seems to be an issue with how the variables are handled while storing the region. ... or I might just have done it wrong... and I'm looking forward to suggestions. **What you expected to happen**: Either an error message telling me that I should use coordinates during creation of the dummy dataset. Alternatively, if this is a bug and it should be possible, then it should just work. **Minimal Complete Verifiable Example**: ```python import dask.array import xarray as xr import numpy as np error = 1 # choose between 0 (no error), 1, 2, 3 dummies = dask.array.zeros(30, chunks=10) # chunks in coords are not taken into account while saving!? coord_x = dask.array.zeros(30, chunks=10) # or coord_x = np.zeros((30,)) if error == 0: ds = xr.Dataset({""foo"": (""x"", dummies)}, coords={""x"":coord_x}) else: ds = xr.Dataset({""foo"": (""x"", dummies)}) print(ds) path = ""./tmp/test.zarr"" ds.to_zarr(path, mode='w', compute=False, consolidated=True) # create a new dataset to be input into a region ds = xr.Dataset({""foo"": ('x', np.arange(10))},coords={""x"":np.arange(10)}) if error == 1: ds.to_zarr(path, region={""x"": slice(10, 20)}) # ValueError: parameter 'value': expected array with shape (0,), got (10,) elif error == 2: ds.to_zarr(path, region={""x"": slice(0, 10)}) ds.to_zarr(path, region={""x"": slice(10, 20)}) # ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo' elif error == 3: ds.to_zarr(path, region={""x"": slice(0, 10)}) ds = xr.Dataset({""foo"": ('x', np.arange(10))},coords={""x"":np.arange(10)}) ds.to_zarr(path, region={""x"": slice(10, 20)}) # ValueError: parameter 'value': expected array with shape (0,), got (10,) else: ds.to_zarr(path, region={""x"": slice(10, 20)}) ds = xr.open_zarr(path) print('reopen',ds['x']) ``` **Anything else we need to know?**: **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.18.0 pandas: 1.2.3 numpy: 1.19.2 scipy: 1.6.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.1 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.04.0 distributed: None matplotlib: 3.4.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: None sphinx: None
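For comparison, here is a sketch of what I believe the documented intent is: create the template *with* coordinates, then drop the already-written variables when filling each region:

```python
import dask.array
import numpy as np
import xarray as xr

path = "./tmp/test.zarr"

# Lazy template with real coordinates; the dask-backed "foo" is not computed,
# so only metadata and the coordinate are written up front.
template = xr.Dataset(
    {"foo": ("x", dask.array.zeros(30, chunks=10))},
    coords={"x": np.arange(30)},
)
template.to_zarr(path, mode="w", compute=False, consolidated=True)

# Fill one region, dropping the coordinate that was already written.
piece = xr.Dataset({"foo": ("x", np.arange(10))}, coords={"x": np.arange(10, 20)})
piece.drop_vars("x").to_zarr(path, region={"x": slice(10, 20)})
```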
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 377356113,MDU6SXNzdWUzNzczNTYxMTM=,2542,"full_like, ones_like, zeros_like should retain subclasses",500246,closed,0,,,4,2018-11-05T11:22:49Z,2023-11-05T06:27:31Z,2023-11-05T06:27:31Z,CONTRIBUTOR,,,,"#### Code Sample, ```python # Your code here import numpy import xarray class MyDataArray(xarray.DataArray): pass da = MyDataArray(numpy.arange(5)) da2 = xarray.zeros_like(da) print(type(da), type(da2)) ``` #### Problem description I would expect that `type(da2) is type(da)`, but this is not the case. The type of `da` is always ``. Rather, the output of this script is: ``` ``` #### Expected Output I would hope as an output: ``` ``` In principle changing this could break people's code, so if a change is implemented it should probably be through an optional keyword argument to the `full_like`/`ones_like`/`zeros_like` family. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.0.final.0 python-bits: 64 OS: Linux OS-release: 2.6.32-754.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 xarray: 0.10.7 pandas: 0.23.2 numpy: 1.15.2 scipy: 1.1.0 netCDF4: 1.4.0 h5netcdf: 0.6.1 h5py: 2.8.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.18.1 distributed: 1.22.0 matplotlib: 3.0.0 cartopy: 0.16.0 seaborn: 0.9.0 setuptools: 39.2.0 pip: 18.0 conda: None pytest: 3.2.2 IPython: 6.4.0 sphinx: None
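In the meantime, the behavior I want can be approximated with a small helper (hypothetical user code, not an xarray API):

```python
import numpy
import xarray

class MyDataArray(xarray.DataArray):
    __slots__ = ()  # DataArray subclasses are expected to declare __slots__

def zeros_like_keeping_type(da):
    # Re-wrap the plain DataArray result in the caller's own subclass.
    return type(da)(xarray.zeros_like(da))

da = MyDataArray(numpy.arange(5))
da2 = zeros_like_keeping_type(da)
assert type(da2) is MyDataArray
```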
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2542/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1966675016,I_kwDOAMm_X851ORRI,8388,Type annotation compatibility with numpy ufuncs,1828519,closed,0,,,4,2023-10-28T17:25:11Z,2023-11-02T12:44:50Z,2023-11-02T12:44:50Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem? I'd like mypy to understand that xarray DataArrays passed to numpy ufuncs have a return type of xarray DataArray. ```python import xarray as xr import numpy as np def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray: abs_diff = np.absolute(sun_azi - sat_azi) ssadiff = np.minimum(abs_diff, 360 - abs_diff) return ssadiff ``` ```bash $ mypy ./xarray_mypy.py xarray_mypy.py:7: error: Incompatible return value type (got ""ndarray[Any, dtype[Any]]"", expected ""DataArray"") [return-value] Found 1 error in 1 file (checked 1 source file) ``` ### Describe the solution you'd like I'm not sure if this is possible, if it is something xarray can fix, or something numpy needs to ""fix"". I'd like the above situation to ""just work"" without anything more than maybe some extra type-stub package. ### Describe alternatives you've considered Cast types or other type coercion or tell mypy to ignore the type issues for these numpy call. ### Additional context https://stackoverflow.com/questions/77369042/typing-when-passing-xarray-dataarray-objects-to-numpy-ufuncs","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8388/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1445905299,I_kwDOAMm_X85WLsOT,7282,groupby and mean on a MultiIndex level raises ValueError,25231875,closed,0,,,4,2022-11-11T19:15:58Z,2023-10-30T09:18:54Z,2023-08-31T03:50:33Z,NONE,,,,"### What happened? After using `set_index` to create a `MultiIndex`, calling `groupby` on a `MultiIndex` level and then `mean` raises an error. ### What did you expect to happen? Apply mean to groups, no error. ### Minimal Complete Verifiable Example ```Python d = DataArray( data=[ [0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13], [14, 15, 16, 17, 18, 19, 20] ], coords={ ""greek"": (""a"", ['alpha', 'beta', 'gamma']), ""colors"": (""a"", ['red', 'green', 'blue']), ""compass"": (""b"", ['north', 'south', 'east', 'west', 'northeast', 'southeast', 'southwest']), ""integer"": (""b"", [0, 1, 2, 3, 4, 5, 6]), }, dims=(""a"", ""b"") ) d = d.set_index(a=['greek', 'colors'], b=['compass', 'integer']) g = d.groupby('greek') m = g.mean(...) ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. 
### Relevant log output ```Python Traceback (most recent call last): File """", line 1, in File ""/usr/local/lib/python3.10/site-packages/xarray/core/_aggregations.py"", line 5698, in mean return self.reduce( File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1201, in reduce return self.map(reduce_array, shortcut=shortcut) File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1104, in map return self._combine(applied, shortcut=shortcut) File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1136, in _combine index, index_vars = create_default_index_implicit(coord) File ""/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py"", line 1045, in create_default_index_implicit index = PandasMultiIndex(array, name) File ""/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py"", line 615, in __init__ raise ValueError( ValueError: conflicting multi-index level name 'greek' with dimension 'greek' ``` ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110] python-bits: 64 OS: Linux OS-release: 5.15.49-linuxkit machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.11.0 pandas: 1.5.1 numpy: 1.23.4 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 63.2.0 pip: 22.2.2 conda: None pytest: None IPython: None sphinx: None
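A detour through pandas produces the intended group means in the meantime (illustrative only; `d` is the DataArray from the example above, after `set_index`):

```Python
# The series index is a MultiIndex whose levels include "greek",
# so pandas can group on that level directly.
means = d.to_series().groupby(level="greek").mean()
```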
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7282/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1953059418,I_kwDOAMm_X850aVJa,8345,`.stack` produces large chunks,40218891,closed,0,,,4,2023-10-19T21:09:56Z,2023-10-26T21:20:05Z,2023-10-26T21:20:05Z,NONE,,,,"### What happened? Xarray ``stack`` does not chunk along the last coordinate, producing huge chunks, as described in #5754. Dask, seeing code like this: ``` da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"") ``` produces warning and suggestion to use context manager: ``` with dask.config.set(**{""array.slicing.split_large_chunks"": True}): da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"") ``` This fails with message ``IndexError: tuple index out of range``. ### What did you expect to happen? I expect this to work. #5754 is closed. ### Minimal Complete Verifiable Example ```Python import dask.array import numpy as np import xarray as xr var = xr.Variable( (""t"", ""z"", ""u"", ""x"", ""y""), dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)), ) da = xr.DataArray(var) def sum(ds): return ds.sum(dim=""u"") with dask.config.set(**{""array.slicing.split_large_chunks"": True}): da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"") da2 ``` ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [ ] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output ```Python --------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[21], line 5 2 return ds.sum(dim=""u"") 4 with dask.config.set(**{""array.slicing.split_large_chunks"": True}): ----> 5 da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"") 6 da2 File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse) 2795 def unstack( 2796 self, 2797 dim: Dims = None, 2798 fill_value: Any = dtypes.NA, 2799 sparse: bool = False, 2800 ) -> Self: 2801 """""" 2802 Unstack existing dimensions corresponding to MultiIndexes into 2803 multiple new dimensions. (...) 
2853 DataArray.stack 2854 """""" -> 2855 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse) 2856 return self._from_temp_dataset(ds) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse) 5498 for d in dims: 5499 if needs_full_reindex: -> 5500 result = result._unstack_full_reindex( 5501 d, stacked_indexes[d], fill_value, sparse 5502 ) 5503 else: 5504 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse) 5393 if name not in index_vars: 5394 if dim in var.dims: -> 5395 variables[name] = var.unstack({dim: new_dim_sizes}) 5396 else: 5397 variables[name] = var File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs) 1928 result = self 1929 for old_dim, dims in dimensions.items(): -> 1930 result = result._unstack_once_full(dims, old_dim) 1931 return result File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim) 1817 reordered = self.transpose(*dim_order) 1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes -> 1820 new_data = reordered.data.reshape(new_shape) 1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names 1823 return type(self)( 1824 new_dims, new_data, self._attrs, self._encoding, fastpath=True 1825 ) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape) 2217 if len(shape) == 1 and not isinstance(shape[0], Number): 2218 shape = shape[0] -> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit) 283 else: 284 chunk_plan.append(""auto"") --> 285 outchunks = normalize_chunks( 286 chunk_plan, 287 shape=shape, 288 limit=limit, 289 dtype=x.dtype, 290 previous_chunks=inchunks, 291 ) 293 x2 = x.rechunk(inchunks) 295 # Construct graph File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks) 3092 chunks = tuple(""auto"" if isinstance(c, str) and c != ""auto"" else c for c in chunks) 3094 if any(c == ""auto"" for c in chunks): -> 3095 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3097 if shape is not None: 3098 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape)) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto"" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in (.0) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto"" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: 
np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): IndexError: tuple index out of range ``` ### Anything else we need to know? The most recent traceback entry points to an issue in dask code. ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.5.5-1-MANJARO machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2 xarray: 2023.9.0 pandas: 2.1.1 numpy: 1.24.4 scipy: 1.11.3 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.9.3 distributed: 2023.9.3 matplotlib: 3.8.0 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.9.2 cupy: None pint: None sparse: 0.14.0 flox: 0.7.2 numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.16.1 sphinx: None
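A possible mitigation on the user side (an assumption, not a fix for the `IndexError` itself) is to skip the `split_large_chunks` option and instead rechunk the stacked dimension explicitly; whether this is fast enough will depend on the workload:

```python
# Continuing the example above: control chunk sizes by hand after stacking.
stacked = da.stack(new=("z", "t")).chunk({"new": 1200})
da2 = stacked.groupby("new").map(sum).unstack("new")
```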
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8345/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1923431725,I_kwDOAMm_X85ypT0t,8264,Improve error messages,5635139,open,0,,,4,2023-10-03T06:42:57Z,2023-10-24T18:40:04Z,,MEMBER,,,,"### Is your feature request related to a problem? Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline. Some of the error messages could be _much_ more helpful. Take one example: ``` xarray.core.merge.MergeError: conflicting values for variable 'date' on objects to be combined. You can skip this check by specifying compat='override'. ``` The second sentence is nice. But the first could be give us much more information: - Which variables conflict? I'm merging four objects, so would be so helpful to know which are causing the issue. - What is the conflict? Is one a superset and I can `join=...`? Are they off by 1 or are they completely different types? - Our `testing.assert_equal` produces pretty nice errors, as a comparison Having these good is really useful, lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library. ### Describe the solution you'd like I'm not sure the best way to surface the issues — error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues — there could be scores of marginal issues which would often be out of date. One thing we do in PRQL is have a file that snapshots error messages [`test_bad_error_messages.rs`](https://github.com/PRQL/prql/blob/587aa6ec0e2da0181103bc5045cc5dfa43708827/crates/prql-compiler/src/tests/test_bad_error_messages.rs), which can then be a nice contribution to change those from bad to good. I'm not sure whether that would work here (python doesn't seem to have a great snapshotter, `pytest-regtest` is the best I've found; I wrote `pytest-accept` but requires doctests). Any other ideas? ### Describe alternatives you've considered _No response_ ### Additional context A couple of specific error-message issues: - https://github.com/pydata/xarray/issues/2078 - https://github.com/pydata/xarray/issues/5290","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8264/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 529644880,MDU6SXNzdWU1Mjk2NDQ4ODA=,3580,xr.DataArray.values fails with latest versions of netcdf4,16332933,closed,0,,,4,2019-11-28T01:26:07Z,2023-10-18T17:01:17Z,2023-10-18T17:01:17Z,NONE,,,,"#### MCVE Code Sample ```python import xarray as xr xr.show_versions() url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/NCEP-CFSv2/.HINDCAST/.MONTHLY/.sst/dods' fullda = xr.open_dataset(url, decode_times=False,chunks={'S': 'auto', 'L': 'auto', 'M':'auto','X':'auto','Y':'auto'}) print(fullda) print(fullda['sst'][:10,0,0,0,0].values) ``` #### Expected Output ```python Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181) Coordinates: * X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 
355.0 356.0 357.0 358.0 359.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 * S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0 * M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0 * Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0 Data variables: sst (S, L, M, Y, X) float32 dask.array Attributes: Conventions: IRIDL [-25.652588 -35.577393 -48.702896 -51.3853 -50.687195 -50.341995 -50.407593 -54.955994 -52.052994 -47.31279 ] ``` #### Problem Description This should return the array’s data as a numpy.ndarray according to the documentation and as shown above. I tested this with various versions of netcdf4 and I get the error below for netcdf4 versions 1.5.1, 1.5.1.2, 1.5.3 (latest version). If I use netcdf4 version 1.5.1, I get the expected output as above. ``` python Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181) Coordinates: * X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 * S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0 * M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0 * Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0 Data variables: sst (S, L, M, Y, X) float32 dask.array Attributes: Conventions: IRIDL Traceback (most recent call last): File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 84, in _getitem array = getitem(original_array, key) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/common.py"", line 54, in robust_getitem return array[key] File ""netCDF4/_netCDF4.pyx"", line 4408, in netCDF4._netCDF4.Variable.__getitem__ File ""netCDF4/_netCDF4.pyx"", line 5350, in netCDF4._netCDF4.Variable._get IndexError: index exceeds dimension bounds During handling of the above exception, another exception occurred: Traceback (most recent call last): File ""testpython.py"", line 7, in print(fullda['sst'][:10,0,0,0,0].values) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/dataarray.py"", line 567, in values return self.variable.values File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py"", line 448, in values return _as_array_or_item(self._data) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py"", line 254, in _as_array_or_item data = np.asarray(data) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py"", line 1314, in __array__ x = self.compute() File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py"", line 165, in compute (result,) = compute(self, traverse=False, **kwargs) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py"", line 436, in compute results = schedule(dsk, keys, **kwargs) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/threaded.py"", line 81, in get **kwargs File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 486, in get_async raise_exception(exc, tb) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 316, in reraise raise exc File 
""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 222, in execute_task result = _execute_task(task, data) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/core.py"", line 119, in _execute_task return func(*args2) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py"", line 106, in getter c = np.asarray(c) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 481, in __array__ return np.asarray(self.array, dtype=dtype) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 643, in __array__ return np.asarray(self.array, dtype=dtype) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 547, in __array__ return np.asarray(array[self.key], dtype=None) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 72, in __getitem__ key, self.shape, indexing.IndexingSupport.OUTER, self._getitem File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 827, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 94, in _getitem raise IndexError(msg) IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load(). ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.4.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.1
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.8.1
distributed: 2.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 42.0.1.post20191125
pip: 19.3.1
conda: None
pytest: None
IPython: None
sphinx: None
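For anyone hitting this, a minimal sketch of the workaround named in the error message itself, loading into memory before indexing so the vectorized key is applied by NumPy rather than by netCDF4/DAP (the file name and chunking here are assumptions):

```python
import xarray as xr

# Hypothetical local copy of the dataset from the report (dims S, L, M, Y, X).
fullda = xr.open_dataset('sst.nc', chunks={'S': 12})
sst = fullda['sst'].load()          # pull the values into memory first
print(sst[:10, 0, 0, 0, 0].values)  # plain NumPy indexing, no netCDF4 call
```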
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3580/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1924497392,I_kwDOAMm_X85ytX_w,8269,open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0',6819509,closed,0,,,4,2023-10-03T16:19:54Z,2023-10-18T16:50:20Z,2023-10-18T16:50:20Z,NONE,,,,"### What is your issue? When moving from xarray version '2023.8.0' to '2023.9.0' the behavior of importing a zarr changed for me (code to create example zarr at end of this post). When importing a variable with units ""days accumulated"", the values are scaled differently between the two versions. The latest version seems to automatically treat this as as time-like array (I think the -9.223372e+18 seen are NaT-like?). Open the zarr: ```python import xarray as xr ds = xr.open_dataset('debug.zarr', engine='zarr', chunks={}) ``` Print as a pandas-like table for each version of xarray for readability: ```python ds.to_dataframe() ``` Version '2023.8.0': |time|dapr (dtype=float32)|mdpr (dtype=float32)| |---|---|---| |2000-01-01|NaN|NaN| |2000-01-02|NaN|NaN| |2000-01-03|2.0|1.5| Version '2023.9.0': |time|dapr (dtype=float64)|mdpr (dtype=float32)| |---|---|---| |2000-01-01|-9.223372e+18|NaN| |2000-01-02|-9.223372e+18|NaN| |2000-01-03|2.000000e+00|1.5| I can manually disable this by using the ""use_cf=False"", ""mask_and_scale=False"", and then manually scale this variable, though that is not ideal. The ""decode_timedelta"" doesn't seem to have an effect on this data, either. I understand the ""days"" keyword is in my units, however the full unit is ""days accumulated"". Has the behavior of xarray changed to find keywords such as ""days"" occurring anywhere in the units (eg. as a substring)? Do you have any other suggestions? Thank you for the help. ### Code to create the debug.zarr for the tables above: ```python import numpy as np import pandas as pd import xarray as xr import zarr # Create some multiday precipitation data (similar to https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily) # mdpr is the amount of a multiday total (inches) # dapr is the number of days each multiday total occurred over (days accumulated). # In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03 # I use float32 to represent these, but pack these as int16 values in the zarr. mdpr = np.array([np.NaN, np.NaN, 1.50], dtype=np.float32) dapr = np.array([np.NaN, np.NaN , 2.0], dtype=np.float32) time = pd.date_range('2000-01-01', periods=3) # Create a dataset from these values ds = xr.Dataset( data_vars=dict( mdpr=(['time'], mdpr), dapr=(['time'], dapr), ), coords=dict( time=time, ), attrs=dict(description='multiday precipitation data'), ) # Specify encoding to pack these float32 values as int16 encoding = { 'mdpr' : { 'chunks' : (3,), 'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1), 'filters': None, 'missing_value': -32768, '_FillValue': -32768, 'scale_factor': 0.01, 'add_offset': 0.0, 'dtype': np.int16, }, 'dapr' : { 'chunks' : (3,), 'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1), 'filters': None, 'missing_value': -32768, '_FillValue': -32768, 'scale_factor': 1.0, 'add_offset': 0.0, 'dtype': np.int16, }, } # Create attributes. 
The ""units"" for the dapr variable seems to be the issue ""days"" in the # ""days accumulated"" ds.mdpr.attrs['units'] = 'inches' ds.mdpr.attrs['description'] = 'multiday precip amount' ds.dapr.attrs['units'] = 'days accumulated' ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation' # Save to zarr ds.to_zarr('debug.zarr', mode='w', encoding=encoding) ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8269/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1384226112,I_kwDOAMm_X85SgZ1A,7075,Convert xarray dataset to pandas dataframe is much slower in newest xarray version,20794996,closed,0,,,4,2022-09-23T19:36:28Z,2023-10-14T20:37:40Z,2023-10-14T20:37:40Z,NONE,,,,"### What is your issue? Converting an xarray dataset to pandas dataframe has become much slower in the newest xarray version. I want to read in very large netcdf files, extract a slice, and convert the slice to a pandas dataframe. For an input size of 2GB, the xarray version 0.21.0 takes 3 seconds versus the xarray version 2022.6.0 takes 44 seconds. See table below for more tests with increasing size of xarray dataset. Number of NetCDF Input Files in Xarray Dataset (~1GB per file): | 2 | 5 | 10 | 15 | 20 | 30 | 40 -- | -- | -- | -- | -- | -- | -- | -- Older Xarray Version 0.21.0 | 0:03 | 0:02 | 0:04 | 0:06 | 0:09 | 0:13 | 0:17 Newer Xarray Version 2022.6.0 | 0:44 | 1:30 | 2:46 | 4:01 | 5:23 | 7:56 | 10:29 Here is my code: ``` # Read in a list of netcdf files and combine into a single dataset. with xr.open_mfdataset(infile_list, combine='by_coords') as ds: # Extract the data for a single location (the nearest grid point) using the provided coordinates (lat/lon). ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest') # Convert xarray dataset to a pandas dataframe. # This is now the slow part since the xarray library was updated. df = ds_slice.to_dataframe() ``` The netcdf files I am reading in are about 1 GB each, containing daily weather data for the entire CONUS. There is 1 file per year, so if I read in 2 files, the dimensions are (lon: 1386, lat: 585, day: 731, crs: 1) with coordinates of lon, lat, day, and crs. They include 8 float data variables. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7075/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1943355490,I_kwDOAMm_X85z1UBi,8308,Different plotting reaults compared to matplotlib,30388627,closed,0,,,4,2023-10-14T15:54:32Z,2023-10-14T20:02:16Z,2023-10-14T20:02:16Z,NONE,,,,"### What happened? I got different results when I tried to plot 2D data [test.npy.zip](https://github.com/pydata/xarray/files/12906635/test.npy.zip) using matplotlib and xarray. ### matplotlib ![image](https://github.com/pydata/xarray/assets/30388627/aca2bd14-33e5-4a2e-9b01-65432c63af47) ### xarray ![image](https://github.com/pydata/xarray/assets/30388627/3a9b0793-34f7-49d1-ac5f-bd07d83a958d) ### What did you expect to happen? Same plot. 
### Minimal Complete Verifiable Example ```Python import numpy as np import xarray as xr import matplotlib.pyplot as plt test = np.load('test.npy') plt.imshow(test, vmin=0, vmax=200) plt.colorbar() xr.DataArray(test).plot.imshow(vmin=0, vmax=200) ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. - [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.26.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
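A hedged guess at the discrepancy: matplotlib's `imshow` defaults to `origin='upper'`, while xarray's `plot.imshow` orients the y axis by its coordinate, which increases upward for the default integer index. If that is the difference, flipping the y axis should reproduce the matplotlib figure:

```python
import numpy as np
import xarray as xr

test = np.load('test.npy')
# yincrease=False asks xarray to draw y decreasing upward, matching
# matplotlib's origin='upper' default for imshow.
xr.DataArray(test).plot.imshow(vmin=0, vmax=200, yincrease=False)
```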
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8308/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1821467933,I_kwDOAMm_X85skWUd,8021,Specify chunks in bytes,306380,open,0,,,4,2023-07-26T02:29:43Z,2023-10-06T10:09:33Z,,MEMBER,,,,"### Is your feature request related to a problem? I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an `open_zarr` call and then provide the right `chunks=` argument. I'll admit though that I wouldn't mind giving Xarray a value like `""1 GiB""` though and having it use that when determining `""auto""` chunk sizes. Dask array does this in two ways. We can provide a value in chunks as like the following: ```python x = da.random.random(..., chunks=""1 GiB"") ``` We also refer to a value in Dask config ```python In [1]: import dask In [2]: dask.config.get(""array.chunk-size"") Out[2]: '128MiB' ``` This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂 ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8021/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1169750048,I_kwDOAMm_X85FuPgg,6360,Multidimensional `interpolate_na()`,5797727,open,0,,,4,2022-03-15T14:27:46Z,2023-09-28T11:51:20Z,,NONE,,,,"### Is your feature request related to a problem? I think that having a way to run a multidimensional interpolation for filling missing values would be awesome. The code snippet below create a data and show the problem I am having now. If the data has some orientation, we couldn't simply interpolate dimensions separately. ```python import xarray as xr import numpy as np n = 30 x = xr.DataArray(np.linspace(0,2*np.pi,n),dims=['x']) y = xr.DataArray(np.linspace(0,2*np.pi,n),dims=['y']) z = (np.sin(x)*xr.ones_like(y)) mask = xr.DataArray(np.random.randint(0,1+1,(n,n)).astype('bool'),dims=['x','y']) kw = dict(add_colorbar=False) fig,ax = plt.subplots(1,3,figsize=(11,3)) z.plot(ax=ax[0],**kw) z.where(mask).plot(ax=ax[1],**kw) z.where(mask).interpolate_na('x').plot(ax=ax[2],**kw) ``` ![image](https://user-images.githubusercontent.com/5797727/158399309-437aaf06-9cbf-41dc-8668-5ac6b13016b6.png) I tried to use advanced interpolation for that, but it doesn't look like the best solution. ```python zs = z.where(mask).stack(k=['x','y']) zs = zs.where(np.isnan(zs),drop=True) xi,yi = zs.k.x.drop('k'),zs.k.y.drop('k') zi = z.interp(x=xi,y=yi) fig,ax = plt.subplots() z.where(mask).plot(ax=ax,**kw) ax.scatter(xi,yi,c=zi,**kw,linewidth=1,edgecolor='k') ``` returns ![image](https://user-images.githubusercontent.com/5797727/158399255-4fcce6fa-c12d-4745-a99a-0a2b27506573.png) ### Describe the solution you'd like Simply `z.interpolate_na(['x','y'])` ### Describe alternatives you've considered I could extract the data to `numpy` and interpolate using `scipy.interpolate.griddata`, but this is not the way `xarray` should work. 
### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6360/reactions"", ""total_count"": 11, ""+1"": 9, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,,13221727,issue 1905824568,I_kwDOAMm_X85xmJM4,8221,Frequent doc build timeout / OOM,5635139,open,0,,,4,2023-09-20T23:02:37Z,2023-09-21T03:50:07Z,,MEMBER,,,,"### What is your issue? I'm frequently seeing `Command killed due to timeout or excessive memory consumption` in the doc build. It's after 1552 seconds, so it not being a round number means it might be the memory? It follows `writing output... [ 90%] generated/xarray.core.rolling.DatasetRolling.max`, which I wouldn't have thought as a particularly memory-intensive part of the build? Here's an example: https://readthedocs.org/projects/xray/builds/21983708/ Any thoughts for what might be going on? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8221/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1326238990,I_kwDOAMm_X85PDM0O,6870,`rolling_exp` loses coords,5635139,closed,0,,,4,2022-08-02T18:27:44Z,2023-09-19T01:13:23Z,2023-09-19T01:13:23Z,MEMBER,,,,"### What happened? We lose the time coord here — `Dimensions without coordinates: time`: ```python ds = xr.tutorial.load_dataset(""air_temperature"") ds.rolling_exp(time=5).mean() Dimensions: (lat: 25, time: 2920, lon: 53) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 Dimensions without coordinates: time Data variables: air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.4 296.1 295.7 ``` (I realize I wrote this, I didn't think this used to happen, but either it always did or I didn't write good enough tests... mea culpa) ### What did you expect to happen? We keep the time coords, like we do for normal `rolling`: ```python In [2]: ds.rolling(time=5).mean() Out[2]: Dimensions: (lat: 25, lon: 53, time: 2920) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00 ``` ### Minimal Complete Verifiable Example ```Python (as above) ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (main, May 24 2022, 21:13:51) [Clang 13.1.6 (clang-1316.0.21.2)]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.12.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.3.2
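Until this is fixed, a possible interim workaround is sketched below (requires numbagg for `rolling_exp`; it simply re-attaches the coordinate that the operation currently drops):

```python
import xarray as xr

ds = xr.tutorial.load_dataset('air_temperature')
# rolling_exp currently loses the 'time' coordinate, so put it back.
smoothed = ds.rolling_exp(time=5).mean().assign_coords(time=ds['time'])
```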
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6870/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 598991028,MDU6SXNzdWU1OTg5OTEwMjg=,3967,Support static type analysis ,6130352,closed,0,,,4,2020-04-13T16:34:43Z,2023-09-17T19:43:32Z,2023-09-17T19:43:31Z,NONE,,,,"As a related discussion to https://github.com/pydata/xarray/issues/3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis. In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able enforce that names and dtypes associated with both data variables and coordinates meet certain constraints. @keewis mentioned an example of this in https://github.com/pydata/xarray/issues/3959#issuecomment-612076605 where it might be possible to use something like a ```TypedDict``` to constrain variable/coord names and array dtypes, but this won't work with TypedDict as it's currently implemented. Another possibility could be generics, and I took a stab at that in https://github.com/pydata/xarray/issues/3959#issuecomment-612513722 (though this would certainly be more intrusive). An example of where this would be useful is in adding extensions through accessors: ```python @xr.register_dataset_accessor('ext') def ExtAccessor: def __init__(self, ds) self.data = ds def is_zero(self): return self.ds['data'] == 0 ds = xr.Dataset(dict(DATA=xr.DataArray([0.0]))) # I'd like to catch that ""data"" was misspelled as ""DATA"" and that # this particular method shouldn't be run against floats prior to runtime ds.ext.is_zero() ``` I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too. 
There is a related conversation on doing something like this for Pandas DataFrames at https://github.com/python/typing/issues/28#issuecomment-351284520, so that might be helpful context for possibilities with ```TypeDict```.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3967/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 561921094,MDU6SXNzdWU1NjE5MjEwOTQ=,3762,xarray groupby/map fails to parallelize,6491058,closed,1,,,4,2020-02-07T23:20:59Z,2023-09-15T15:52:42Z,2023-09-15T15:52:41Z,NONE,,,,"#### MCVE Code Sample ```python import sys import math import logging import dask import xarray import numpy logger = logging.getLogger('main') if __name__ == '__main__': logging.basicConfig( stream=sys.stdout, format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S') logger.info('Starting dask client') client = dask.distributed.Client() SIZE = 100000 SONAR_BINS = 2000 time = range(0, SIZE) upper_limit = numpy.random.randint(0, 10, (SIZE)) lower_limit = numpy.random.randint(20, 30, (SIZE)) sonar_data = numpy.random.randint(0, 255, (SIZE, SONAR_BINS)) channel = xarray.Dataset({ 'upper_limit': (['time'], upper_limit, {'units': 'depth meters'}), 'lower_limit': (['time'], lower_limit, {'units': 'depth meters'}), 'data': (['time', 'depth_bin'], sonar_data, {'units': 'amplitude'}), }, coords={ 'depth_bin': (['depth_bin'], range(0,SONAR_BINS)), 'time': (['time'], time) }) logger.info('get overall min/max radar range we want to normalize to called the adjusted range') adjusted_min, adjusted_max = channel.upper_limit.min().values.item(), channel.lower_limit.max().values.item() adjusted_min = math.floor(adjusted_min) adjusted_max = math.ceil(adjusted_max) logger.info('adjusted_min: %s, adjusted_max: %s', adjusted_min, adjusted_max) bin_count = len(channel.depth_bin) logger.info('bin_count: %s', bin_count) adjusted_depth_per_bin = (adjusted_max - adjusted_min) / bin_count logger.info('adjusted_depth_per_bin: %s', adjusted_depth_per_bin) adjusted_bin_depths = [adjusted_min + (j * adjusted_depth_per_bin) for j in range(0, bin_count)] logger.info('adjusted_bin_depths[0]: %s ... 
[-1]: %s', adjusted_bin_depths[0], adjusted_bin_depths[-1]) def Interp(ds): # Ideally instead of using interp we will use some kind of downsampling and shift # this doesnt exist in xarray though and interp is good enough for the moment # I just added this to debug t = ds.time.values.item() if (t % 100) == 0: total = len(channel.time) perc = 100.0 * t / total logger.info('%s : %s of %s', perc, t, total) unadjusted_depth_amplitudes = ds.data unadjusted_min = ds.upper_limit.values.item() unadjusted_max = ds.lower_limit.values.item() unadjusted_depth_per_bin = (unadjusted_max - unadjusted_min) / bin_count index_mapping = [((adjusted_min + (bin * adjusted_depth_per_bin)) - unadjusted_min) / unadjusted_depth_per_bin for bin in range(0, bin_count)] adjusted_depth_amplitudes = unadjusted_depth_amplitudes.interp(coords={'depth_bin':index_mapping}, method='linear', assume_sorted=True) adjusted_depth_amplitudes = adjusted_depth_amplitudes.rename({'depth_bin':'depth'}).assign_coords({'depth':adjusted_bin_depths}) #logger.info('%s, \n\tunadjusted_depth_amplitudes.values:%s\n\tunadjusted_min:%s\n\tunadjusted_max:%s\n\tunadjusted_depth_per_bin:%s\n\tindex_mapping:%s\n\tadjusted_depth_amplitudes:%s\n\tadjusted_depth_amplitudes.values:%s\n\n', ds, unadjusted_depth_amplitudes.values, unadjusted_min, unadjusted_max, unadjusted_depth_per_bin, index_mapping, adjusted_depth_amplitudes, adjusted_depth_amplitudes.values) return adjusted_depth_amplitudes # Lets split into chunks so could be performed in parallel # This doesnt work to parallelize and only slows it down a lot #logger.info('chunk') #channel = channel.chunk({'time':100}) logger.info('groupby') g = channel.groupby('time') logger.info('do interp') normalized_depth_data = g.map(Interp) logger.info('done') ``` #### Expected Output I am fairly new to xarray but feel this example could have been executed a bit better than xarray currenty does. Each map call of the above custom function should be possible to be parallelized from what I can tell. I imagined that in the backend, xarray would have chunked it and run in parallel on dask. However I find it is VERY slow even for single threaded case but also that it doesn't seem to parallelize. It takes roughly 5msec per map call in my hardware when I don't include the chunk and 70msec with the chunk call you can find in the code. #### Problem Description The single threaded performance is super slow, but also it fails to parallelize the computations across the cores on my machine. If you are after more background to what I am trying to do, I also asked a SO question about how to re-organize the code to improve performance. I felt the current behavior though is a performance bug (assuming I didn't do something completely wrong in the code). https://stackoverflow.com/questions/60103317/can-the-performance-of-using-xarray-groupby-map-be-improved #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200102
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
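One restructuring that should help, sketched under the assumption that it matches the intent of `Interp` above (`adjusted_min` and `adjusted_depth_per_bin` are the globals computed earlier in the script; note `np.interp` clamps at the edges where `DataArray.interp` would yield NaN):

```python
import numpy as np
import xarray as xr

def interp_one(data, upper, lower):
    # data: (depth_bin,); upper/lower: scalars for a single time step.
    bin_count = data.size
    per_bin = (lower - upper) / bin_count
    index_mapping = (adjusted_min + np.arange(bin_count) * adjusted_depth_per_bin - upper) / per_bin
    return np.interp(index_mapping, np.arange(bin_count), data)

normalized_depth_data = xr.apply_ufunc(
    interp_one,
    channel['data'].chunk({'time': 10000}),
    channel['upper_limit'],
    channel['lower_limit'],
    input_core_dims=[['depth_bin'], [], []],
    output_core_dims=[['depth_bin']],
    vectorize=True,        # numpy-level loop over 'time' inside each chunk
    dask='parallelized',   # one dask task per chunk, not per time step
    output_dtypes=[float],
)
```

This replaces the Python-level `groupby('time').map(...)` with one gufunc call, so dask schedules work per chunk instead of per time step.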
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3762/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1473152374,I_kwDOAMm_X85XzoV2,7348,Using entry_points to register dataset and dataarray accessors?,1386642,open,0,,,4,2022-12-02T16:48:42Z,2023-09-14T19:53:46Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? External libraries often use the dataset/dataarray accessor pattern (e.g. [metpy](https://github.com/Unidata/MetPy/blob/f568aca6325cb23cfccc1006c4965ef7f7b5ad29/src/metpy/xarray.py#L105)). These accessors are not available until importing the external package where the registration occurs. This means scripts using these accessors must include an often-unused import that linters will complain about e.g. ``` import metpy # linter complains here # some data ds: xr.Dataset = ... ds.metpy.... ``` ### Describe the solution you'd like Use importlib entrypoints to register these as entrypoints so that registration is automatically handled. This is currently enabled for the array backend, but not for accessors (e.g. [metpy's setup.cfg](https://github.com/Unidata/MetPy/blob/f568aca6325cb23cfccc1006c4965ef7f7b5ad29/src/metpy/xarray.py#L105)). ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7348/reactions"", ""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue 1098241812,I_kwDOAMm_X85BddcU,6149,[Bug]: `numpy` `DeprecationWarning` with `DType` and `xr.testing.assert_all_close()` + Dask,25624127,closed,0,,,4,2022-01-10T18:34:27Z,2023-09-13T20:06:59Z,2023-09-13T20:06:58Z,CONTRIBUTOR,,,,"### What happened? A `numpy` `DeprecationWarning` regarding `DType` is being outputted when using `xr.testing.assert_all_close()` to compare two chunked Datasets. This does warning does not appear with two non-chunked datasets. ### What did you expect to happen? The warning should not appear. 
### Minimal Complete Verifiable Example ```python class TestTemporalAvg: class TestTimeseries: @pytest.fixture(autouse=True) def setup(self): self.ds: xr.Dataset = generate_dataset(cf_compliant=True, has_bounds=True) # No warning with this test def test_weighted_annual_avg(self): ds = self.ds.copy() result = ds.temporal.temporal_avg(""timeseries"", ""year"", data_var=""ts"") expected = ds.copy() expected[""ts""] = xr.DataArray( name=""ts"", data=np.ones((2, 4, 4)), coords={ ""lat"": self.ds.lat, ""lon"": self.ds.lon, ""year"": pd.MultiIndex.from_tuples( [(2000,), (2001,)], ), }, dims=[""year"", ""lat"", ""lon""], attrs={ ""operation"": ""temporal_avg"", ""mode"": ""timeseries"", ""freq"": ""year"", ""groupby"": ""year"", ""weighted"": ""True"", ""centered_time"": ""True"", }, ) # For some reason, there is a floating point difference between both # for ts so we have to use floating point comparison xr.testing.assert_allclose(result, expected) assert result.ts.attrs == expected.ts.attrs # Warning with this test @requires_dask def test_weighted_annual_avg_with_chunking(self): ds = self.ds.copy().chunk({""time"": 2}) result = ds.temporal.temporal_avg(""timeseries"", ""year"", data_var=""ts"") expected = ds.copy() expected[""ts""] = xr.DataArray( name=""ts"", data=np.ones((2, 4, 4)), coords={ ""lat"": ds.lat, ""lon"": ds.lon, ""year"": pd.MultiIndex.from_tuples( [(2000,), (2001,)], ), }, dims=[""year"", ""lat"", ""lon""], attrs={ ""operation"": ""temporal_avg"", ""mode"": ""timeseries"", ""freq"": ""year"", ""groupby"": ""year"", ""weighted"": ""True"", ""centered_time"": ""True"", }, ) # For some reason, there is a floating point difference between both # for ts so we have to use floating point comparison xr.testing.assert_allclose(result, expected) assert result.ts.attrs == expected.ts.attrs ``` ### Relevant log output ```python DeprecationWarning: The `dtype` and `signature` arguments to ufuncs only select the general DType and not details such as the byte order or time unit (with rare exceptions see release notes). To avoid this warning please use the scalar types `np.float64`, or string notation. In rare cases where the time unit was preserved, either cast the inputs or provide an output array. In the future NumPy may transition to allow providing `dtype=` to denote the outputs `dtype` as well. (Deprecated NumPy 1.21) return ufunc.reduce(obj, axis, dtype, out, **passkwargs) ``` ### Anything else we need to know? 
_No response_ ### Environment INSTALLED VERSIONS ------------------ commit: None python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.45.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.20.1 pandas: 1.3.4 numpy: 1.21.4 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.11.2 distributed: 2021.11.2 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2021.11.1 cupy: None pint: None sparse: None setuptools: 59.6.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.30.1 sphinx: 4.3.1 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6149/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 1075765204,I_kwDOAMm_X85AHt_U,6055,Unexpected type conversion in variables with _FillValue,24235303,closed,0,,,4,2021-12-09T16:26:54Z,2023-09-13T12:40:14Z,2023-09-13T12:40:13Z,CONTRIBUTOR,,,,"**What happened**: When opening a dataset with an int16 variable with the `_FillValue` attribute, the variable is converted from type int16 to float32. This was originally reported to the TileDB-CF-Py Git repo that contains a TileDB backend for xarray. See [TileDB-CF-Py issue #117](https://github.com/TileDB-Inc/TileDB-CF-Py/issues/117). **What you expected to happen**: I would expect the type to remain the same when applying the _FillValue. **Minimal Complete Verifiable Example**: Original example from [TileDB-CF-Py issue #117](https://github.com/TileDB-Inc/TileDB-CF-Py/issues/117) using the TileDB backend. ```python import tiledb import xarray as xr import numpy as np index = tiledb.Dim(name='index', domain=(0, 3)) domain = tiledb.Domain(index) var = tiledb.Attr(name='var', dtype=np.int16) schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False) tiledb.Array.create('dense_array0', schema) with tiledb.open('dense_array0', 'w') as A: A[:] = np.array([5, 6, 7, 8], dtype=np.int16) ds = xr.open_dataset('dense_array0', engine='tiledb') ds['var'].dtype ``` NetCDF example with the same behavior: ```python import netCDF4 import xarray as xr import numpy as np filename = 'temp_file.nc' with netCDF4.Dataset(filename, mode=""w"") as group: group.createDimension(""index"", 4) var = group.createVariable(""var"", np.int16, (""index"",), fill_value=-1) var[:] = np.array([5, 6, 7, 8], dtype=np.int16) dataset = xr.open_dataset(filename) dataset[""var""].dtype ``` **Anything else we need to know?**: * I was able to verify the type conversion from int16 to float32 occurs in the `conventions.decode_cf_variables` call in the `open_dataset` method of `StoreBackendEntrypoint`. * I was able to verify the conversion does not happen if `mask_and_scale=False`. * Note that TileDB is automatically setting a fill value for all dense numerical arrays, and so we are always setting the `_FillValue` attribute for variables from the TileDB backend. 
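For reference, a minimal sketch of the `mask_and_scale=False` route mentioned above, using the netCDF example file from this report:

```python
import xarray as xr

# Skip CF mask/scale decoding so the on-disk int16 dtype survives.
raw = xr.open_dataset('temp_file.nc', mask_and_scale=False)
print(raw['var'].dtype)  # int16; _FillValue is left in the data and attrs
fill = raw['var'].attrs['_FillValue']
masked = raw['var'].where(raw['var'] != fill)  # note: where() re-casts to float
```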
**Environment**: I was able to reproduce this with both xarray 0.19.0 and 0.20.1 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6055/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,completed,13221727,issue 514672231,MDU6SXNzdWU1MTQ2NzIyMzE=,3466,RuntimeError: NetCDF: DAP failure,47066389,closed,1,,,4,2019-10-30T13:32:34Z,2023-09-12T16:00:57Z,2023-09-12T16:00:57Z,NONE,,,,"Hi all, I am interested in extracting specific point and variable information from the GEOS-FC product, accessible via OpenDap. Loading the data seems to work fine, and I can do some processing to my specific needs. Ideally I would like to convert this selection to a dataframe, or if needed store as an intermediate file from which I can read again. Yet when doing so, I get the following error: RuntimeError: NetCDF: DAP failure I am not sure what is causing this? Perhaps I chunck the data in the wrong (inefficient) way? Or there is an error with the GEOS netcdf files? Or ... Below a working code snippet. ``` python import xarray as xr idir_geos = 'https://opendap.nccs.nasa.gov/dods/gmao/geos-cf/assim/chm_tavg_1hr_g1440x721_v1' def preprocess(ds): ''' Rename variables and select the relevant ones. Remove lev''' ds = ds.rename({'pm25_rh35_gcc': 'PM2.5','no': 'NO','no2': 'NO2','o3': 'O3','so2': 'SO2','co': 'CO'}) ds = ds[['PM2.5','NO','NO2','O3','SO2','CO']] ds = ds.squeeze('lev') return ds ds = xr.open_mfdataset([idir_geos],preprocess=preprocess,combine='by_coords') lat = 51.25 lon = 4.25 pol = 'O3' ds_sel = ds.sel(lat=lat,lon=lon,method='nearest')[pol] df_sel = ds_sel.to_dataframe().drop(['lat','lon'],axis=1) #ds_sel.to_netcdf('test.nc') # Runtime error ``` Traceback error: > Traceback (most recent call last): File ""/home/demuzmp4/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py"", line 3291, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File """", line 57, in df_sel = ds_sel.to_dataframe().drop(['lat','lon'],axis=1) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4285, in to_dataframe return self._to_dataframe(self.dims) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4273, in _to_dataframe for k in columns File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4273, in for k in columns File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py"", line 437, in values return _as_array_or_item(self._data) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py"", line 250, in _as_array_or_item data = np.asarray(data) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/usr/lib/python3/dist-packages/dask/array/core.py"", line 1138, in __array__ x = self.compute() File ""/usr/lib/python3/dist-packages/dask/base.py"", line 135, in compute (result,) = compute(self, traverse=False, **kwargs) File ""/usr/lib/python3/dist-packages/dask/base.py"", line 333, in compute results = get(dsk, keys, **kwargs) File ""/usr/lib/python3/dist-packages/dask/threaded.py"", line 75, in get pack_exception=pack_exception, **kwargs) File ""/usr/lib/python3/dist-packages/dask/local.py"", line 521, in get_async raise_exception(exc, tb) File ""/usr/lib/python3/dist-packages/dask/compatibility.py"", line 60, in reraise raise exc File 
""/usr/lib/python3/dist-packages/dask/local.py"", line 290, in execute_task result = _execute_task(task, data) File ""/usr/lib/python3/dist-packages/dask/local.py"", line 271, in _execute_task return func(*args2) File ""/usr/lib/python3/dist-packages/dask/array/core.py"", line 72, in getter c = np.asarray(c) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 490, in __array__ return np.asarray(self.array, dtype=dtype) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 652, in __array__ return np.asarray(self.array, dtype=dtype) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 556, in __array__ return np.asarray(array[self.key], dtype=None) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py"", line 73, in __array__ return self.func(self.array) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py"", line 142, in _apply_mask data = np.asarray(data, dtype=dtype) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray return array(a, dtype, copy=False, order=order) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 556, in __array__ return np.asarray(array[self.key], dtype=None) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 72, in __getitem__ key, self.shape, indexing.IndexingSupport.OUTER, self._getitem File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 836, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 84, in _getitem array = getitem(original_array, key) File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/common.py"", line 54, in robust_getitem return array[key] File ""netCDF4/_netCDF4.pyx"", line 4408, in netCDF4._netCDF4.Variable.__getitem__ File ""netCDF4/_netCDF4.pyx"", line 5352, in netCDF4._netCDF4.Variable._get File ""netCDF4/_netCDF4.pyx"", line 1887, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: DAP failure More info on my xarray installation: ------------------ commit: None python: 3.6.9 (default, Jul 3 2019, 07:38:46) [GCC 8.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_GB.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.3 xarray: 0.14.0 pandas: 0.25.2 numpy: 1.17.3 scipy: 1.3.1 netCDF4: 1.5.3 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.28 cfgrib: None iris: None bottleneck: 1.2.1 dask: 0.16.0 distributed: None matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.4.0 pip: 9.0.1 
conda: None pytest: 5.2.1 IPython: 7.3.0 sphinx: 1.8.4","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3466/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1339921253,I_kwDOAMm_X85P3ZNl,6919,Parallel read with MPI,8100801,closed,0,,,4,2022-08-16T07:19:14Z,2023-09-12T15:16:32Z,2023-09-12T15:16:31Z,NONE,,,,"### Is your feature request related to a problem? Is it possible to somehow extend xarray to use MPI I/O? ### Describe the solution you'd like We would need to know the offset from where the actual data starts within the file. Is there a way of retrieving that? Disclaimer: I am not an expert of NetCDF format - so, apologies if the question is trivial! ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6919/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1861335844,I_kwDOAMm_X85u8bsk,8096,Errors when saving PyObject coordinates,38408316,closed,0,,,4,2023-08-22T12:14:53Z,2023-09-06T11:44:41Z,2023-09-06T11:44:41Z,CONTRIBUTOR,,,,"### What happened? Hi, I'm trying to create a `DataArray` with coordinates that are tuples and potentionally even more dimensional objects. The way I did it is to create an empty `numpy` array with `dtype=object` and then insert my tuples inside. This doesn't throw an error when creating a `DataArray` (as opposed to using a 2D ndarray or a list of lists). However, when trying to save it to `zarr` or `netcdf`. I get an error saying `ValueError: setting an array element with a sequence` ### What did you expect to happen? I want to be able to save and load such coordinates without errors. Maybe there is a cleaner way to do it than the object dtype ndarray? ### Minimal Complete Verifiable Example ```Python n = 5 x = np.empty(n, dtype=object) for i in range(n): x[i] = (i, i) xr.DataArray(np.arange(n), dims=(""x""), coords={""x"": x}).to_zarr(""test"") ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. - [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output ```Python File c:\Users\Wiktor\AppData\Local\pypoetry\Cache\virtualenvs\spin1-JGuolXDk-py3.11\Lib\site-packages\xarray\core\dataarray.py:4014, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 4010 else: 4011 # No problems with the name - so we're fine! 4012 dataset = self.to_dataset() -> 4014 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 4015 dataset, 4016 path, 4017 mode=mode, 4018 format=format, 4019 group=group, 4020 engine=engine, 4021 encoding=encoding, 4022 unlimited_dims=unlimited_dims, ... 101 result = np.empty(data.shape, dtype) --> 102 result[...] = data 103 return result ValueError: setting an array element with a sequence. ``` ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Polish_Poland', '1250')
libhdf5: None
libnetcdf: None
xarray: 2023.8.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
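One serializable alternative, sketched here rather than prescribed: store each tuple element as its own plain-dtype coordinate instead of a single object-dtype coordinate of tuples, which both zarr and netCDF can write without trouble:

```python
import numpy as np
import xarray as xr

n = 5
# The tuple (i, i) becomes two int coordinates x0 and x1 along dim 'x'.
da = xr.DataArray(
    np.arange(n),
    dims='x',
    coords={'x0': ('x', np.arange(n)), 'x1': ('x', np.arange(n))},
)
da.to_dataset(name='values').to_zarr('test.zarr', mode='w')
```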
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8096/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1870484988,I_kwDOAMm_X85vfVX8,8120,"`open_mfdataset` exits while sending a ""Segmentation fault"" error",50383939,closed,0,,,4,2023-08-28T20:51:23Z,2023-09-01T15:43:08Z,2023-09-01T15:43:08Z,NONE,,,,"### What is your issue? I try to open about ~10 files, each 5MB as a test case, using `xarray`'s `open_mfdataset` method with the `parallel=True` option, however, it throws a ""Segmentation fault"" error as the following: ```python $ ipython Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import xarray as xr In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}) In [3]: ds Out[3]: Dimensions: (time: 744, rlat: 140, rlon: 105) Coordinates: * time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0... lon (rlat, rlon) float32 dask.array lat (rlat, rlon) float32 dask.array * rlon (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4 * rlat (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68 Data variables: rotated_pole (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 RDRS_v2.1_P_UVC_10m (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_FI_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_FB_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_A_PR0_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_P0_SFC (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_TT_1.5m (time, rlat, rlon) float32 dask.array RDRS_v2.1_P_HU_1.5m (time, rlat, rlon) float32 dask.array Attributes: CDI: Climate Data Interface version 2.0.4 (https://mpimet.mpg.de... Conventions: CF-1.6 product: RDRS_v2.1 Remarks: Variable names are following the convention _