
issues

416 rows where comments = 4 and type = "issue" sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2276352251 I_kwDOAMm_X86HrmD7 8994 Improving performance of open_datatree TomNicholas 35968931 open 0     4 2024-05-02T19:43:17Z 2024-05-03T15:25:33Z   MEMBER      

What is your issue?

The implementation of open_datatree works, but is inefficient, because it calls open_dataset once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.

We discussed this in the datatree meeting, and my understanding is that concretely we need to:

  • [ ] Create an asv benchmark for open_datatree, probably involving first writing and then benchmarking the opening of a special netCDF file that has no data but lots of groups (see the sketch after this list).
  • [ ] Refactor the NetCDFDatastore class to only create one CachingFileManager object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
  • [ ] Refactor NetCDF4BackendEntrypoint.open_datatree to use an implementation that goes through NetCDFDatastore without calling the top-level xr.open_dataset again.
  • [ ] Check the performance of calling xr.open_datatree on a netCDF file has actually improved.
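
A rough sketch of what that benchmark could look like, following common asv conventions (the class name, file layout, and the availability of xr.open_datatree are assumptions here, not the actual xarray asv_bench code):

```python
import numpy as np
import xarray as xr


class IOReadDataTreeNetCDF4:
    """Hypothetical asv benchmark: one file, many near-empty groups."""

    def setup(self):
        self.filepath = "many_groups.nc"
        ds = xr.Dataset({"x": ("dim", np.arange(1))})
        # Write one tiny dataset per group; mode="a" adds groups to the file.
        ds.to_netcdf(self.filepath, mode="w", group="/group0")
        for i in range(1, 100):
            ds.to_netcdf(self.filepath, mode="a", group=f"/group{i}")

    def time_open_datatree(self):
        # Assumes open_datatree is exposed as xr.open_datatree once integrated.
        xr.open_datatree(self.filepath)
```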

It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8994/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2163608564 I_kwDOAMm_X86A9gv0 8802 Error when using `apply_ufunc` with `datetime64` as output dtype gcaria 44147817 open 0     4 2024-03-01T15:09:57Z 2024-05-03T12:19:14Z   CONTRIBUTOR      

What happened?

When using apply_ufunc with datetime64[ns] as the output dtype, the code throws an error about converting from specific units to generic datetime units.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]

def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = "time_cp"

    return xr.apply_ufunc(
        _fn,
        da,
        da.time,
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[dim_out]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=["datetime64[ns]"],
        dask_gufunc_kwargs={"allow_rechunk": True,
                            "output_sizes": {dim_out: 10}},
        exclude_dims=set(("time",)),
    )

da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(
        x=range(5),
        y=range(5),
        time=np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
                      dtype='datetime64[ns]'),
    ),
).chunk(dict(x=2, y=2))

fn(da_fake.compute()).compute()  # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas

fn(da_fake).compute()  # same error as above
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
ValueError                                Traceback (most recent call last)
Cell In[211], line 1
----> 1 fn(da_fake).compute()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs)
   1144 """Manually trigger loading of this array's data from disk or a
   1145 remote source into memory and return a new array. The original is
   1146 left unaltered.
   (...)
   1160 dask.compute
   1161 """
   1162 new = self.copy(deep=False)
-> 1163 return new.load(**kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs)
   1119 def load(self, **kwargs) -> Self:
   1120     """Manually trigger loading of this array's data from disk or a
   1121     remote source into memory and return this array.
   1122     (...)
   1135     dask.compute
   1136     """
-> 1137 ds = self._to_temp_dataset().load(**kwargs)
   1138 new = self._from_temp_dataset(ds)
   1139 self._variable = new._variable

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs)
   850 chunkmanager = get_chunked_array_type(*lazy_data.values())
   852 # evaluate all the chunked arrays simultaneously
--> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
   855 for k, data in zip(lazy_data, evaluated_data):
   856     self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
   67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
   68     from dask.array import compute
---> 70 return compute(*data, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
   625 postcomputes.append(x.__dask_postcompute__())
   627 with shorten_traceback():
--> 628 results = schedule(dsk, keys, **kwargs)
   630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
   2369 self._init_stage_2(*args, **kwargs)
   2370 return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
   2362 vargs = [args[_i] for _i in inds]
   2363 vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args)
   2444 """Vectorized call to `func` over positional `args`."""
   2445 if self.signature is not None:
-> 2446 res = self._vectorize_call_with_signature(func, args)
   2447 elif not args:
   2448 res = func()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args)
   2502 outputs = _create_arrays(broadcast_shape, dim_sizes,
   2503                          output_core_dims, otypes, results)
   2505 for output, result in zip(outputs, results):
-> 2506 output[index] = result
   2508 if outputs is None:
   2509 # did not call the function even once
   2510 if otypes is None:

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
```

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8802/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2270275688 I_kwDOAMm_X86HUaho 8985 update `to_netcdf` docstring to list support for explicit CDF5 writes JulioTBacmeister 9221710 open 0     4 2024-04-30T00:41:13Z 2024-04-30T20:48:46Z   NONE      

Is your feature request related to a problem?

I cannot get to_netcdf() to write files in CDF5 format as identified by the 'ncdump -k' command.

Describe the solution you'd like

When I write a netcdf file using:

D.to_netcdf( filename )

then ask ncdump to tell me the kind of file I have,

ncdump -k filename

it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmosphere Model (CAM), as an initial condition for example. CAM will bomb when it tries to read it. After converting the file with this command:

nccopy -k cdf5 filename cdf5_filename

the file now works in CAM. Also, the command

ncdump -k cdf5_filename

returns 'cdf5'.

I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command.
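
For reference, this is what an explicit CDF5 write might look like if to_netcdf forwards the format string to the netcdf4 engine, which is the support this request asks to have documented. Treating "NETCDF3_64BIT_DATA" (netCDF4-python's name for the CDF5 variant) as accepted here is an assumption, worth verifying with ncdump -k:

```python
import xarray as xr

ds = xr.Dataset({"T": ("x", [1.0, 2.0, 3.0])})

# Assumption: "NETCDF3_64BIT_DATA" (the CDF5 variant in netCDF4-python) is
# passed through by to_netcdf; check the result with `ncdump -k out.nc`.
ds.to_netcdf("out.nc", format="NETCDF3_64BIT_DATA", engine="netcdf4")
```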

Describe alternatives you've considered

Writing netcdf-4 files from xarray and converting via nccopy -k cdf5 filename cdf5_filename

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8985/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1389295853 I_kwDOAMm_X85Szvjt 7099 Pass arbitrary options to sel() benbovy 4160723 open 0     4 2022-09-28T12:44:52Z 2024-04-30T00:44:18Z   MEMBER      

Is your feature request related to a problem?

Currently .sel() accepts two options, method and tolerance. These are relevant for the default (pandas) indexes but not necessarily for other, custom indexes.

It would be also useful for custom indexes to expose their own selection options, e.g.,

  • index query optimization like the dualtree flag of sklearn.neighbors.KDTree.query
  • k-nearest neighbors selection with the creation of a new "k" dimension (+ coordinate / index) with user-defined name and size.

From #3223, it would also be nice if we could pass distinct option values per index.

What would be a good API for that?

Describe the solution you'd like

Some ideas:

A. Allow passing a tuple (labels, options_dict) as indexer value

```python
ds.sel(x=([0, 2], {"method": "nearest"}), y=3)
```

B. Expose an options kwarg that would accept a nested dict

```python
ds.sel(x=[0, 2], y=3, options={"x": {"method": "nearest"}})
```

Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.

Any other ideas? Some sort of context manager? Some Index specific API?
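
For comparison, the closest the current API gets is chaining separate sel calls so that each indexer receives its own method/tolerance (a workaround, not one of the proposals above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"v": (("x", "y"), np.zeros((3, 4)))},
    coords={"x": [0.1, 1.9, 3.0], "y": [0, 1, 2, 3]},
)

# Options apply per call, so chaining gives each indexer its own options.
ds.sel(x=[0, 2], method="nearest").sel(y=3)
```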

Describe alternatives you've considered

The API proposed in #3223 would look great if method and tolerance were the only accepted options, but less so for arbitrary options.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7099/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
481761508 MDU6SXNzdWU0ODE3NjE1MDg= 3223 Feature request for multiple tolerance values when using nearest method and sel() NicWayand 1117224 open 0     4 2019-08-16T19:53:31Z 2024-04-29T23:21:04Z   NONE      

```python
import xarray as xr
import numpy as np
import pandas as pd

# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120, -60)
ds.coords['lat'] = np.arange(30, 50)
ds.coords['time'] = pd.date_range('2018-01-01', '2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size, ds.lon.size, ds.time.size)),
                             dims=['lat', 'lon', 'time'])

target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]

# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')

# Adding tolerance for lat/lon, but also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)

# Ideally tolerance could accept a dictionary but currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest',
       tolerance={'lat': 0.5, 'lon': 0.5, 'time': np.timedelta64(1, 'D')})
```

Expected Output

A dataset with nearest values to tolerances on each dim.

Problem Description

I would like to add the ability for tolerance to accept a dictionary of multiple tolerance values for different dimensions. Before I try implementing it, I wanted to 1) check that it doesn't already exist and that someone isn't already working on it, and 2) get suggestions for how to proceed.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.11.3
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.3.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3223/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2259316341 I_kwDOAMm_X86Gqm51 8965 Support concurrent loading of variables dcherian 2448579 open 0     4 2024-04-23T16:41:24Z 2024-04-29T22:21:51Z   MEMBER      

Is your feature request related to a problem?

Today, if users want to concurrently load multiple variables in a DataArray or Dataset, they have to use dask.

It struck me that it'd be pretty easy for .load to gain an executor kwarg that accepts anything that follows the concurrent.futures executor interface, and parallelize this loop.

https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857
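
A minimal sketch of the idea, emulated outside xarray with today's public API (the executor kwarg itself does not exist yet, "data.nc" is a placeholder, and thread-safety of the underlying backend library is an assumption):

```python
from concurrent.futures import ThreadPoolExecutor

import xarray as xr


def load_concurrently(ds: xr.Dataset, executor) -> xr.Dataset:
    # Submit one load per variable; reads overlap if the backend releases
    # the GIL and tolerates concurrent access.
    futures = [executor.submit(v.load) for v in ds.variables.values()]
    for future in futures:
        future.result()  # re-raise any exception from the worker threads
    return ds


with ThreadPoolExecutor(max_workers=4) as executor:
    ds = load_concurrently(xr.open_dataset("data.nc"), executor)
```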

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8965/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1250939008 I_kwDOAMm_X85Kj9CA 6646 `dim` vs `dims` max-sixty 5635139 closed 0     4 2022-05-27T16:15:02Z 2024-04-29T18:24:56Z 2024-04-29T18:24:56Z MEMBER      

What is your issue?

I've recently been hit by this when experimenting with xr.dot and xr.cov: xr.dot takes dims, and xr.cov takes dim. Because they each take multiple arrays as positional args, kwargs are more conventional.

Should we standardize on one of these?
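
A quick illustration of the inconsistency as it stood when this was filed:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.rand(3, 4), dims=("x", "y"))
b = xr.DataArray(np.random.rand(3, 4), dims=("x", "y"))

xr.dot(a, b, dims="x")  # plural keyword
xr.cov(a, b, dim="x")   # singular keyword
```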

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6646/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1024011835 I_kwDOAMm_X849CS47 5857 Incorrect results when using xarray.ufuncs.angle(..., deg=True) cvr 1119116 closed 0     4 2021-10-12T16:24:11Z 2024-04-28T20:58:55Z 2024-04-28T20:58:54Z NONE      

What happened:

The xarray.ufuncs.angle function is broken. From the help docstring, one may use the option deg=True to have the result in degrees instead of radians (which is consistent with the numpy.angle function). Yet results show that this is not the case. Moreover, specifying deg=True or deg=False leads to the same result, with the values in radians.

What you expected to happen:

To have the result of xarray.ufuncs.angle converted to degrees when option deg=True is specified.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd))
D = xr.ufuncs.angle(Z, deg=True)  # YIELDS INCORRECT RESULTS
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D.values%360} instead of {ds.wd.values}"
          + f"\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!")

D = xr.ufuncs.degrees(xr.ufuncs.angle(Z))  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!")

D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True})  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!")
```

Anything else we need to know?:

Though xarray.ufuncs has a deprecation warning stating that the numpy equivalent may be used, this is not true for numpy.angle. Example:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = np.exp(1j * np.radians(ds.wd))
print(Z)
print(f"Is Z an XArray? {isinstance(Z, xr.DataArray)}")

D = np.angle(ds.wd, deg=True)
print(D)
print(f"Is D an XArray? {isinstance(D, xr.DataArray)}")
```

If this code is run, the result of `numpy.angle(xarray.DataArray)` is not a DataArray object, contrary to other numpy operations (for all versions of xarray I've used). Hence `xarray.ufuncs.angle` is a great option, if it were not for the current problem.

Environment:

No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (the most recent at the time of writing).

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-18-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.utf8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 0.19.0
pandas: 1.2.3
numpy: 1.20.2
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: 4.10.3
pytest: None
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5857/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2224036575 I_kwDOAMm_X86EkBrf 8905 Variable doesn't have an .expand_dims method TomNicholas 35968931 closed 0     4 2024-04-03T22:19:10Z 2024-04-28T19:54:08Z 2024-04-28T19:54:08Z MEMBER      

Is your feature request related to a problem?

DataArray and Dataset have an .expand_dims method, but it looks like Variable doesn't.

Describe the solution you'd like

Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes.
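
To illustrate the gap (a small sketch; the failure on Variable reflects the state at the time this issue was filed):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros(3), dims="x")
da.expand_dims("t")  # DataArray: works

v = xr.Variable(["x"], np.zeros(3))
v.expand_dims("t")   # Variable: AttributeError when this was filed
```

Variable.set_dims covers related ground and is presumably a starting point for the implementation.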

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8905/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
590630281 MDU6SXNzdWU1OTA2MzAyODE= 3921 issues discovered by the all-but-dask CI keewis 14808389 closed 0     4 2020-03-30T22:08:46Z 2024-04-25T14:48:15Z 2024-02-10T02:57:34Z MEMBER      

After adding the py38-all-but-dask CI in #3919, it discovered a few backend issues:

  • zarr:
      • [x] open_zarr with chunks="auto" always tries to chunk, even if dask is not available (fixed in #3919)
      • [x] ZarrArrayWrapper.__getitem__ incorrectly passes the indexer's tuple attribute to _arrayize_vectorized_indexer (this only happens if dask is not available) (fixed in #3919)
      • [x] slice indexers with negative steps get transformed incorrectly if dask is not available https://github.com/pydata/xarray/pull/8674
  • rasterio:
      • ~calling pickle.dumps on a Dataset object returned by open_rasterio fails because a non-serializable lock was used (if dask is installed, a serializable lock is used instead)~

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3921/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2243685081 I_kwDOAMm_X86Fu-rZ 8945 netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory brendan-m-murphy 11130776 closed 0     4 2024-04-15T13:26:08Z 2024-04-23T21:49:28Z 2024-04-23T15:33:36Z NONE      

What is your issue?

Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory).

Here is a minimal working example:

```
times = 100
nlat = 200
nlon = 300

fp = xr.Dataset(
    {"fp": (["time", "lat", "lon"], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
    coords={"time": pd.date_range(start="2019-01-01T02:00:00", periods=times, freq="1H"),
            "lat": np.arange(nlat),
            "lon": np.arange(nlon)},
)

flux = xr.Dataset(
    {"flux": (["time", "lat", "lon"], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
    coords={"time": [pd.to_datetime("2019-01-01")],
            "lat": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
            "lon": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon)},
)

fp.to_netcdf("combine_datasets_tests/fp.nc")
flux.to_netcdf("combine_datasets_tests/flux.nc")

fp1 = xr.open_dataset("combine_datasets_tests/fp.nc")
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc")
```

Then flux1 = flux1.reindex_like(fp1, method="ffill", tolerance=None) takes over a minute, while flux1 = flux1.load().reindex_like(fp1, method="ffill", tolerance=None) is almost instantaneous (timeit says 91ms, including opening the dataset... I'm not sure if caching is influencing this).

Profiling the "reindex without load" cell: ``` 804936 function calls (804622 primitive calls) in 93.285 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 1 92.211 92.211 93.191 93.191 {built-in method _operator.getitem} 1 0.289 0.289 0.980 0.980 utils.py:81(_StartCountStride) 6 0.239 0.040 0.613 0.102 shape_base.py:267(apply_along_axis) 72656 0.109 0.000 0.109 0.000 utils.py:429(<lambda>) 72656 0.085 0.000 0.136 0.000 utils.py:430(<lambda>) 72661 0.051 0.000 0.051 0.000 {built-in method numpy.arange} 145318 0.048 0.000 0.115 0.000 shape_base.py:370(<genexpr>) 2 0.045 0.023 0.046 0.023 indexing.py:1334(getitem) 6 0.044 0.007 0.044 0.007 numeric.py:136(ones) 145318 0.044 0.000 0.067 0.000 index_tricks.py:690(next) 14 0.033 0.002 0.033 0.002 {built-in method numpy.empty} 145333/145325 0.023 0.000 0.023 0.000 {built-in method builtins.next} 1 0.020 0.020 93.275 93.275 duck_array_ops.py:317(where) 21 0.018 0.001 0.018 0.001 {method 'astype' of 'numpy.ndarray' objects} 145330 0.013 0.000 0.013 0.000 {built-in method numpy.asanyarray} 1 0.002 0.002 0.002 0.002 {built-in method _functools.reduce} 1 0.002 0.002 93.279 93.279 variable.py:821(_getitem_with_mask) 18 0.001 0.000 0.001 0.000 {built-in method numpy.zeros} 1 0.000 0.000 0.000 0.000 file_manager.py:226(close) ```

The `__getitem__` call at the top is from xarray.backends.netCDF4_.py, line 114. Because of the jittered coordinates in flux, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680.

In my venv, netCDF4 was installed from a wheel with the following versions: netcdf4-python version: 1.6.5 HDF5 lib version: 1.12.2 netcdf lib version: 4.9.3-development

This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3.

I will try to investigate more and hopefully simplify the example. (Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8945/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1664193419 I_kwDOAMm_X85jMZOL 7748 diff('non existing dimension') does not raise exception LunarLanding 4441338 open 0     4 2023-04-12T09:29:58Z 2024-04-21T22:31:37Z   NONE      

What happened?

Calling xr.DataArray.diff with a non-existing dimension does not raise an exception.

What did you expect to happen?

An exception to be raised.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

xr.DataArray(np.arange(10), dims=('a',)).diff('b')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-21-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.6.9
numpy_groupies: 0.9.20
setuptools: 67.6.0
pip: 23.0.1
conda: 23.1.0
pytest: 7.2.2
mypy: 1.1.1
IPython: 8.11.0
sphinx: 6.1.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2237228079 I_kwDOAMm_X86FWWQv 8927 Use a neutral format to have lossless interface with JSON, scipp, Astropy, pandas loco-philippe 92333742 open 0     4 2024-04-11T08:50:34Z 2024-04-12T14:25:35Z   NONE      

Is your feature request related to a problem?

Each tool has a specific structure for processing multidimensional data with the following consequences:

  • interfaces dedicated to each tool,
  • partially processed data,
  • no unified representation of data structures

Describe the solution you'd like

The proposed format (see jupyter notebook, github repository, PyPI package) is based on the following principles:

  • neutral format available for tabular or multidimensional tools (e.g. Numpy, pandas, xarray, scipp, astropy),
  • taking into account a wide variety of data types as defined in the NTV format,
  • high interoperability: reversible (lossless round-trip) interface with tabular or multidimensional tools,
  • reversible and compact JSON format,
  • ease of sharing and exchanging multidimensional and tabular data.

Describe alternatives you've considered

No response

Additional context

https://github.com/numpy/numpy/issues/12481#issuecomment-2049179803 https://github.com/astropy/astropy/issues/16286 https://github.com/scipp/scipp/issues/3422

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8927/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1959816045 I_kwDOAMm_X8500Gtt 8368 to_netcdf: Unexpected drop of "units" attribute of attached "bounds" leonfoks 15173535 open 0     4 2023-10-24T18:15:05Z 2024-04-09T11:11:20Z   NONE      

What happened?

When writing a Dataset to netcdf, any DataArrays that are linked as bounds through another variable's attrs['bounds'] entry have their 'units' attribute (specifically) dropped inside the written netcdf file.

See example

What did you expect to happen?

Units attribute to be written to the netcdf file.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

# Create a new Dataset
ds = xr.Dataset()

# Add the x variable. Specify 'x_bnds' as bounds, defined later.
ds['x'] = xr.DataArray(np.arange(10), dims='x', attrs={'units': 'm', 'bounds': 'x_bnds'})

# Bounds require an extra dimension equal to the number of vertices.
ds['nv'] = xr.DataArray(np.r_[0, 1], dims='nv')

# Add the actual bounding values for variable x.
ds['x_bnds'] = xr.DataArray(np.squeeze(np.dstack([np.arange(10) - 0.5, np.arange(10) + 0.5])),
                            dims=['x', 'nv'], attrs={'test': 4, 'units': 'm'})

print('Units is attached to the bounds in the dataset before writing:', 'units' in ds['x_bnds'].attrs)

# Write to netcdf file
ds.to_netcdf('tmp.nc', format='netcdf4', engine='netcdf4')

# Open the dataset and check x_bnds attrs. units is dropped.
new = xr.open_dataset('tmp.nc')
print(new['x_bnds'].attrs)

# Confirm that units were never written to the file.
!h5dump -d /x_bnds tmp.nc
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.3
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: 7.2.6
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8368/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2230680765 I_kwDOAMm_X86E9Xy9 8919 Using the xarray.Dataset.where() function takes up a lot of memory isLiYang 69391863 closed 0     4 2024-04-08T09:15:49Z 2024-04-09T02:45:09Z 2024-04-09T02:45:08Z NONE      

What is your issue?

My Python script was killed because it used too much memory. After checking, I found that the problem is the ds.where() function.

The original netcdf file opened from the hard disk takes up about 10 MB of storage, but when I mask out the data that doesn't match the latitude and longitude bounds, the variable ds takes up over a dozen GB of memory. When I deleted this variable using del ds, the memory used by the script immediately returned to normal.

```
# Open this netcdf file.
ds = xr.open_dataset(track)

# If longitude range is [-180, 180], then convert to [0, 360].
if np.any(ds[var_lon] < 0):
    ds[var_lon] = ds[var_lon] % 360

# Extract data by longitude and latitude.
ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1])
              & (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3]))

# Select data by range and value of some variables.
for key, value in range_select.items():
    ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1]))
for key, value in value_select.items():
    ds = ds.where(ds[key].isin(value))
```
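
If the latitude and longitude here are 1-D, monotonically ordered coordinates, label-based slicing would be a lower-memory alternative for the spatial subsetting, since where() keeps the full-size arrays (NaN-filled where the condition fails) in memory. A sketch under those assumptions, reusing the names from the snippet above:

```python
import xarray as xr

ds = xr.open_dataset(track)

# Assumes var_lon / var_lat are sorted 1-D coordinate variables: slicing
# subsets the data instead of building full-size NaN-masked copies.
ds = ds.sel({
    var_lon: slice(region[0], region[1]),
    var_lat: slice(region[2], region[3]),
})
```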

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8919/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2228373305 I_kwDOAMm_X86E0kc5 8915 Weird behavior of DataSet.where(... , drop=True) johannespletzer 22961670 closed 0     4 2024-04-05T16:03:05Z 2024-04-08T09:32:48Z 2024-04-08T09:32:48Z NONE      

What happened?

I work with an aircraft emission dataset that is freely available online: emission dataset

During my calculations I eventually convert the Dataset to a DataFrame. My motivation is to avoid unnecessary rows in the DataFrame. While doing some calculations, my code returned unexpected results. Eventually I could narrow it down to a DataSet.where(..., drop=True) argument I added along the way, which introduces differences in the data. Here are two examples:

Example 1: Along some dimensions data points vanished if drop=True

Example 2: For other dimensions (these?) data points appeared elsewhere if drop=True

What did you expect to happen?

I expect my calculations to return the same results, regardless of whether drop=True is active or not.

Minimal Complete Verifiable Example

```Python
!wget "https://zenodo.org/records/10818082/files/Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc"

import matplotlib.pyplot as plt
import xarray as xr

nc_file = xr.open_dataset('Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc')

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[0])
axs[0].set_xlim(-50, 90)
axs[0].set_title('With drop=True')

nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[1])
axs[1].set_xlim(-50, 90)
axs[1].set_title('With drop=False')

plt.tight_layout()
plt.show()

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[0])
axs[0].set_title('With drop=True')

nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[1])
axs[1].set_title('With drop=False')

plt.tight_layout()
plt.show()
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'ISO8859-1')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2022.11.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: None
IPython: 8.10.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8915/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2206243581 I_kwDOAMm_X86DgJr9 8876 Possible race condition when appending to an existing zarr rsemlal-murmuration 157591329 closed 0     4 2024-03-25T16:59:52Z 2024-04-03T15:23:14Z 2024-03-29T14:35:52Z NONE      

What happened?

When appending to an existing zarr along a dimension (to_zarr(..., mode='a', append_dim="x" ,..)), if the dask chunking of the dataset to append does not align with the chunking of the existing zarr, the resulting consolidated zarr store may have NaNs instead of the actual values it is supposed to have.

What did you expect to happen?

We would expect the zarr append to behave the same as if we concatenated the datasets in memory (using concat) and wrote the whole result to a new zarr store in one go.

Minimal Complete Verifiable Example

```Python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile

ds1 = xr.Dataset({"a": ("x", [1., 1.])}, coords={'x': [1, 2]}).chunk({"x": 3})
ds2 = xr.Dataset({"a": ("x", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({"x": 3})

# The issue happens only when: threads_per_worker > 1
with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode="w")  # write first dataset
            ds2.to_zarr(store, mode="a", append_dim="x")  # append second dataset

            rez = xr.open_zarr(store).compute()  # open consolidated dataset
            nb_values = rez.a.count().item(0)  # count non-NaN values
            if nb_values != 6:
                print("found NaNs:")
                print(rez.to_dataframe())
                break
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
/tmp/tmptg_pe6ox
/tmp/tmpm7ncmuxd
/tmp/tmpiqcgoiw2
/tmp/tmppma1ieo7
/tmp/tmpw5vi4cf0
/tmp/tmp1rmgwju0
/tmp/tmpm6tfswzi
found NaNs:
     a
x
1  1.0
2  1.0
3  1.0
4  1.0
5  1.0
6  NaN
```

Anything else we need to know?

The example code snippet provided here, reproduces the issue.

Since the issue occurs randomly, the example loops a number of times and stops when the issue occurs.

In the example, when ds1 is first written, since it only contains 2 values along the x dimension, the resulting .zarr store has the chunking {'x': 2}, even though we called .chunk({"x": 3}).

Side note: This behaviour in itself is not problematic in this case, but the fact that the chunking is silently changed made this issue harder to spot.

However, when we try to append the second dataset ds2, which contains 4 values, the .chunk({"x": 3}) at the beginning splits the dask array into 2 dask chunks, but in a way that does not align with the zarr chunks.

Zarr chunks:

  • chunk1: x: [1; 2]
  • chunk2: x: [3; 4]
  • chunk3: x: [5; 6]

Dask chunks for ds2:

  • chunk A: x: [3; 4; 5]
  • chunk B: x: [6]

Both dask chunks A and B are supposed to write to zarr chunk3, and depending on which writes first, we can end up with NaN at x = 5 or x = 6 instead of the actual values.

The issue obviously happens only when dask tasks are run in parallel. Using safe_chunks = True when calling to_zarr does not seem to help.

We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer? see the sketch below).
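
As a sketch of that synchronizer idea, reusing ds2 and store from the example above (synchronizer is a real to_zarr parameter, but whether it actually serializes these overlapping threaded writes is untested here):

```python
import zarr

# Ask zarr to lock chunks so concurrent threads don't clobber each
# other's read-modify-write of the shared chunk.
synchronizer = zarr.ThreadSynchronizer()
ds2.to_zarr(store, mode="a", append_dim="x", synchronizer=synchronizer)
```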

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8876/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2211106929 I_kwDOAMm_X86DytBx 8882 to_zarr silently loses data when using append_dim, if chunks are different to zarr store harryC-space-intelligence 140395181 closed 0     4 2024-03-27T15:27:02Z 2024-03-29T14:35:51Z 2024-03-29T14:35:51Z NONE      

What happened?

When writing a chunked DataArray to an existing zarr store, appending along an existing dimension of the store, I have found that some data are not written if there are multiple array chunks to one zarr chunk.

I appreciate it is probably bad practice to have different chunk sizes in my DataArray and zarr store, but I think it's a realistic scenario that needs to be caught.

This may be related to / the same underlying issue as #8371. Perhaps the checks mentioned in https://github.com/pydata/xarray/issues/8371#issuecomment-1814589157 are somehow getting bypassed? Using zarr's ThreadSynchronizer is the only way I have found to ensure that all the data gets written.

What did you expect to happen?

I expected that either

  • to_zarr would recognise the different chunk sizes, and re-chunk or wait for all the chunks to be written
  • or an error would be raised, given that the mismatch results in loss of data in an unpredictable way

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
from matplotlib import pyplot as plt

x_coords = np.arange(10)
y_coords = np.arange(10)
t_coords = np.array([np.datetime64('2020-01-01').astype('datetime64[ns]')])
data = np.ones((10, 10))

for i in range(4):
    plt.subplot(1, 4, i + 1)

    da = xr.DataArray(data.reshape((-1, 10, 10)),
                      dims=['time', 'x', 'y'],
                      coords={'x': x_coords, 'y': y_coords, 'time': t_coords},
                      ).chunk({'x': 5, 'y': 5, 'time': 1}).rename('foo')

    da.to_zarr('foo.zarr', mode='w')

    new_time = np.array([np.datetime64('2021-01-01').astype('datetime64[ns]')])

    da2 = xr.DataArray(data.reshape((-1, 10, 10)),
                       dims=['time', 'x', 'y'],
                       coords={'x': x_coords, 'y': y_coords, 'time': new_time},
                       ).chunk({'x': 1, 'y': 1, 'time': 1}).rename('foo')

    da2.to_zarr('foo.zarr', append_dim='time', mode='a')

    plt.imshow(xr.open_zarr('foo.zarr').isel(time=-1).foo.values)
```
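
One defensive workaround, assuming the root cause is the misaligned chunk grids, is to rechunk to the store's on-disk chunking before appending. A sketch reusing da2 from the example above:

```python
import zarr

# Read the store's chunk grid and align the dask chunks to it, so no two
# dask chunks target the same zarr chunk.
on_disk = zarr.open('foo.zarr')['foo'].chunks
da2 = da2.chunk(dict(zip(('time', 'x', 'y'), on_disk)))
da2.to_zarr('foo.zarr', append_dim='time', mode='a')
```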

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Output from the plots above:

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1041-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: 0.15.1
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: 24.1.2
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8882/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
935607748 MDU6SXNzdWU5MzU2MDc3NDg= 5563 Decoding non-utf-8 encoded strings with the h5netcdf engine kiksekage 11391714 closed 0     4 2021-07-02T09:49:58Z 2024-03-26T15:08:41Z 2024-03-26T15:08:41Z NONE      

What happened: Trying to load a netCDF file-like object (an io.BytesIO) with attribute strings in a non-utf-8 encoding with the h5netcdf engine leads to a UnicodeDecodeError.

What you expected to happen: Loading the same file, albeit persisted to disk, with the netcdf4 engine works fine; however, since the netcdf4 engine doesn't support file-like objects, I ran into this issue.

Traceback:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 242, in load_dataset
    with open_dataset(filename_or_obj, **kwargs) as ds:
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 496, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 384, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py", line 22, in open_dataset
    vars, attrs = store.load()
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py", line 126, in load
    attributes = FrozenDict(self.get_attrs())
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 234, in get_attrs
    return FrozenDict(read_attributes(self.ds))
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 75, in read_attributes
    v = maybe_decode_bytes(v)
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 63, in maybe_decode_bytes
    return txt.decode("utf-8")
```

Minimal Complete Verifiable Example:

```python
import xarray as xr
import netCDF4

title = b'\xc3'

f = netCDF4.Dataset('test.nc', 'w')
f.title = title
f.close()
xr.load_dataset("test.nc", engine="h5netcdf")
```
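
A sketch of the kind of fallback decoding being asked for (illustrative only, not xarray's actual maybe_decode_bytes):

```python
def decode_bytes_with_fallback(txt):
    # Try utf-8 first; latin-1 maps every byte to a code point, so the
    # fallback never raises.
    if isinstance(txt, bytes):
        try:
            return txt.decode("utf-8")
        except UnicodeDecodeError:
            return txt.decode("latin-1")
    return txt


print(decode_bytes_with_fallback(b'\xc3'))  # 'Ã' instead of a UnicodeDecodeError
```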

Environment:

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10) [GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.1
pandas: 1.2.4
numpy: 1.20.3
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: 6.2.4
IPython: 7.25.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5563/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2117248281 I_kwDOAMm_X85-MqUZ 8704 Currently no way to create a Coordinates object without indexes for 1D variables TomNicholas 35968931 closed 0     4 2024-02-04T18:30:18Z 2024-03-26T13:50:16Z 2024-03-26T13:50:15Z MEMBER      

What happened?

The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on main, meaning that I think there is currently no way to create an xr.Coordinates object without 1D variables being coerced to indexes. This in turn means there is no way to create a Dataset object without its 1D coordinate variables being coerced to IndexVariables backed by indexes.

What did you expect to happen?

I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.

```python
xr.Coordinates({'x': ('x', uarr)}, indexes={})
```

where uarr is an un-indexable array-like.

Minimal Complete Verifiable Example

```Python
class UnindexableArrayAPI:
    ...


class UnindexableArray:
    """
    Presents like an N-dimensional array but doesn't support changes of any
    kind, nor can it be coerced into a np.ndarray or pd.Index.
    """

    _shape: tuple[int, ...]
    _dtype: np.dtype

    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI

    @property
    def dtype(self) -> np.dtype:
        return self._dtype

    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape

    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def size(self) -> int:
        return np.prod(self.shape)

    @property
    def T(self) -> Self:
        raise NotImplementedError()

    def __repr__(self) -> str:
        return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"

    def _repr_inline_(self, max_width):
        """
        Format to a single line with at most max_width characters. Used by xarray.
        """
        return self.__repr__()

    def __getitem__(self, key, /) -> Self:
        """
        Only supports extremely limited indexing.

        I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
        """
        from xarray.core.indexing import BasicIndexer

        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()

    def __array__(self) -> np.ndarray:
        raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")
```

```python
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))

xr.Variable(data=uarr, dims=['x'])  # works fine

xr.Coordinates({'x': ('x', uarr)}, indexes={})  # works in xarray v2023.08.0 but in versions after that it triggers the NotImplementedError in `__array__`:
```

```python
NotImplementedError                       Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})

File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
   299 variables = {}
   300 for name, data in coords.items():
--> 301     var = as_variable(data, name=name)
   302     if var.dims == (name,) and indexes is None:
   303         index, index_vars = create_default_index_implicit(var, list(coords))

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
   152 raise TypeError(
   153     f"Variable {name!r}: unable to convert object into a variable without an "
   154     f"explicit list of dimensions: {obj!r}"
   155 )
   157 if name is not None and name in obj.dims and obj.ndim == 1:
   158     # automatically convert the Variable into an Index
--> 159     obj = obj.to_index_variable()
   161 return obj

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
   570 def to_index_variable(self) -> IndexVariable:
   571     """Return this variable as an xarray.IndexVariable"""
--> 572     return IndexVariable(
   573         self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
   574     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2640 # Unlike in Variable, always eagerly load values into memory
   2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642     self._data = PandasIndexingAdapter(self._data)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
   1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
   1479     from xarray.core.indexes import safe_cast_to_index
-> 1481     self.array = safe_cast_to_index(array)
   1483 if dtype is None:
   1484     self._dtype = get_valid_numpy_dtype(array)

File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
   459 emit_user_level_warning(
   460     (
   461         "pandas.Index does not support the float16 dtype."
   (...)
   465     category=DeprecationWarning,
   466 )
   467 kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
   471 return _maybe_cast_to_cftimeindex(index)

Cell In[55], line 63, in UnindexableArray.__array__(self)
   62 def __array__(self) -> np.ndarray:
---> 63     raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")

NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Context is #8699

Environment

Versions described above

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
957918751 MDU6SXNzdWU5NTc5MTg3NTE= 5664 Interpolation behaviour inconsistent with numpy? mathisc 7017525 open 0     4 2021-08-02T08:56:28Z 2024-03-12T01:15:46Z   NONE      

Hey all, when running dataset.interp(time=dataset.time), values are filled with np.nan if one of the neighbors is np.nan, even when interpolation is not actually needed.

Here is the sample code to reproduce the issue:

```python
def test_crop_times_nan():
    ds = xr.Dataset(
        data_vars={
            "some_variable": (['x', 'time'], np.array([[np.nan, 0, 1]]))
        },
        coords={
            "time": np.array([0, 1, 2])
        }
    )
    result = ds.interp(time=ds.time)

    # result["some_variable"].value == [nan, nan, 1.0]
    # whereas [nan, 0, 1.0] is EXPECTED
    xr.testing.assert_allclose(ds, result)
```

Please note that numpy does not have the same behavior:

```python
>>> import numpy as np
>>> np.interp([0, 1, 2], xp=[0, 1, 2], fp=[np.nan, 0, 1])
array([nan,  0.,  1.])
```

Is that an intended behaviour for xarray? If so, does this mean that I first have to check whether interpolation is needed instead of doing it no matter what (and use reindex instead of interp when it is not needed)?
(This will be kind of tricky if interpolation is needed for certain values and not for others...)
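
For what it's worth, a sketch of that check (exact label matching is the assumption here):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={"some_variable": (['x', 'time'], np.array([[np.nan, 0, 1]]))},
    coords={"time": np.array([0, 1, 2])},
)

target_times = ds.time
if np.isin(target_times, ds.time).all():
    result = ds.reindex(time=target_times)  # exact matches: NaN neighbors untouched
else:
    result = ds.interp(time=target_times)
```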

Thanks for your help ;)

Environment:

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-7642-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.2
pandas: 1.2.4
numpy: 1.19.4
scipy: 1.6.0
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.0
distributed: 2021.01.0
matplotlib: 3.4.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.4.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.19.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5664/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2140090923 I_kwDOAMm_X85_jzIr 8759 Passing datasets with different group hierarchy to open_mfdataset KareemShalabi 111437410 closed 0     4 2024-02-17T13:31:18Z 2024-03-03T18:43:09Z 2024-03-03T10:53:34Z NONE      

Is your feature request related to a problem?

When you want to open multiple datasets located at different nodes of the group hierarchy in an HDF file, you can't pass a list of group keys (save_mfdataset offers a 'groups' keyword; emphasis on the s). On top of that, the 'paths' keyword argument does not accept a datastore as a valid input.

Describe the solution you'd like

No response

Describe alternatives you've considered

One can, of course, open_dataset each file in a loop and combine afterwards, as sketched below. One possible fix is to modify the 'group' argument to accept a list of the same length as the paths list. Another could be changing the 'paths' keyword to accept datastore or h5py objects. Both are trivial in my opinion; most of the code is already there in other functions (open_dataset, save_mfdataset).
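
For reference, a minimal sketch of the loop-and-combine alternative (the file paths, group names, and h5netcdf engine below are assumptions for illustration):

```python
import xarray as xr

paths = ["a.h5", "b.h5"]
groups = ["/group1", "/nested/group2"]

# open each (file, group) pair separately, then combine
datasets = [
    xr.open_dataset(path, group=group, engine="h5netcdf")
    for path, group in zip(paths, groups)
]
combined = xr.combine_by_coords(datasets)
```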

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8759/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2141899767 I_kwDOAMm_X85_qsv3 8769 Errors started appearing after release v2024.02.0 navidcy 7112768 closed 0     4 2024-02-19T09:23:16Z 2024-02-22T04:54:06Z 2024-02-22T04:54:06Z NONE      

What happened?

I started seeing errors in my CI after the latest xarray release. See, e.g.,

https://github.com/COSIMA/regional-mom6/actions/runs/7957078139/job/21719091616#step:7:226

After I added a compatibility pin for xarray to exclude the latest release, the error went away. See:

https://github.com/COSIMA/regional-mom6/actions/runs/7957192738
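
For reference, a hedged sketch of such a pin (the exact file depends on the project setup; the bound assumes the problems began with v2024.02.0):

```
# e.g. in requirements.txt / setup dependencies
xarray<2024.2.0
```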

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8769/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2142982259 I_kwDOAMm_X85_u1Bz 8771 Unable to use Xarray to work on RCM Dataset with xsar and safe_rcm by umr-lops sparshgarg23 34626942 closed 0     4 2024-02-19T18:58:50Z 2024-02-20T05:29:33Z 2024-02-20T05:29:33Z NONE      

What happened?

UMR-LOPS has introduced xsar, a library to work with RCM datasets. When working with the following code

```python
import xsar
import geoviews as gv
import holoviews as hv
import geoviews.feature as gf

hv.extension('bokeh')
path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010')
meta = xsar.RcmMeta(name=path)
meta.dt
```

I am encountering the following error:

```
ValueError                                Traceback (most recent call last)
<ipython-input-5-3d49b63ff406> in <cell line: 2>()
      1 #rs2meta = xsar.RadarSat2Meta(name=path)
----> 2 meta = xsar.RcmMeta(name=path)

14 frames /usr/local/lib/python3.10/dist-packages/xsar/utils.py in wrapper(args, kwargs) 93 startrss = process.memory_info().rss 94 starttime = time.time() ---> 95 result = f(args, **kwargs) 96 endtime = time.time() 97 if mem_monitor:

/usr/local/lib/python3.10/dist-packages/xsar/rcm_meta.py in init(self, name) 32 self.dt = api.open_rcm(name.split(':')[1]) 33 else: ---> 34 self.dt = api.open_rcm(name) 35 if not name.startswith('RCM_DS:'): 36 name = 'RCM_DS:%s:' % name

/usr/local/lib/python3.10/dist-packages/safe_rcm/api.py in open_rcm(url, backend_kwargs, manifest_ignores, **dataset_kwargs) 95 ) 96 ---> 97 tree = read_product(mapper, "metadata/product.xml") 98 99 calibration_root = "metadata/calibration"

/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in read_product(mapper, product_path) 272 } 273 --> 274 converted = valmap( 275 lambda x: execute(**x)(decoded), 276 layout,

/usr/local/lib/python3.10/dist-packages/toolz/dicttoolz.py in valmap(func, d, factory) 83 """ 84 rv = factory() ---> 85 rv.update(zip(d.keys(), map(func, d.values()))) 86 return rv 87

/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in <lambda>(x) 273 274 converted = valmap( --> 275 lambda x: execute(**x)(decoded), 276 layout, 277 )

/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 302 def call(self, args, kwargs): 303 try: --> 304 return self._partial(*args, kwargs) 305 except TypeError as exc: 306 if self._should_curry(args, kwargs, exc):

/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in execute(mapping, f, path) 29 subset = query(path, mapping) 30 ---> 31 return compose_left(f, attach_path(path=path))(subset) 32 33

/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 485 486 def call(self, args, kwargs): --> 487 ret = self.first(*args, kwargs) 488 for f in self.funcs: 489 ret = f(ret)

/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 487 ret = self.first(args, **kwargs) 488 for f in self.funcs: --> 489 ret = f(ret) 490 return ret 491

/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in <lambda>(obj) 126 ), 127 lambda obj: obj.set_index({"stacked": ["pole", "pulse"]}), --> 128 lambda obj: obj.unstack("stacked"), 129 ), 130 },

/usr/local/lib/python3.10/dist-packages/xarray/util/deprecation_helpers.py in inner(args, kwargs) 113 return func(args[:-n_extra_args], kwargs) 114 --> 115 return func(*args, kwargs) 116 117 return inner

/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse) 5576 ) 5577 else: -> 5578 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) 5579 return result 5580

/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in _unstack_once(self, dim, index_and_vars, fill_value, sparse) 5395 indexes = {k: v for k, v in self._indexes.items() if k != dim} 5396 -> 5397 new_indexes, clean_index = index.unstack() 5398 indexes.update(new_indexes) 5399

/usr/local/lib/python3.10/dist-packages/xarray/core/indexes.py in unstack(self) 1019 1020 if not clean_index.is_unique: -> 1021 raise ValueError( 1022 "Cannot unstack MultiIndex containing duplicates. Make sure entries " 1023 f"are unique, e.g., by calling .drop_duplicates('{self.dim}'), "

ValueError: Cannot unstack MultiIndex containing duplicates. Make sure entries are unique, e.g., by calling .drop_duplicates('stacked'), before unstacking.
```

As you can see from the last frames in the trace, the issue is in xarray/core/dataset.py when we unstack the dataset. Any ideas why this is happening? The issue doesn't occur with RADARSAT-2 or any other dataset, so is this an xarray problem, or should I raise the issue at umr-lops?
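
For what it's worth, the same ValueError can be reproduced without xsar (a minimal sketch of ours that mirrors the set_index/unstack steps visible in the trace):

```python
import xarray as xr

ds = xr.Dataset(
    {"v": ("stacked", [1, 2])},
    coords={
        "pole": ("stacked", ["HH", "HH"]),  # duplicate (pole, pulse) pairs
        "pulse": ("stacked", [0, 0]),
    },
).set_index(stacked=["pole", "pulse"])

ds.unstack("stacked")  # raises: Cannot unstack MultiIndex containing duplicates
```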

What did you expect to happen?

The error shouldn't be there, and I should be able to view the dataframe, as shown in the link below: https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/examples/rcm.html

Minimal Complete Verifiable Example

```Python
import xsar
import geoviews as gv
import holoviews as hv
import geoviews.feature as gf

hv.extension('bokeh')
path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010')
meta = xsar.RcmMeta(name=path)
meta.dt
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

commit: None python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] python-bits: 64 OS: Linux OS-release: 6.1.58+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None xarray: 2023.7.0 pandas: 1.5.3 numpy: 1.25.2 scipy: 1.11.4 netCDF4: None pydap: None h5netcdf: 1.3.0 h5py: 3.9.0 Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.1 distributed: 2023.8.1 matplotlib: 3.7.1 cartopy: None seaborn: 0.13.1 numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.7.2 pip: 23.1.2 conda: None pytest: 7.4.4 mypy: None IPython: 7.34.0 sphinx: 5.0.2 /usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8771/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1912094632 I_kwDOAMm_X85x-D-o 8231 xr.concat concatenates along dimensions that it wasn't asked to TomNicholas 35968931 open 0     4 2023-09-25T18:50:29Z 2024-02-14T20:30:26Z   MEMBER      

What happened?

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```

Calling xr.concat on these with dim='x_center' happily concatenates them:

```python
>>> xr.concat([ds1, ds2], dim='x_center')
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*
```

but notice that the returned result has been concatenated along both x_center and x_outer.

What did you expect to happen?

I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).

What I expected to happen was that (as by default coords='different') both variables would be attempted to be concatenated along the x_center dimension, which would have succeeded for the x_center variable but failed for the x_outer variable. Indeed, if I name the variables differently so that they are no longer coordinate variables then that is what happens:

```python
import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```

```python
>>> xr.concat([ds1, ds2], dim='x_center', data_vars='different')
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}
```

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.

Environment

xarray 2023.8.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8231/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1390228572 I_kwDOAMm_X85S3TRc 7104 Duplicate values on unstack znichollscr 114576287 closed 0     4 2022-09-29T04:16:26Z 2024-02-13T09:48:37Z 2024-02-13T09:48:37Z NONE      

What happened?

I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed.

What did you expect to happen?

A warning or error would be raised to say, "this isn't going to work".

Minimal Complete Verifiable Example

```Python
import datetime as dt
import xarray as xr

ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("lat", "time"),
    coords={"lat": [-60, 60], "time": [dt.datetime(2010, 1, d) for d in range(1, 4)]},
    name="test",
).to_dataset()

ds = (
    ds.assign_coords(
        {
            "month": ds["time"].dt.month,
            "year": ds["time"].dt.year,
        }
    )
    .set_index(time=["month", "year"])
)
ds = ds.unstack("time")

# the output only has 2 values, which isn't what I expected
ds["test"].data
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that...
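
One way to see the collision before it is silently dropped (a diagnostic sketch, not a fix): the stacked index is not unique right before the unstack.

```python
# run just before ds.unstack("time") in the example above
print(ds.indexes["time"].is_unique)     # False: all three dates map to (1, 2010)
print(ds.indexes["time"].duplicated())  # [False  True  True]
```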

Environment

INSTALLED VERSIONS ------------------ commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7 python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 0.1.dev4312+ge678a1d.d20220928 pandas: 1.5.0 numpy: 1.22.4 scipy: 1.9.1 netCDF4: 1.6.1 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: 3.2.2 rasterio: 1.3.1 cfgrib: 0.9.10.1 iris: 3.3.0 bottleneck: 1.3.5 dask: 2022.9.1 distributed: 2022.9.1 matplotlib: 3.6.0 cartopy: 0.21.0 seaborn: 0.12.0 numbagg: 0.2.1 fsspec: 2022.8.2 cupy: None pint: 0.19.2 sparse: 0.13.0 flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 65.4.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: 8.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7104/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2126375172 I_kwDOAMm_X85-vekE 8726 PRs requiring approval & merging main? max-sixty 5635139 closed 0     4 2024-02-09T02:35:58Z 2024-02-09T18:23:52Z 2024-02-09T18:21:59Z MEMBER      

What is your issue?

Sorry I haven't been on the calls at all recently (unfortunately the schedule is difficult for me). Maybe this was discussed there? 

PRs now seem to require a separate approval prior to merging. Is there an upside to this? Is there any difference between those who can approve and those who can merge? Otherwise it just seems like more clicking.

PRs also now seem to require merging the latest main prior to merging? I get there's some theoretical value to this, because changes can semantically conflict with each other. But it's extremely rare that this actually happens (can we point to cases?), and it limits the immediacy & throughput of PRs. If the bad outcome does ever happen, we find out quickly when main tests fail and can revert.

(fwiw I wrote a few principles around this down a while ago here; those are much stronger than what I'm suggesting in this issue though)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8726/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2115049090 I_kwDOAMm_X85-ERaC 8694 Error while saving an altered dataset to NetCDF when loaded from a file tarik 12544636 open 0     4 2024-02-02T14:18:03Z 2024-02-07T13:38:40Z   NONE      

What happened?

When attempting to save an altered Xarray dataset to a NetCDF file using the to_netcdf method, an error occurs if the original dataset is loaded from a file. Specifically, this error does not occur when the dataset is created directly but only when it is loaded from a file.

What did you expect to happen?

The altered Xarray dataset is saved as a NetCDF file using the to_netcdf method.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        win_1=("attempt", [True, False, True, False, False, True]),
        win_2=("attempt", [False, True, False, True, False, False]),
    ),
    coords=dict(
        attempt=[1, 2, 3, 4, 5, 6],
        player_1=("attempt", ["paper", "paper", "scissors", "scissors", "paper", "paper"]),
        player_2=("attempt", ["rock", "scissors", "paper", "rock", "paper", "rock"]),
    ),
)
ds.to_netcdf("dataset.nc")

ds_from_file = xr.load_dataset("dataset.nc")

ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.to_netcdf("dataset_altered.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
Traceback (most recent call last):
  File "example.py", line 20, in <module>
    ds_altered.to_netcdf("dataset_altered.nc")
  File ".../python3.9/site-packages/xarray/core/dataset.py", line 2303, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1315, in to_netcdf
    dump_to_store(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 356, in store
    self.set_variables(
  File ".../python3.9/site-packages/xarray/backends/common.py", line 398, in set_variables
    writer.add(source, target)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 243, in add
    target[...] = source
  File ".../python3.9/site-packages/xarray/backends/scipy_.py", line 78, in __setitem__
    data[key] = value
  File ".../python3.9/site-packages/scipy/io/_netcdf.py", line 1019, in __setitem__
    self.data[index] = data
ValueError: could not broadcast input array from shape (4,5) into shape (4,8)
```

Anything else we need to know?

Findings:

The issue is related to the encoding information of the dataset becoming invalid after filtering data with the where method. The to_netcdf method takes the available encoding information instead of considering the actual shape of the data.

In the provided examples, the maximum length of strings stored in "player_1" and "player_2" is originally set to 8 characters. However, after filtering with the where method, the maximum length of the string becomes 5 in "player_1" and remains 8 in "player_2". But the encoding information of the variables still shows a length of 8, particularly the attribute char_dim_name.

Workaround:

A workaround to resolve this issue is to call the drop_encoding method on the dataset before saving it with to_netcdf. This action ensures that the encoding information is not available, and the to_netcdf method is forced to take the actual shapes of the data, preventing the broadcasting error.
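
Applied to the example above, the workaround is a single extra call (a sketch using Dataset.drop_encoding, available in recent xarray releases):

```python
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.drop_encoding().to_netcdf("dataset_altered.nc")  # stale char_dim_name is discarded
```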

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.14 (main, Aug 24 2023, 14:01:46) [GCC 11.4.0] python-bits: 64 OS: Linux OS-release: 6.3.1-060301-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.3 scipy: 1.12.0 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 69.0.3 pip: 23.3.2 conda: None pytest: None mypy: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8694/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
782440858 MDU6SXNzdWU3ODI0NDA4NTg= 4784 Opening a tiff with scale_factor/add_offset attrs then saving as zarr and opening causes a UFuncTypeError ohiat 53100696 closed 0     4 2021-01-08T22:45:21Z 2024-02-06T10:40:15Z 2024-02-06T10:40:14Z NONE      

What happened: When opening a geotiff that has scale_factor and add_offset metadata and then saving it as a zarr, the scale_factor and add_offset attributes are loaded and then saved as strings. When the resulting zarr is opened, xarray attempts to apply the scale_factor and add_offset attributes, but raises an exception because they are of type <U32.

```
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
    218     data = np.array(data, dtype=dtype, copy=True)
    219     if scale_factor is not None:
--> 220         data *= scale_factor
    221     if add_offset is not None:
    222         data += add_offset

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('<U32') to dtype('float32') with casting rule 'same_kind'
```

What you expected to happen:
1. scale_factor and add_offset are converted to floats and applied when the tiff is opened
2. When attempting to apply scale_factor and add_offset attributes, check their types and/or cast them to floats.

Minimal Complete Verifiable Example:

```python
import xarray as xr

img = xr.open_rasterio('https://hlssa.blob.core.windows.net/hls/S30/HLS.S30.T10TET.2019001.v1.4_04.tif')
img.to_dataset(name='img', promote_attrs=True).to_zarr('./test.zarr', mode='w')
xr.open_zarr('./test.zarr').persist()
```
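
In the meantime, a possible workaround (a sketch, assuming the string attributes survive a plain float() cast) is to cast the offending attributes before writing the zarr store:

```python
ds = img.to_dataset(name='img', promote_attrs=True)
for attr in ('scale_factor', 'add_offset'):
    if attr in ds.attrs:
        ds.attrs[attr] = float(ds.attrs[attr])  # write as numbers, not strings
    if attr in ds['img'].attrs:
        ds['img'].attrs[attr] = float(ds['img'].attrs[attr])
ds.to_zarr('./test.zarr', mode='w')
```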

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-1034-azure machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.0 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: 2.6.1 cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.8 cfgrib: None iris: None bottleneck: None dask: 2020.12.0 distributed: 2020.12.0 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.3.3 conda: None pytest: 6.2.1 IPython: 7.19.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4784/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2112742578 I_kwDOAMm_X8597eSy 8693 reading netcdf with engine=scipy fails with a typeerror under certain conditions eivindjahren 32731672 open 0     4 2024-02-01T15:03:23Z 2024-02-05T09:35:51Z   CONTRIBUTOR      

What happened?

Saving and loading from netcdf with engine=scipy produces an unexpected valueerror on read. The file seems to be corrupted.

What did you expect to happen?

reading works just fine.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "values": (
            ["name", "time"],
            np.array([[]], dtype=np.float32).T,
        )
    },
    coords={"time": [1], "name": []},
).expand_dims({"index": [0]})

ds.to_netcdf("file.nc", engine="scipy")
_ = xr.open_dataset("file.nc", engine="scipy")
```
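
As a side observation (ours, not from the report): the dataset being written has a zero-length name dimension, which lines up with the shape/size arithmetic that fails inside scipy's reader.

```Python
print(dict(ds.sizes))  # expected: {'index': 1, 'name': 0, 'time': 1}
```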

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
KeyError                                  Traceback (most recent call last)
File ~/.../python3.11/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    210 try:
--> 211     file = self._cache[self._key]
    212 except KeyError:

File ~/.../python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<function _open_scipy_netcdf at 0x7fe96afa9120>, ('/home/eivind/Projects/ert/file.nc',), 'r', (('mmap', None), ('version', 2)), '264ec6b3-78b3-4766-bb41-7656d6a51962']

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last) Cell In[1], line 18 4 ds = ( 5 xr.Dataset( 6 { (...) 15 .expand_dims({"index": [0]}) 16 ) 17 ds.to_netcdf("file.nc", engine="scipy") ---> 18 _ = xr.open_dataset("file.nc", engine="scipy")

File .../python3.11/site-packages/xarray/backends/api.py:572 , in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, d ecode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked _array_type, from_array_kwargs, backend_kwargs, kwargs) 560 decoders = _resolve_decoders_kwargs( 561 decode_cf, 562 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 568 decode_coords=decode_coords, 569 ) 571 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 572 backend_ds = backend.open_dataset( 573 filename_or_obj, 574 drop_variables=drop_variables, 575 decoders, 576 kwargs, 577 ) 578 ds = _dataset_from_backend_dataset( 579 backend_ds, 580 filename_or_obj, (...) 590 kwargs, 591 ) 592 return ds

File .../python3.11/site-packages/xarray/backends/scipy_.py: 315, in ScipyBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, con cat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, mode, format, group, mm ap, lock) 313 store_entrypoint = StoreBackendEntrypoint() 314 with close_on_error(store): --> 315 ds = store_entrypoint.open_dataset( 316 store, 317 mask_and_scale=mask_and_scale, 318 decode_times=decode_times, 319 concat_characters=concat_characters, 320 decode_coords=decode_coords, 321 drop_variables=drop_variables, 322 use_cftime=use_cftime, 323 decode_timedelta=decode_timedelta, 324 ) 325 return ds

File .../python3.11/site-packages/xarray/backends/store.py:4 3, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, conca t_characters, decode_coords, drop_variables, use_cftime, decode_timedelta) 29 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 30 self, 31 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 39 decode_timedelta=None, 40 ) -> Dataset: 41 assert isinstance(filename_or_obj, AbstractDataStore) ---> 43 vars, attrs = filename_or_obj.load() 44 encoding = filename_or_obj.get_encoding() 46 vars, attrs, coord_names = conventions.decode_cf_variables( 47 vars, 48 attrs, (...) 55 decode_timedelta=decode_timedelta, 56 )

File .../python3.11/site-packages/xarray/backends/common.py: 210, in AbstractDataStore.load(self) 188 def load(self): 189 """ 190 This loads the variables and attributes simultaneously. 191 A centralized loading function makes it easier to create (...) 207 are requested, so care should be taken to make sure its fast. 208 """ 209 variables = FrozenDict( --> 210 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 211 ) 212 attributes = FrozenDict(self.get_attrs()) 213 return variables, attributes

File .../python3.11/site-packages/xarray/backends/scipy_.py: 181, in ScipyDataStore.get_variables(self) 179 def get_variables(self): 180 return FrozenDict( --> 181 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 182 )

File .../python3.11/site-packages/xarray/backends/scipy_.py: 170, in ScipyDataStore.ds(self) 168 @property 169 def ds(self): --> 170 return self._manager.acquire()

File .../python3.11/site-packages/xarray/backends/file_manag er.py:193, in CachingFileManager.acquire(self, needs_lock) 178 def acquire(self, needs_lock=True): 179 """Acquire a file object from the manager. 180 181 A new file is only opened if it has expired from the (...) 191 An open file object, as returned by opener(*args, **kwargs). 192 """ --> 193 file, _ = self._acquire_with_cache_info(needs_lock) 194 return file

File .../python3.11/site-packages/xarray/backends/file_manag er.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(self._args, *kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a"

File .../python3.11/site-packages/xarray/backends/scipy_.py: 109, in _open_scipy_netcdf(filename, mode, mmap, version) 106 filename = io.BytesIO(filename) 108 try: --> 109 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 110 except TypeError as e: # netcdf3 message is obscure in this case 111 errmsg = e.args[0]

File .../python3.11/site-packages/scipy/io/_netcdf.py:278, i n netcdf_file.init(self, filename, mode, mmap, version, maskandscale) 275 self._attributes = {} 277 if mode in 'ra': --> 278 self._read()

File .../python3.11/site-packages/scipy/io/_netcdf.py:607, i n netcdf_file._read(self) 605 self._read_dim_array() 606 self._read_gatt_array() --> 607 self._read_var_array()

File .../python3.11/site-packages/scipy/io/netcdf.py:688, i n netcdf_file._read_var_array(self) 685 data = None 686 else: # not a record variable 687 # Calculate size to avoid problems with vsize (above) --> 688 a_size = reduce(mul, shape, 1) * size 689 if self.use_mmap: 690 data = self._mm_buf[begin:begin_+a_size].view(dtype=dtype_)

TypeError: unsupported operand type(s) for *: 'int' and 'NoneType' ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.2.0-39-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.1.1 pandas: 2.1.1 numpy: 1.26.1 scipy: 1.11.3 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.0 cartopy: None seaborn: 0.13.1 numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 63.4.3 pip: 23.3.1 conda: None pytest: 7.4.4 mypy: 1.8.0 IPython: 8.17.2 sphinx: 7.2.6
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8693/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2111051033 I_kwDOAMm_X8591BUZ 8691 xarray.open_dataset with chunks={} returns a single chunk and not engine (h5netcdf) preferred chunks abarciauskas-bgse 15016780 closed 0     4 2024-01-31T22:04:02Z 2024-01-31T22:56:17Z 2024-01-31T22:56:17Z NONE      

What happened?

When opening MUR SST netcdfs from S3, xarray.open_dataset(file, engine="h5netcdf", chunks={}) returns a single chunk, whereas the h5netcdf library reports a chunk shape of (1, 1023, 2047).

A notebook version of the code below includes the output: https://gist.github.com/abarciauskas-bgse/9366e04d2af09b79c9de466f6c1d3b90

What did you expect to happen?

I thought the chunks={} option would return the same chunks (1, 1023, 2047) exposed by the h5netcdf engine.

Minimal Complete Verifiable Example

```Python
#!/usr/bin/env python
# coding: utf-8

# This notebook looks at how xarray and h5netcdf return different chunks.

import pandas as pd
import h5netcdf
import s3fs
import xarray as xr

dates = [
    d.to_pydatetime().strftime('%Y%m%d')
    for d in pd.date_range('2023-02-01', '2023-03-01', freq='D')
]

SHORT_NAME = 'MUR-JPL-L4-GLOB-v4.1'
s3_fs = s3fs.S3FileSystem(anon=False)
var = 'analysed_sst'

def make_filename(time):
    base_url = f's3://podaac-ops-cumulus-protected/{SHORT_NAME}/'
    # example file: "/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
    return f'{base_url}{time}090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'

s3_urls = [make_filename(d) for d in dates]

def print_chunk_shape(s3_url):
    try:
        # Open the dataset using xarray
        file = s3_fs.open(s3_url)
        dataset = xr.open_dataset(file, engine='h5netcdf', chunks={})

        # Print chunk shapes for each variable in the dataset
        print(f"\nChunk shapes for {s3_url}:")
        if dataset[var].chunks is not None:
            print(f"xarray open_dataset chunks for {var}: {dataset[var].chunks}")
        else:
            print(f"xarray open_dataset chunks for {var}: Not chunked")

        with h5netcdf.File(file, 'r') as file:
            dataset = file[var]

            # Check if the dataset is chunked
            if dataset.chunks:
                print(f"h5netcdf chunks for {var}:", dataset.chunks)
            else:
                print(f"h5netcdf dataset is not chunked.")

    except Exception as e:
        print(f"Failed to process {s3_url}: {e}")

[print_chunk_shape(s3_url) for s3_url in s3_urls]
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.10.198-187.748.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.1 libnetcdf: 4.9.2 xarray: 2023.6.0 pandas: 2.0.3 numpy: 1.24.4 scipy: 1.11.1 netCDF4: 1.6.4 pydap: installed h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: 2.15.0 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.6.1 distributed: 2023.6.1 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: None fsspec: 2023.6.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.7.2 numpy_groupies: 0.9.22 setuptools: 68.0.0 pip: 23.1.2 conda: None pytest: 7.4.0 mypy: None IPython: 8.14.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8691/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2104267494 I_kwDOAMm_X859bJLm 8677 Add rolling.rank() same as pandas Mirac-Le 39230130 open 0     4 2024-01-28T17:27:21Z 2024-01-29T19:50:20Z   NONE      

Is your feature request related to a problem?

Dear xarray maintainers,

I would like to express my heartfelt gratitude for the significant optimizations your xarray library has brought to my project. Xarray combines the speed of numpy with the highly customizable parameters of pandas. The extensive parameters in the rolling module have allowed me to achieve functionality similar to pandas more efficiently.

I am wondering if it would be possible to incorporate a ranking method for rolling windows, including the ability to specify parameters such as pct, similar to the pandas rolling.rank function. Your consideration of this feature would be greatly appreciated.
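
Until something like this lands, a rough workaround is possible with rolling.construct (a sketch of ours, assuming a fixed window of 5; counting ties this way resembles pandas' method="max" rather than the default method="average"):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(20), dims="time")

# Build an explicit window dimension, then rank the window's last
# (i.e. current) element against the other values in the window.
windowed = da.rolling(time=5).construct("window")
current = windowed.isel(window=-1)
rank = (windowed <= current).sum("window")     # 1-based rank of the current value
pct = rank / windowed.notnull().sum("window")  # rough equivalent of rank(pct=True)
```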

Once again, thank you for your contributions!

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8677/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1716228662 I_kwDOAMm_X85mS5I2 7848 Compatibility with the Array API standard TomNicholas 35968931 open 0     4 2023-05-18T20:34:43Z 2024-01-25T04:03:42Z   MEMBER      

What is your issue?

Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other.

We've already had

  • #6804
  • #7067
  • #7847

and there will likely be many others.


I suspect this might require changes to the standard as well as to xarray - in particular see this list of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ):

  • np.clip
  • np.diff
  • np.pad
  • np.repeat
  • ~np.take~
  • ~np.tile~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7848/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2079089277 I_kwDOAMm_X8577GJ9 8607 allow computing just a small number of variables keewis 14808389 open 0     4 2024-01-12T15:21:27Z 2024-01-12T20:20:29Z   MEMBER      

Is your feature request related to a problem?

I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that.

Describe the solution you'd like

I'd imagine something like

```python
ds.compute(variables=variable_names)
```

but I'm undecided on whether that's a good idea (it might make .compute more complex?)

Describe alternatives you've considered

So far I've been using something like

```python
ds.assign_coords({k: lambda ds: ds[k].compute() for k in variable_names})
ds.pipe(lambda ds: ds.merge(ds[variable_names].compute()))
```

but both are not easy to type / understand (though having .merge take a callable would make this much easier). Also, the first option computes variables separately, which may not be ideal?

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8607/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2073024461 I_kwDOAMm_X857j9fN 8602 `DataArray.mean()` and `Dataset.mean()` fail with `sparse==0.15.0` martinkim0 46072231 closed 0     4 2024-01-09T19:27:47Z 2024-01-10T14:44:57Z 2024-01-10T14:44:57Z NONE      

What happened?

The following script leads to an error:

```python
import numpy as np
import xarray as xr
from sparse import GCXS

x = np.random.negative_binomial(1, 0.5, size=(100, 100))
array = xr.DataArray(GCXS.from_numpy(x))
array.mean()
```

```

AttributeError                            Traceback (most recent call last)
Cell In[16], line 1
----> 1 array.mean()

File ~/.../python3.11/site-packages/xarray/core/_aggregations.py:1663, in DataArrayAggregations.mean(self, dim, skipna, keep_attrs, kwargs) 1588 def mean( 1589 self, 1590 dim: Dims = None, (...) 1594 kwargs: Any, 1595 ) -> Self: 1596 """ 1597 Reduce this DataArray's data by applying mean along some dimension(s). 1598 (...) 1661 array(nan) 1662 """ -> 1663 return self.reduce( 1664 duck_array_ops.mean, 1665 dim=dim, 1666 skipna=skipna, 1667 keep_attrs=keep_attrs, 1668 **kwargs, 1669 )

File ~/.../python3.11/site-packages/xarray/core/dataarray.py:3776, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, kwargs) 3732 def reduce( 3733 self, 3734 func: Callable[..., Any], (...) 3740 kwargs: Any, 3741 ) -> Self: 3742 """Reduce this array by applying func along some dimension(s). 3743 3744 Parameters (...) 3773 summarized data and the indicated dimension(s) removed. 3774 """ -> 3776 var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs) 3777 return self._replace_maybe_drop_dims(var)

File ~/.../python3.11/site-packages/xarray/core/variable.py:1756, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, kwargs) 1749 keep_attrs_ = ( 1750 _get_keep_attrs(default=False) if keep_attrs is None else keep_attrs 1751 ) 1753 # Noe that the call order for Variable.mean is 1754 # Variable.mean -> NamedArray.mean -> Variable.reduce 1755 # -> NamedArray.reduce -> 1756 result = super().reduce( 1757 func=func, dim=dim, axis=axis, keepdims=keepdims, kwargs 1758 ) 1760 # return Variable always to support IndexVariable 1761 return Variable( 1762 result.dims, result.data, attrs=result._attrs if keep_attrs else None 1763 )

File ~/.../python3.11/site-packages/xarray/namedarray/core.py:772, in NamedArray.reduce(self, func, dim, axis, keepdims, kwargs) 770 data = func(self.data, axis=axis, kwargs) 771 else: --> 772 data = func(self.data, **kwargs) 774 if getattr(data, "shape", ()) == self.shape: 775 dims = self.dims

File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:637, in mean(array, axis, skipna, kwargs) 635 return _to_pytimedelta(mean_timedeltas, unit="us") + offset 636 else: --> 637 return _mean(array, axis=axis, skipna=skipna, kwargs)

File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:399, in _create_nan_agg_method.<locals>.f(values, axis, skipna, **kwargs) 396 kwargs.pop("min_count", None) 398 xp = get_array_namespace(values) --> 399 func = getattr(xp, name) 401 try: 402 with warnings.catch_warnings():

AttributeError: module 'sparse' has no attribute 'mean' ```

What did you expect to happen?

Reproducible script runs without error with sparse==0.14.0.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.2.0-34-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: None xarray: 2023.12.0 pandas: 1.5.3 numpy: 1.24.4 scipy: 1.11.4 netCDF4: None pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: None nc_time_axis: None iris: None bottleneck: None dask: 2023.12.0 distributed: 2023.12.0 matplotlib: 3.8.2 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.12.0 cupy: None pint: None sparse: 0.15.0 flox: None numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.3 mypy: None IPython: 8.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8602/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2041076267 I_kwDOAMm_X855qFor 8551 Make _obj_repr public BENR0 12115839 closed 0     4 2023-12-14T07:19:16Z 2023-12-21T16:00:52Z 2023-12-21T16:00:52Z NONE      

What is your issue?

We are using https://github.com/pydata/xarray/blob/2971994ef1dd67f44fe59e846c62b47e1e5b240b/xarray/core/formatting_html.py#L278

in the html representation of AreaDefinitions in https://github.com/pytroll/pyresample and would rather not import private functions. Would it be OK to make _obj_repr public?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8551/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2027147099 I_kwDOAMm_X854089b 8523 tree-reduce the combine for `open_mfdataset(..., parallel=True, combine="nested")` dcherian 2448579 open 0     4 2023-12-05T21:24:51Z 2023-12-18T19:32:39Z   MEMBER      

Is your feature request related to a problem?

When parallel=True and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the "head node" for the combine.

Instead we can tree-reduce the combine (example) by switching to dask.bag instead of dask.delayed and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node.

  1. The downside is the dask graph is "worse" but perhaps that shouldn't stop us.
  2. I think this is only feasible for combine="nested"
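
For illustration, a minimal sketch (ours, not the proposed implementation) of what the tree reduce could look like with dask.bag, using tiny in-memory datasets as stand-ins for the per-file ones:

```python
import dask.bag as db
import xarray as xr

# stand-ins for the datasets that open_mfdataset would build per file
datasets = [
    xr.Dataset({"a": ("time", [i])}, coords={"time": [i]}) for i in range(8)
]

bag = db.from_sequence(datasets, npartitions=4)
combined = bag.fold(
    binop=lambda a, b: xr.concat([a, b], dim="time"),
    split_every=2,  # combine pairwise, in a tree, instead of all at the root
).compute()
```

Here split_every controls the fan-in of the reduction tree, which is the knob that trades graph depth against how much data any single combine step sees.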

cc @TomNicholas

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8523/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1223031600 I_kwDOAMm_X85I5fsw 6561 Excessive memory consumption by to_dataframe() sgdecker 8419421 closed 0     4 2022-05-02T15:33:33Z 2023-12-15T20:47:32Z 2023-12-15T20:47:32Z NONE      

What happened?

This is a reincarnation of #2534 with a reproducible example.

A 51 MB netCDF file leads to to_dataframe() requesting 23 GB.

What did you expect to happen?

I expect to_dataframe() to require much less than 23 GB of memory for this operation.

Minimal Complete Verifiable Example

```Python
import urllib.request
import xarray as xr

url = 'http://people.envsci.rutgers.edu/decker/Surface_METAR_20220501_0000.nc'
fname = 'metar.nc'
urllib.request.urlretrieve(url, filename=fname)
ncdata = xr.open_dataset(fname)
df = ncdata.to_dataframe()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/chariton/decker/test/bug/xarraymem.py", line 8, in <module>
    df = ncdata.to_dataframe()
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5399, in to_dataframe
    return self._to_dataframe(ordered_dims=ordered_dims)
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5363, in _to_dataframe
    data = [
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5364, in <listcomp>
    self._variables[k].set_dims(ordered_dims).values.reshape(-1)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 23.3 GiB for an array with shape (5021, 127626) and data type |S39
```
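
For scale, the failing allocation is consistent with broadcasting the |S39 strings across both dimensions: 5021 × 127626 cells × 39 bytes ≈ 24.99 × 10⁹ bytes ≈ 23.3 GiB, which matches the shape and dtype reported in the error.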

Anything else we need to know?

No response

Environment

/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") INSTALLED VERSIONS ------------------ commit: None python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.62.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.4.2 numpy: 1.22.3 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6561/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
384002323 MDU6SXNzdWUzODQwMDIzMjM= 2570 np.clip() executes eagerly Hoeze 1200058 closed 0     4 2018-11-24T16:25:03Z 2023-12-03T05:29:17Z 2023-12-03T05:29:17Z NONE      

Example:

```python
import numpy as np
import xarray as xr

x = xr.DataArray(np.random.uniform(size=[100, 100])).chunk(10)
x
```

```
<xarray.DataArray (dim_0: 100, dim_1: 100)>
dask.array<shape=(100, 100), dtype=float64, chunksize=(10, 10)>
Dimensions without coordinates: dim_0, dim_1
```

```python
np.clip(x, 0, 0.5)
```

```
<xarray.DataArray (dim_0: 100, dim_1: 100)>
array([[0.264276, 0.32227 , 0.336396, ..., 0.110182, 0.28255 , 0.399041],
       [0.5     , 0.030289, 0.5     , ..., 0.428923, 0.262249, 0.5     ],
       [0.5     , 0.5     , 0.280971, ..., 0.427334, 0.026649, 0.5     ],
       ...,
       [0.5     , 0.5     , 0.294943, ..., 0.053143, 0.5     , 0.488239],
       [0.5     , 0.341485, 0.5     , ..., 0.5     , 0.250441, 0.5     ],
       [0.5     , 0.156285, 0.179123, ..., 0.5     , 0.076242, 0.319699]])
Dimensions without coordinates: dim_0, dim_1
```

```python
x.clip(0, 0.5)
```

```
<xarray.DataArray (dim_0: 100, dim_1: 100)>
dask.array<shape=(100, 100), dtype=float64, chunksize=(10, 10)>
Dimensions without coordinates: dim_0, dim_1
```

Problem description

Using np.clip() directly calculates the result, while xr.DataArray.clip() does not.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2570/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1902108672 I_kwDOAMm_X85xX-AA 8207 Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset` kasra-keshavarz 50383939 open 0     4 2023-09-19T02:44:02Z 2023-12-01T22:29:49Z   NONE      

What is your issue?

I am simply reading 366 small (~15 MB) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow:

```python-console In [1]: import os; import dask

In [2]: import xarray as xr

In [3]: from dask.distributed import Client, LocalCluster

In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker

In [5]: client = Client(cluster)

In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)

In [8]: ds.to_netcdf('./out2.nc')

```

And below is the error I am getting:

Error message

```python-console
In [8]: ds.to_netcdf('./out2.nc')
/home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures.
  warnings.warn(
2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed
Key:       ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0)
Function:  getter
args:      (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x2b863b0e94c0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x2b86218d4ee0>, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None)))
kwargs:    {}
Exception: "RuntimeError('NetCDF: HDF error')"

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds.to_netcdf('./out2.nc')

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2249     encoding = {}
   2250 from xarray.backends.api import to_netcdf
-> 2252 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2253     self,
   2254     path,
   2255     mode=mode,
   2256     format=format,
   2257     group=group,
   2258     engine=engine,
   2259     encoding=encoding,
   2260     unlimited_dims=unlimited_dims,
   2261     compute=compute,
   2262     multifile=False,
   2263     invalid_netcdf=invalid_netcdf,
   2264 )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1252 if multifile:
   1253     return writer, store
-> 1255 writes = writer.sync(compute=compute)
   1257 if isinstance(target, BytesIO):
   1258     store.sync()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs)
    253 if chunkmanager_store_kwargs is None:
    254     chunkmanager_store_kwargs = {}
--> 256 delayed_store = chunkmanager.store(
    257     self.sources,
    258     self.targets,
    259     lock=self.lock,
    260     compute=compute,
    261     flush=True,
    262     regions=self.regions,
    263     **chunkmanager_store_kwargs,
    264 )
    265 self.sources = []
    266 self.targets = []

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs)
    203 def store(
    204     self,
    205     sources: DaskArray | Sequence[DaskArray],
    206     targets: Any,
    207     **kwargs,
    208 ):
    209     from dask.array import store
--> 211     return store(
    212         sources=sources,
    213         targets=targets,
    214         **kwargs,
    215     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***)
   1234 elif compute:
   1235     store_dsk = HighLevelGraph(layers, dependencies)
-> 1236     compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
   1237     return None
   1239 else:

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs)
    367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get)
    368 dsk2 = optimization_function(cls)(dsk, keys, **kwargs)
--> 369 return schedule(dsk2, keys, **kwargs)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3265     should_rejoin = False
   3266 try:
-> 3267     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3268 finally:
   3269     for f in futures.values():

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous)
   2390 local_worker = None
   2392 with shorten_traceback():
-> 2393     return self.sync(
   2394         self._gather,
   2395         futures,
   2396         errors=errors,
   2397         direct=direct,
   2398         local_worker=local_worker,
   2399         asynchronous=asynchronous,
   2400     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__()
    483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 484     return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array()
    486 def get_duck_array(self):
--> 487     return self.array.get_duck_array()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array()
    663 def get_duck_array(self):
--> 664     return self.array.get_duck_array()

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array()
    552 # self.array[self.key] is now a numpy array when
    553 # self.array is a BackendArray subclass
    554 # and self.key is BasicIndexer((slice(None, None, None),))
    555 # so we need the explicit check for ExplicitlyIndexed
    556 if isinstance(array, ExplicitlyIndexed):
--> 557     array = array.get_duck_array()
    558 return _wrap_numpy_scalars(array)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array()
     73 def get_duck_array(self):
---> 74     return self.func(self.array.get_duck_array())

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array()
    550 def get_duck_array(self):
--> 551     array = self.array[self.key]
    552 # self.array[self.key] is now a numpy array when
    553 # self.array is a BackendArray subclass
    554 # and self.key is BasicIndexer((slice(None, None, None),))
    555 # so we need the explicit check for ExplicitlyIndexed
    556 if isinstance(array, ExplicitlyIndexed):

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in __getitem__()
     99 def __getitem__(self, key):
--> 100     return indexing.explicit_indexing_adapter(
    101         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
    102     )

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter()
    836 """Support explicit indexing by delegating to a raw indexing method.
    837
    838 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
    855 Indexing result, in the form of a duck numpy-array.
    856 """
    857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 858 result = raw_indexing_method(raw_key.tuple)
    859 if numpy_indices.tuple:
    860     # index the loaded np.ndarray
    861     result = NumpyIndexingAdapter(result)[numpy_indices]

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem()
    110 try:
    111     with self.datastore.lock:
--> 112         original_array = self.get_array(needs_lock=False)
    113         array = getitem(original_array, key)
    114 except IndexError:
    115     # Catch IndexError in netCDF4 and return a more informative
    116     # error message. This is most often called when an unsorted
    117     # indexer is used before the data is loaded from disk.

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array()
     90 def get_array(self, needs_lock=True):
---> 91     ds = self.datastore._acquire(needs_lock)
     92     variable = ds.variables[self.variable_name]
     93     variable.set_auto_maskandscale(False)

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire()
    402 def _acquire(self, needs_lock=True):
--> 403     with self._manager.acquire_context(needs_lock) as root:
    404         ds = _nc4_require_group(root, self._group, self._mode)
    405     return ds

File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__()
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context()
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info()
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars()

File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```

The header of individual NetCDF ones are also in the following:

Individual NetCDF header

```console
$ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc
netcdf ab_models_remapped_1980-04-20-13-00-00 {
dimensions:
        COMID = 14980 ;
        time = UNLIMITED ; // (24 currently)
variables:
        int time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 1980-04-20 12:00:00" ;
                time:calendar = "gregorian" ;
                time:standard_name = "time" ;
                time:axis = "T" ;
        double latitude(COMID) ;
                latitude:long_name = "latitude" ;
                latitude:units = "degrees_north" ;
                latitude:standard_name = "latitude" ;
        double longitude(COMID) ;
                longitude:long_name = "longitude" ;
                longitude:units = "degrees_east" ;
                longitude:standard_name = "longitude" ;
        double COMID(COMID) ;
                COMID:long_name = "shape ID" ;
                COMID:units = "1" ;
        double RDRS_v2.1_P_P0_SFC(time, COMID) ;
                RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ;
                RDRS_v2.1_P_P0_SFC:long_name = "Forecast: Surface pressure" ;
                RDRS_v2.1_P_P0_SFC:units = "mb" ;
        double RDRS_v2.1_P_HU_1.5m(time, COMID) ;
                RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ;
                RDRS_v2.1_P_HU_1.5m:long_name = "Forecast: Specific humidity" ;
                RDRS_v2.1_P_HU_1.5m:units = "kg kg**-1" ;
        double RDRS_v2.1_P_TT_1.5m(time, COMID) ;
                RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ;
                RDRS_v2.1_P_TT_1.5m:long_name = "Forecast: Air temperature" ;
                RDRS_v2.1_P_TT_1.5m:units = "deg_C" ;
        double RDRS_v2.1_P_UVC_10m(time, COMID) ;
                RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ;
                RDRS_v2.1_P_UVC_10m:long_name = "Forecast: Wind Modulus (derived using UU and VV)" ;
                RDRS_v2.1_P_UVC_10m:units = "kts" ;
        double RDRS_v2.1_A_PR0_SFC(time, COMID) ;
                RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ;
                RDRS_v2.1_A_PR0_SFC:long_name = "Analysis: Quantity of precipitation" ;
                RDRS_v2.1_A_PR0_SFC:units = "m" ;
        double RDRS_v2.1_P_FB_SFC(time, COMID) ;
                RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ;
                RDRS_v2.1_P_FB_SFC:long_name = "Forecast: Downward solar flux" ;
                RDRS_v2.1_P_FB_SFC:units = "W m**-2" ;
        double RDRS_v2.1_P_FI_SFC(time, COMID) ;
                RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ;
                RDRS_v2.1_P_FI_SFC:long_name = "Forecast: Surface incoming infrared flux" ;
                RDRS_v2.1_P_FI_SFC:units = "W m**-2" ;
```

I am running xarray and Dask on an HPC, so the "modules" I have loaded are the following:

```console
module list

Currently Loaded Modules:
  1) CCconfig
  2) gentoo/2020 (S)
  3) gcccore/.9.3.0 (H)
  4) imkl/2020.1.217 (math)
  5) intel/2020.1.217 (t)
  6) ucx/1.8.0
  7) libfabric/1.10.1
  8) openmpi/4.0.3 (m)
  9) StdEnv/2020 (S)
 10) mii/1.1.2
 11) netcdf-mpi/4.9.0 (io)
 12) hdf5-mpi/1.12.1 (io)
 13) libffi/3.3
 14) python/3.10.2 (t)
 15) mpi4py/3.1.3 (t)
 16) freexl/1.0.5 (t)
 17) geos/3.10.2 (geo)
 18) librttopo-proj9/1.1.0
 19) proj/9.0.1 (geo)
 20) libspatialite-proj901/5.0.1
 21) scipy-stack/2023a (math)
 22) libspatialindex/1.8.5 (phys)
 23) ipykernel/2023a
 24) sqlite/3.38.5
```

Any suggestion is greatly appreciated!
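In case it is useful context, a workaround sketch I have been experimenting with, assuming the crash comes from concurrent access to the HDF5 files under the distributed scheduler, is to force a single-threaded compute for the write:

```python
import dask

# Sketch: serialize the read/write through dask's synchronous scheduler so
# only one task touches the HDF5 library at a time (slower, but sidesteps
# the concurrent access this error may come from).
with dask.config.set(scheduler="synchronous"):
    ds.to_netcdf('./out2.nc')
```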

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8207/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2019789753 I_kwDOAMm_X854Y4u5 8499 'drop_duplicates' behaves differently when using 1 vs many coordinates for an index jbweston 6654709 open 0     4 2023-12-01T00:36:42Z 2023-12-01T09:55:39Z   NONE      

What happened?

I am trying to drop_duplicates from a DataArray based on the values of some of the coordinates, starting from a DataArray with coordinates, but no indexes.

To accomplish this, I call 'DataArray.set_xindex' with the appropriate coordinate names, and then call 'drop_duplicates' on the resulting DataArray, like so:

```python
from xarray import DataArray
import numpy as np

test_array = DataArray(
    np.random.rand(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex(["x", "y"]).drop_duplicates("sample")
assert len(good) == 2
```

The above functions as expected; 'good' has had its duplicates dropped, and we are left with a DataArray of length 2.

However, the following does not function as I would expect:

```python
# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
# even if we don't include the 'y' values in the index.
bad = test_array.set_xindex("x").drop_duplicates("sample")

# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array)
```

What did you expect to happen?

I expected drop_duplicates to drop the duplicates when I was using only a single coordinate for the index.

Minimal Complete Verifiable Example

```Python
from xarray import DataArray
import numpy as np

test_array = DataArray(
    range(5),
    coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)),
    dims="sample",
)

# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex(["x", "y"]).drop_duplicates("sample")

# And indeed there are only 2 elements left after dropping duplicates.
assert len(good) == 2

# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
bad = test_array.drop_vars("y").set_xindex("x").drop_duplicates("sample")

# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array.drop_vars("y"))
```
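In the meantime, a workaround sketch that deduplicates on a single coordinate without going through set_xindex (assuming keeping the first occurrence of each x value is acceptable):

```python
import numpy as np

# Sketch: keep the first occurrence of each 'x' value via np.unique.
_, first_idx = np.unique(test_array.x.values, return_index=True)
workaround = test_array.isel(sample=np.sort(first_idx))
assert len(workaround) == 2
```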

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.15.133.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: 2023.11.0 pandas: 2.1.0 numpy: 1.24.4 scipy: 1.11.2 netCDF4: 1.6.3 pydap: None h5netcdf: 1.2.0 h5py: 3.8.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None iris: None bottleneck: None dask: 2023.9.1 distributed: 2023.9.1 matplotlib: 3.7.2 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.9.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.1.2 pip: 23.2.1 conda: 23.7.3 pytest: 7.4.2 mypy: None IPython: 8.15.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8499/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1983891070 I_kwDOAMm_X852P8Z- 8427 Ambiguous behavior with coordinates when appending to Zarr store with append_dim rabernat 1197350 closed 0     4 2023-11-08T15:40:19Z 2023-12-01T03:58:56Z 2023-12-01T03:58:55Z MEMBER      

What happened?

There are two quite different scenarios covered by "append" with Zarr

  • Adding new variables to a dataset
  • Extending arrays along a dimensions (via append_dim)

This issue is about what should happen when using append_dim with variables that do not contain append_dim.

Here's the current behavior.

```python
import numpy as np
import xarray as xr
import zarr

ds1 = xr.DataArray(
    np.array([1, 2, 3]).reshape(3, 1, 1),
    dims=('time', 'y', 'x'),
    coords={'x': [1], 'y': [2]},
    name="foo",
).to_dataset()

ds2 = xr.DataArray(
    np.array([4, 5]).reshape(2, 1, 1),
    dims=('time', 'y', 'x'),
    coords={'x': [-1], 'y': [-2]},
    name="foo",
).to_dataset()

# how concat works: data are aligned
ds_concat = xr.concat([ds1, ds2], dim="time")
assert ds_concat.dims == {"time": 5, "y": 2, "x": 2}

# now do a Zarr append
store = zarr.storage.MemoryStore()
ds1.to_zarr(store, consolidated=False)

# we do not check that the coordinates are aligned--just that they have the same shape and dtype
ds2.to_zarr(store, append_dim="time", consolidated=False)
ds_append = xr.open_zarr(store, consolidated=False)

# coordinates data have been overwritten
assert ds_append.dims == {"time": 5, "y": 1, "x": 1}

# ...with the latest values
assert ds_append.x.data[0] == -1
```

Currently, we always write all data variables in this scenario. That includes overwriting the coordinates every time we append. That makes appending more expensive than it needs to be. I don't think that is the behavior most users want or expect.

What did you expect to happen?

There are a couple of different options we could consider for how to handle this "extending" situation (with append_dim)

  1. Do not attempt to align coordinates
     a. [current behavior] Overwrite coordinates with new data
     b. Keep original coordinates
     c. Force the user to explicitly drop the coordinates, as we do for region operations.
  2. Attempt to align coordinates
     a. Fail if coordinates don't match
     b. Extend the arrays to replicate the behavior of concat

We currently do 1a. I propose to switch to 1b. I think it is closer to what users want, and it requires less I/O.
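For reference, a sketch of how a user can approximate 1b today, assuming they know which coordinates should stay fixed:

```python
# Sketch: drop the unchanged coordinates before appending so the values
# already on disk are left untouched ('x' and 'y' here are assumed fixed).
ds2.drop_vars(["x", "y"]).to_zarr(store, append_dim="time", consolidated=False)
```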

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.10.176-157.645.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2 xarray: 2023.10.1 pandas: 2.1.2 numpy: 1.24.4 scipy: 1.11.3 netCDF4: 1.6.5 pydap: installed h5netcdf: 1.2.0 h5py: 3.10.0 Nio: None zarr: 2.16.0 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.10.1 distributed: 2023.10.1 matplotlib: 3.8.0 cartopy: 0.22.0 seaborn: 0.13.0 numbagg: 0.6.0 fsspec: 2023.10.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.8.1 numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.3 mypy: None IPython: 8.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8427/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1044693438 I_kwDOAMm_X84-RMG- 5937 DataArray.dt.seconds returns incorrect value for negative `timedelta64[ns]` leifdenby 2405019 closed 0     4 2021-11-04T12:05:24Z 2023-11-10T00:39:17Z 2023-11-10T00:39:17Z CONTRIBUTOR      

What happened:

For a negative timedelta64[ns] of 42 nanoseconds, DataArray.dt.seconds returned a non-zero value (the returned value was 86399). When I pass in a positive 42-nanosecond timedelta64[ns], the TimedeltaAccessor correctly returns zero. I would have expected both assertions in the example below to pass, but the second fails. This seems to be a general issue with negative timedelta64[ns] values.

```bash
<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([86399])
Dimensions without coordinates: dim_0
Traceback (most recent call last):
  File "bug_dt_seconds.py", line 15, in <module>
    assert da.dt.seconds == 0
AssertionError
```

What you expected to happen:

```bash
<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
```

Minimal Complete Verifiable Example:

```python
# coding: utf-8

import xarray as xr
import numpy as np

# number of nanoseconds
value = 42

da = xr.DataArray([np.timedelta64(value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

da = xr.DataArray([np.timedelta64(-value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0
```

Anything else we need to know?:

I've narrowed this down to the call to pd.Series(values.ravel()) in xarray.core.accessor_dt._access_through_series:

```python
ipdb> pd.Series(values.ravel())
0   -1 days +23:59:59.999999958
dtype: timedelta64[ns]
```

I think the issue arises because pandas turns the numpy timedelta64 into a "minus one day plus a time". This actually does have a number of "seconds" in it, but the "total_seconds" has the expected value:

```python
ipdb> pd.Series(values.ravel()).dt.total_seconds()
0   -4.200000e-08
dtype: float64
```

Which would correctly round to zero.

I don't think the issue is in pandas, although the output from pandas is counter-intuitive:

```python
ipdb> pd.Series(values.ravel()).dt.seconds
0    86399
dtype: int64
```

Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)?
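A minimal sketch of that special-casing, with a hypothetical helper standing in for whatever the accessor would actually call:

```python
import numpy as np
import pandas as pd

def signed_seconds(values: np.ndarray) -> np.ndarray:
    # Hypothetical helper: route the magnitude through pandas and re-apply
    # the sign, so -42 ns yields 0 rather than 86399.
    flat = values.ravel()
    sign = np.sign(flat.astype("timedelta64[ns]").astype(np.int64))
    abs_seconds = pd.Series(np.abs(flat)).dt.seconds.to_numpy()
    return (sign * abs_seconds).reshape(values.shape)
```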

Environment:

Output of <tt>xr.show_versions()</tt> ``` INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, May 6 2020, 04:59:01) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 19.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_GB.UTF-8 LANG: None LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.18.2 pandas: 1.3.4 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.4.2 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.10.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.09.1 distributed: 2021.09.1 matplotlib: 3.2.2 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None fsspec: 2021.06.1 cupy: None pint: 0.18 sparse: None setuptools: 46.4.0.post20200518 pip: 21.1.2 conda: None pytest: 6.0.1 IPython: 7.16.1 sphinx: None ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5937/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1981799811 I_kwDOAMm_X852H92D 8423 Support remote string paths for `h5netcdf` engine jrbourbeau 11656932 open 0     4 2023-11-07T16:52:18Z 2023-11-09T07:24:45Z   CONTRIBUTOR      

Is your feature request related to a problem?

Currently the h5netcdf engine supports opening remote files, but only as already open file-like objects (e.g. s3fs.open(...)), not as string paths like s3://.... There are situations where I'd like to use string paths instead of open file-like objects:

  • Opening files can sometimes be slow (xref https://github.com/fsspec/s3fs/issues/816)
  • When using parallel=True for opening lots of files, serializing open file-like objects back and forth from a remote cluster can be slow
  • Some systems (e.g. NASA Earthdata) only hand out credentials that are valid when run in the same region as the data. Being able to use parallel=True + storage_options would be convenient/performant in that case.

Describe the solution you'd like

It would be nice if I could do something like the following:

```python
ds = xr.open_mfdataset(
    files,  # A bunch of files like `s3://bucket/file`
    engine="h5netcdf",
    ...
    parallel=True,
    storage_options={...},  # fsspec-compatible options
)
```

and have my files opened prior to handing off to h5netcdf. storage_options is already supported for Zarr, so hopefully extending to h5netcdf feels natural.
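For comparison, the workaround available today, a sketch that opens the files with fsspec first and hands the file-like objects to the engine (the bucket path and options are placeholders):

```python
import fsspec
import xarray as xr

# Sketch: open the remote files up front, then pass the resulting
# file-like objects to the h5netcdf engine.
open_files = fsspec.open_files("s3://bucket/*.nc", anon=False)
ds = xr.open_mfdataset(
    [of.open() for of in open_files],
    engine="h5netcdf",
)
```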

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8423/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1975845455 I_kwDOAMm_X851xQJP 8410 Segmentation fault 139 (SIGSEGV) lucadix 39524075 closed 0     4 2023-11-03T10:14:03Z 2023-11-06T20:34:46Z 2023-11-06T20:34:45Z NONE      

What happened?

While opening a set of netCDF files in a for loop, using xr.open_dataset().load(), I get a segmentation error (nr. 139). Please see the code example below:

```
for region in region_list:
    # [some code to read data associated to each region...]

    region_pred = xr.open_dataset(io.BytesIO(data)).load()

    # [other code working on region_pred...]
```

The error is shown on Linux/Mac after running my Python code, whereas Windows seems to be masking it. I was able to catch it on Windows by launching my code as:

```
python3 my_code.py && echo ok || echo KO
```

In this way, KO gets printed and the segmentation fault is now noticeable. I managed to fix the issue by using a second variable (called reg_pred) in addition to region_pred:

```
for region in region_list:
    # [some code to read data associated to each region...]

    region_pred = xr.open_dataset(io.BytesIO(data))
    reg_pred = region_pred.load()

    # [other code working on reg_pred...]
```

What did you expect to happen?

I don't know if the behavior I described is something the developers intended. Personally, I think it is an issue, which is why I am reporting it. If it is not an issue, I would like a clarification in order to understand what I am missing. Thank you in advance.

Minimal Complete Verifiable Example

```Python
for region in region_list:
    with storage_client.open(region, "rb") as f:
        data = f.read()
    region_pred = xr.open_dataset(io.BytesIO(data)).load()

    # some code working on region_pred to compute weather indices...
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel byteorder: little LC_ALL: None LANG: it_IT.UTF-8 LOCALE: ('Italian_Italy', '1252') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2023.8.0 pandas: 2.1.0 numpy: 1.26.0 scipy: 1.11.2 netCDF4: 1.6.4 pydap: None h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.10.0 distributed: 2023.10.0 matplotlib: 3.8.0 cartopy: None seaborn: None numbagg: None fsspec: 2023.9.1 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.15.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8410/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1977485456 I_kwDOAMm_X8513giQ 8413 Add a perception of a __xarray__ magic method swamidass 6273919 open 0     4 2023-11-04T19:55:14Z 2023-11-05T18:50:14Z   NONE      

Is your feature request related to a problem?

I am often moving data from external objects (of all sorts!) into xarray. This is a common use case.

Much of this code would be greatly simplified if there were a way of giving non-xarray classes a way of declaring to xarray how these objects can be marshaled into xarray objects.

Describe the solution you'd like

So here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best.

Magic Methods

It would be great to see these magic method signatures become integrated throughout the library:

```python
__xarray__          -> xr.Dataset | xr.DataArray
__xarray_array__    -> xr.DataArray
__xarray_dataset__  -> xr.Dataset
__xarray_datatree__ -> xr.DataTree  # when DataTree is finally integrated into xarray
```

Conversion Registry

And these extension functions to register converters:

```python
def register_xarray_converter(cls, name: str, func: Callable[[cls, ...], xr.Dataset | xr.DataArray] | None):
    ...

def register_dataarray_converter(cls, name: str, func: Callable[[cls, ...], xr.DataArray] | None):
    ...

def register_dataset_converter(cls, name: str, func: Callable[[cls, ...], xr.Dataset] | None):
    ...

def register_datatree_converter(cls, name: str, func: Callable[[cls, ...], DataTree] | None):
    # when DataTree is finally integrated into xarray
    ...
```

Registering a converter should fail if cls implements a corresponding __xarray_*__ method or another converter is already registered for cls. Perhaps add an argument that specifies whether the converter should or should not be added if there is a clash. Perhaps these functions return the replaced converter so it can be added back in if needed?

Ideally, "deregister" versions (e.g. a deregister counterpart for each register function above) would also be available, so context managers that change marshaling behavior could easily be constructed.

User API

Along with the following new user API functions:

```python
def as_xarray(x, *args, **kwargs) -> xr.Dataset | xr.DataArray:
    ...

def as_dataarray(x, *args, **kwargs) -> xr.DataArray:
    ...

def as_dataset(x, *args, **kwargs) -> xr.Dataset:
    ...

def as_datatree(x, *args, **kwargs) -> DataTree:  # when DataTree is finally integrated into xarray
    ...
```

"as_xarray" returns (in order of precedence):

  • x unaltered if it is an xarray object
  • registered_xarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception
  • registered_dataarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception
  • registered_dataset_converter(x, *args, **kwargs) if it is callable and does not throw an exception
  • x.__xarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
  • x.__xarray_dataset__(*args, **kwargs), if it exists, is callable, and does not throw an exception
  • x.__xarray_dataarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
  • well known aliases of __xarray_dataarray__, such as x.to_xarray(*args, **kwargs) (see pandas)
  • [DESIGN DECISION] convert and return a tuple [dims, data, [attrs, encoding]] to a DataArray?
  • [DESIGN DECISION] convert and return a tuple encoding of a Dataset?
  • [DESIGN DECISION] return a duck-typed array wrapped in a DataArray?

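A rough sketch of that dispatch, with all names hypothetical and the registry details elided:

```python
import xarray as xr

_converters = {}  # hypothetical registry filled by register_xarray_converter

def as_xarray(x, *args, **kwargs):
    # 1. Pass xarray objects through unchanged.
    if isinstance(x, (xr.Dataset, xr.DataArray)):
        return x
    # 2. Prefer registered converters, so callers can override class behavior.
    for cls in type(x).__mro__:
        if cls in _converters:
            return _converters[cls](x, *args, **kwargs)
    # 3. Fall back to the magic methods (and well known aliases) on the object.
    for name in ("__xarray__", "__xarray_dataset__", "__xarray_dataarray__", "to_xarray"):
        method = getattr(x, name, None)
        if callable(method):
            return method(*args, **kwargs)
    raise TypeError(f"cannot convert {type(x).__name__} to an xarray object")
```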
The rationale for putting the registered functions first is that this would enable callers to override an object's built-in conversion behavior without modifying the class.

"as_dataarray" would be similar, but it would only call x.__xarray_dataarray__ and well known aliases.

"as_dataset" would be similar, but it would only call x.__xarray_dataset__ and well known aliases, perhaps falling back to calling x.__xarray_dataarray__ and converting the return value to a dataset if it has a name attribute.

"as_datatree" would be similar, but it would only call x.__xarray_datatree__, perhaps falling back to calling x.__xarray_dataarray__ and wrapping the result in a single-node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray.)

The design decisions are flexible from my point of view, and might be decided in a way that makes the code base simplest or most usable. There is also a question of whether or not this method should default to the backup methods. These decisions can also be deferred entirely by delegating to the converter registry.

Across the Xarray Library

Finally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library.

Doing this would give third party libraries another pathway to integrate with xarray, one far easier than the converter registry or explicit calls to as_* functions.

Describe alternatives you've considered

This can be done with a private library. But that seems like a lot of code that would be pretty useful in other use cases.

Most of this (but not all) can be accomplished in a 3rd party library, but it wouldn't allow the seamless sort of integration with, for example, xarray's use of _repr_html_ to integrate with pandas.

The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seamless for reading files, but cannot marshal objects. For example, this is possible:

```python
x = xr.open_dataset("slide.tiff")
```

But this doesn't work.

```python
t = tiffslide.TiffSlide("slide.tiff")
x = xr.open_dataset(t)  # won't work
x = xr.DataArray(t)  # won't work either
```

This is an important use case because there are cases where we want to create an xarray like this from objects that are never stored on the filesystem.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8413/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
887711474 MDU6SXNzdWU4ODc3MTE0NzQ= 5290 Inconclusive error messages using to_zarr with regions niowniow 5802846 closed 0     4 2021-05-11T15:54:39Z 2023-11-05T06:28:39Z 2023-11-05T06:28:39Z CONTRIBUTOR      

What happened: The idea is to use a xarray dataset (stored as dummy zarr file), which is subsequently filled with the region argument, as explained in the documentation. Ideally, almost nothing is stored to disk upfront.

It seems the current implementation is only designed to either store coordinates for the whole dataset and write them to disk or to write without coordinates. I failed to understand this from the documentation and tried to create a dataset without coordinates and fill it with a dataset subset with coordinates. It gave some inconclusive errors depending on the actual code example (see below):

```
ValueError: parameter 'value': expected array with shape (0,), got (10,)
```

or

```
ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'
```

It might also be a bug and it should in fact be possible to add a dataset with coordinates to a dummy dataset without coordinates. Then there seems to be an issue regarding the handling of the variables during storing the region.

... or I might just have done it wrong... and I'm looking forward to suggestions.

What you expected to happen:

Either an error message telling me that that i should use coordinates during creation of the dummy dataset. Alternatively, if this is a bug and should be possible then it should just work.

Minimal Complete Verifiable Example:

```python
import dask.array
import xarray as xr
import numpy as np

error = 1  # choose between 0 (no error), 1, 2, 3

dummies = dask.array.zeros(30, chunks=10)

# chunks in coords are not taken into account while saving!?
coord_x = dask.array.zeros(30, chunks=10)  # or coord_x = np.zeros((30,))
if error == 0:
    ds = xr.Dataset({"foo": ("x", dummies)}, coords={"x": coord_x})
else:
    ds = xr.Dataset({"foo": ("x", dummies)})

print(ds)
path = "./tmp/test.zarr"
ds.to_zarr(path, mode='w', compute=False, consolidated=True)

# create a new dataset to be input into a region
ds = xr.Dataset({"foo": ('x', np.arange(10))}, coords={"x": np.arange(10)})

if error == 1:
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: parameter 'value': expected array with shape (0,), got (10,)
elif error == 2:
    ds.to_zarr(path, region={"x": slice(0, 10)})
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'
elif error == 3:
    ds.to_zarr(path, region={"x": slice(0, 10)})
    ds = xr.Dataset({"foo": ('x', np.arange(10))}, coords={"x": np.arange(10)})
    ds.to_zarr(path, region={"x": slice(10, 20)})
    # ValueError: parameter 'value': expected array with shape (0,), got (10,)
else:
    ds.to_zarr(path, region={"x": slice(10, 20)})

ds = xr.open_zarr(path)
print('reopen', ds['x'])
```
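For reference, a sketch of the variant that does work for me, where the template gets real coordinates and the region writes carry only the data variable:

```python
# Sketch: write the coordinates once in the template, then never again.
ds = xr.Dataset({"foo": ("x", dummies)}, coords={"x": np.arange(30)})
ds.to_zarr(path, mode='w', compute=False, consolidated=True)

# The region write omits 'x', so the stored coordinate is not touched.
part = xr.Dataset({"foo": ("x", np.arange(10))})
part.to_zarr(path, region={"x": slice(10, 20)})
```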

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.18.0 pandas: 1.2.3 numpy: 1.19.2 scipy: 1.6.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.1 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.04.0 distributed: None matplotlib: 3.4.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5290/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
377356113 MDU6SXNzdWUzNzczNTYxMTM= 2542 full_like, ones_like, zeros_like should retain subclasses gerritholl 500246 closed 0     4 2018-11-05T11:22:49Z 2023-11-05T06:27:31Z 2023-11-05T06:27:31Z CONTRIBUTOR      

Code Sample

```python
# Your code here

import numpy
import xarray

class MyDataArray(xarray.DataArray):
    pass

da = MyDataArray(numpy.arange(5))
da2 = xarray.zeros_like(da)
print(type(da), type(da2))
```

Problem description

I would expect that type(da2) is type(da), but this is not the case. The type of da2 is always <class 'xarray.core.dataarray.DataArray'>. Rather, the output of this script is:

```
<class '__main__.MyDataArray'> <class 'xarray.core.dataarray.DataArray'>
```

Expected Output

I would hope for the following output:

```
<class '__main__.MyDataArray'> <class '__main__.MyDataArray'>
```

In principle changing this could break people's code, so if a change is implemented it should probably be through an optional keyword argument to the full_like/ones_like/zeros_like family.
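In the meantime, a user-side sketch of the behavior I am after, assuming the subclass keeps DataArray's constructor signature (da and MyDataArray are from the code sample above):

```python
import numpy

def zeros_like_keep_type(da):
    # Sketch: rebuild through the input's own class so the subclass survives.
    return type(da)(
        numpy.zeros_like(da.values), coords=da.coords, dims=da.dims, attrs=da.attrs
    )

da2 = zeros_like_keep_type(da)
assert type(da2) is MyDataArray
```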

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.0.final.0 python-bits: 64 OS: Linux OS-release: 2.6.32-754.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 xarray: 0.10.7 pandas: 0.23.2 numpy: 1.15.2 scipy: 1.1.0 netCDF4: 1.4.0 h5netcdf: 0.6.1 h5py: 2.8.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.18.1 distributed: 1.22.0 matplotlib: 3.0.0 cartopy: 0.16.0 seaborn: 0.9.0 setuptools: 39.2.0 pip: 18.0 conda: None pytest: 3.2.2 IPython: 6.4.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2542/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1966675016 I_kwDOAMm_X851ORRI 8388 Type annotation compatibility with numpy ufuncs djhoese 1828519 closed 0     4 2023-10-28T17:25:11Z 2023-11-02T12:44:50Z 2023-11-02T12:44:50Z CONTRIBUTOR      

Is your feature request related to a problem?

I'd like mypy to understand that xarray DataArrays passed to numpy ufuncs have a return type of xarray DataArray.

```python
import xarray as xr
import numpy as np

def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray:
    abs_diff = np.absolute(sun_azi - sat_azi)
    ssadiff = np.minimum(abs_diff, 360 - abs_diff)
    return ssadiff
```

```bash
$ mypy ./xarray_mypy.py
xarray_mypy.py:7: error: Incompatible return value type (got "ndarray[Any, dtype[Any]]", expected "DataArray")  [return-value]
Found 1 error in 1 file (checked 1 source file)
```

Describe the solution you'd like

I'm not sure if this is possible, if it is something xarray can fix, or something numpy needs to "fix". I'd like the above situation to "just work" without anything more than maybe some extra type-stub package.

Describe alternatives you've considered

Cast the types, use other type coercion, or tell mypy to ignore the type issues for these numpy calls.
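For example, a cast-based sketch of the first alternative:

```python
from typing import cast

import numpy as np
import xarray as xr

def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray:
    # Sketch: assert to mypy what we know holds at runtime for DataArray inputs.
    abs_diff = cast(xr.DataArray, np.absolute(sun_azi - sat_azi))
    return cast(xr.DataArray, np.minimum(abs_diff, 360 - abs_diff))
```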

Additional context

https://stackoverflow.com/questions/77369042/typing-when-passing-xarray-dataarray-objects-to-numpy-ufuncs

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8388/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1445905299 I_kwDOAMm_X85WLsOT 7282 groupby and mean on a MultiIndex level raises ValueError jjpr-mit 25231875 closed 0     4 2022-11-11T19:15:58Z 2023-10-30T09:18:54Z 2023-08-31T03:50:33Z NONE      

What happened?

After using set_index to create a MultiIndex, calling groupby on a MultiIndex level and then mean raises an error.

What did you expect to happen?

Apply mean to groups, no error.

Minimal Complete Verifiable Example

```Python
from xarray import DataArray

d = DataArray(
    data=[
        [0, 1, 2, 3, 4, 5, 6],
        [7, 8, 9, 10, 11, 12, 13],
        [14, 15, 16, 17, 18, 19, 20],
    ],
    coords={
        "greek": ("a", ['alpha', 'beta', 'gamma']),
        "colors": ("a", ['red', 'green', 'blue']),
        "compass": ("b", ['north', 'south', 'east', 'west', 'northeast', 'southeast', 'southwest']),
        "integer": ("b", [0, 1, 2, 3, 4, 5, 6]),
    },
    dims=("a", "b"),
)
d = d.set_index(a=['greek', 'colors'], b=['compass', 'integer'])
g = d.groupby('greek')
m = g.mean(...)
```
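A workaround sketch that avoids the error, grouping on the level before it is absorbed into a MultiIndex (assuming the index is not otherwise needed):

```python
# Sketch: detach the MultiIndex so 'greek' is a plain coordinate again,
# then group on it.
m = d.reset_index("a").groupby("greek").mean(...)
```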

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/xarray/core/_aggregations.py", line 5698, in mean
    return self.reduce(
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1201, in reduce
    return self.map(reduce_array, shortcut=shortcut)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1104, in map
    return self._combine(applied, shortcut=shortcut)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py", line 1136, in _combine
    index, index_vars = create_default_index_implicit(coord)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py", line 1045, in create_default_index_implicit
    index = PandasMultiIndex(array, name)
  File "/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py", line 615, in __init__
    raise ValueError(
ValueError: conflicting multi-index level name 'greek' with dimension 'greek'
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110] python-bits: 64 OS: Linux OS-release: 5.15.49-linuxkit machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.11.0 pandas: 1.5.1 numpy: 1.23.4 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 63.2.0 pip: 22.2.2 conda: None pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7282/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1953059418 I_kwDOAMm_X850aVJa 8345 `.stack` produces large chunks yt87 40218891 closed 0     4 2023-10-19T21:09:56Z 2023-10-26T21:20:05Z 2023-10-26T21:20:05Z NONE      

What happened?

Xarray stack does not chunk along the last coordinate, producing huge chunks, as described in #5754. Dask, seeing code like this:

```python
da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new")
```

produces a warning and a suggestion to use a context manager:

```python
with dask.config.set(**{"array.slicing.split_large_chunks": True}):
    da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new")
```

This fails with the message IndexError: tuple index out of range.

What did you expect to happen?

I expect this to work. #5754 is closed.

Minimal Complete Verifiable Example

```Python
import dask.array
import numpy as np

import xarray as xr

var = xr.Variable(
    ("t", "z", "u", "x", "y"),
    dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)

def sum(ds):
    return ds.sum(dim="u")

with dask.config.set(**{"array.slicing.split_large_chunks": True}):
    da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new")
da2
```
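In the meantime, a workaround sketch that sidesteps the heuristic by rechunking the stacked dimension explicitly (the chunk size of 100 is arbitrary):

```python
# Sketch: choose the chunking of the stacked dimension by hand instead of
# relying on dask's split-large-chunks setting.
stacked = da.stack(new=("z", "t")).chunk({"new": 100})
da2 = stacked.groupby("new").map(sum).unstack("new")
```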

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python

IndexError Traceback (most recent call last) Cell In[21], line 5 2 return ds.sum(dim="u") 4 with dask.config.set(**{"array.slicing.split_large_chunks": True}): ----> 5 da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new") 6 da2

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse) 2795 def unstack( 2796 self, 2797 dim: Dims = None, 2798 fill_value: Any = dtypes.NA, 2799 sparse: bool = False, 2800 ) -> Self: 2801 """ 2802 Unstack existing dimensions corresponding to MultiIndexes into 2803 multiple new dimensions. (...) 2853 DataArray.stack 2854 """ -> 2855 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse) 2856 return self._from_temp_dataset(ds)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse) 5498 for d in dims: 5499 if needs_full_reindex: -> 5500 result = result._unstack_full_reindex( 5501 d, stacked_indexes[d], fill_value, sparse 5502 ) 5503 else: 5504 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse) 5393 if name not in index_vars: 5394 if dim in var.dims: -> 5395 variables[name] = var.unstack({dim: new_dim_sizes}) 5396 else: 5397 variables[name] = var

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs) 1928 result = self 1929 for old_dim, dims in dimensions.items(): -> 1930 result = result._unstack_once_full(dims, old_dim) 1931 return result

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim) 1817 reordered = self.transpose(*dim_order) 1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes -> 1820 new_data = reordered.data.reshape(new_shape) 1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names 1823 return type(self)( 1824 new_dims, new_data, self._attrs, self._encoding, fastpath=True 1825 )

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape) 2217 if len(shape) == 1 and not isinstance(shape[0], Number): 2218 shape = shape[0] -> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit) 283 else: 284 chunk_plan.append("auto") --> 285 outchunks = normalize_chunks( 286 chunk_plan, 287 shape=shape, 288 limit=limit, 289 dtype=x.dtype, 290 previous_chunks=inchunks, 291 ) 293 x2 = x.rechunk(inchunks) 295 # Construct graph

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks) 3092 chunks = tuple("auto" if isinstance(c, str) and c != "auto" else c for c in chunks) 3094 if any(c == "auto" for c in chunks): -> 3095 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3097 if shape is not None: 3098 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape))

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape):

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in <dictcomp>(.0) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape):

IndexError: tuple index out of range ```

Anything else we need to know?

The most recent traceback entries point to an issue in dask code.

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.5.5-1-MANJARO machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2 xarray: 2023.9.0 pandas: 2.1.1 numpy: 1.24.4 scipy: 1.11.3 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.9.3 distributed: 2023.9.3 matplotlib: 3.8.0 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.9.2 cupy: None pint: None sparse: 0.14.0 flox: 0.7.2 numpy_groupies: 0.10.2 setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8345/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1923431725 I_kwDOAMm_X85ypT0t 8264 Improve error messages max-sixty 5635139 open 0     4 2023-10-03T06:42:57Z 2023-10-24T18:40:04Z   MEMBER      

Is your feature request related to a problem?

Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline.

Some of the error messages could be much more helpful. Take one example:

xarray.core.merge.MergeError: conflicting values for variable 'date' on objects to be combined. You can skip this check by specifying compat='override'.

The second sentence is nice. But the first could be give us much more information: - Which variables conflict? I'm merging four objects, so would be so helpful to know which are causing the issue. - What is the conflict? Is one a superset and I can join=...? Are they off by 1 or are they completely different types? - Our testing.assert_equal produces pretty nice errors, as a comparison

Having these good is really useful, lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library.

Describe the solution you'd like

I'm not sure the best way to surface the issues — error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues — there could be scores of marginal issues which would often be out of date.

One thing we do in PRQL is have a file that snapshots error messages test_bad_error_messages.rs, which can then be a nice contribution to change those from bad to good. I'm not sure whether that would work here (python doesn't seem to have a great snapshotter, pytest-regtest is the best I've found; I wrote pytest-accept but requires doctests).

Any other ideas?

Describe alternatives you've considered

No response

Additional context

A couple of specific error-message issues: - https://github.com/pydata/xarray/issues/2078 - https://github.com/pydata/xarray/issues/5290

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8264/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
529644880 MDU6SXNzdWU1Mjk2NDQ4ODA= 3580 xr.DataArray.values fails with latest versions of netcdf4 kpegion 16332933 closed 0     4 2019-11-28T01:26:07Z 2023-10-18T17:01:17Z 2023-10-18T17:01:17Z NONE      

MCVE Code Sample

```python import xarray as xr xr.show_versions()

url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/NCEP-CFSv2/.HINDCAST/.MONTHLY/.sst/dods' fullda = xr.open_dataset(url, decode_times=False,chunks={'S': 'auto', 'L': 'auto', 'M':'auto','X':'auto','Y':'auto'}) print(fullda) print(fullda['sst'][:10,0,0,0,0].values)

```

Expected Output

python <xarray.Dataset> Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181) Coordinates: * X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 * S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0 * M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0 * Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0 Data variables: sst (S, L, M, Y, X) float32 dask.array<chunksize=(29, 10, 24, 51, 45), meta=np.ndarray> Attributes: Conventions: IRIDL [-25.652588 -35.577393 -48.702896 -51.3853 -50.687195 -50.341995 -50.407593 -54.955994 -52.052994 -47.31279 ]

Problem Description

This should return the array’s data as a numpy.ndarray according to the documentation and as shown above. I tested this with various versions of netcdf4 and I get the error below for netcdf4 versions 1.5.1, 1.5.1.2, 1.5.3 (latest version). If I use netcdf4 version 1.5.1, I get the expected output as above.

``` python <xarray.Dataset> Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181) Coordinates: * X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 * S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0 * M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0 * Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0 Data variables: sst (S, L, M, Y, X) float32 dask.array<chunksize=(29, 10, 24, 51, 45), meta=np.ndarray> Attributes: Conventions: IRIDL Traceback (most recent call last): File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 84, in _getitem array = getitem(original_array, key) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/common.py", line 54, in robust_getitem return array[key] File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.getitem File "netCDF4/_netCDF4.pyx", line 5350, in netCDF4._netCDF4.Variable._get IndexError: index exceeds dimension bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "testpython.py", line 7, in <module> print(fullda['sst'][:10,0,0,0,0].values) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/dataarray.py", line 567, in values return self.variable.values File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py", line 448, in values return as_array_or_item(self._data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py", line 254, in _as_array_or_item data = np.asarray(data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py", line 1314, in __array__ x = self.compute() File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py", line 165, in compute (result,) = compute(self, traverse=False, kwargs) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py", line 436, in compute results = schedule(dsk, keys, kwargs) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/threaded.py", line 81, in get *kwargs File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 486, in get_async raise_exception(exc, tb) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 316, in reraise raise exc File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 222, in execute_task result = _execute_task(task, data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task return func(args2) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py", line 106, in getter c = np.asarray(c) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 481, in array return np.asarray(self.array, dtype=dtype) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 643, in array return np.asarray(self.array, dtype=dtype) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 547, in array return np.asarray(array[self.key], dtype=None) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 72, in getitem key, self.shape, indexing.IndexingSupport.OUTER, self.getitem File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 827, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 94, in _getitem raise IndexError(msg) IndexError: The indexing operation you are attempting to 
perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load(). ```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.4.3.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.1 xarray: 0.14.1 pandas: 0.25.3 numpy: 1.17.3 scipy: None netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.8.1 distributed: 2.8.1 matplotlib: None cartopy: None seaborn: None numbagg: None setuptools: 42.0.1.post20191125 pip: 19.3.1 conda: None pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3580/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1924497392 I_kwDOAMm_X85ytX_w 8269 open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0' mps01060 6819509 closed 0     4 2023-10-03T16:19:54Z 2023-10-18T16:50:20Z 2023-10-18T16:50:20Z NONE      

What is your issue?

When moving from xarray version '2023.8.0' to '2023.9.0' the behavior of importing a zarr changed for me (code to create the example zarr is at the end of this post). When importing a variable with units "days accumulated", the values are scaled differently between the two versions. The latest version seems to automatically treat this as a time-like array (I think the -9.223372e+18 values seen are NaT-like?).

Open the zarr:

```python
import xarray as xr
ds = xr.open_dataset('debug.zarr', engine='zarr', chunks={})
```

Print as a pandas-like table for each version of xarray for readability:

```python
ds.to_dataframe()
```

Version '2023.8.0':

| time | dapr (dtype=float32) | mdpr (dtype=float32) |
| --- | --- | --- |
| 2000-01-01 | NaN | NaN |
| 2000-01-02 | NaN | NaN |
| 2000-01-03 | 2.0 | 1.5 |

Version '2023.9.0':

| time | dapr (dtype=float64) | mdpr (dtype=float32) |
| --- | --- | --- |
| 2000-01-01 | -9.223372e+18 | NaN |
| 2000-01-02 | -9.223372e+18 | NaN |
| 2000-01-03 | 2.000000e+00 | 1.5 |

I can manually disable this by using "use_cf=False" and "mask_and_scale=False" and then scaling the variable by hand, though that is not ideal. "decode_timedelta" doesn't seem to have an effect on this data either.
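For reference, a minimal sketch of that manual route (assuming the debug.zarr written by the code below, whose encoding uses a -32768 fill value and, for dapr, scale_factor=1.0 / add_offset=0.0):

```python
# Sketch: open without mask-and-scale decoding, then apply the known
# fill value and scaling for 'dapr' by hand.
import xarray as xr

raw = xr.open_dataset('debug.zarr', engine='zarr', chunks={}, mask_and_scale=False)
dapr = raw['dapr'].where(raw['dapr'] != -32768) * 1.0 + 0.0
```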

I understand the "days" keyword is in my units, however the full unit is "days accumulated". Has the behavior of xarray changed to find keywords such as "days" occurring anywhere in the units (eg. as a substring)? Do you have any other suggestions? Thank you for the help.

Code to create the debug.zarr for the tables above:

```python
import numpy as np
import pandas as pd
import xarray as xr
import zarr

# Create some multiday precipitation data (similar to
# https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily).
# mdpr is the amount of a multiday total (inches).
# dapr is the number of days each multiday total occurred over (days accumulated).
# In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03.
# I use float32 to represent these, but pack these as int16 values in the zarr.
mdpr = np.array([np.NaN, np.NaN, 1.50], dtype=np.float32)
dapr = np.array([np.NaN, np.NaN, 2.0], dtype=np.float32)
time = pd.date_range('2000-01-01', periods=3)

# Create a dataset from these values
ds = xr.Dataset(
    data_vars=dict(
        mdpr=(['time'], mdpr),
        dapr=(['time'], dapr),
    ),
    coords=dict(
        time=time,
    ),
    attrs=dict(description='multiday precipitation data'),
)

# Specify encoding to pack these float32 values as int16
encoding = {
    'mdpr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 0.01,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
    'dapr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 1.0,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
}

# Create attributes. The "units" for the dapr variable seems to be the issue:
# "days" in the "days accumulated"
ds.mdpr.attrs['units'] = 'inches'
ds.mdpr.attrs['description'] = 'multiday precip amount'

ds.dapr.attrs['units'] = 'days accumulated'
ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation'

# Save to zarr
ds.to_zarr('debug.zarr', mode='w', encoding=encoding)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8269/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1384226112 I_kwDOAMm_X85SgZ1A 7075 Convert xarray dataset to pandas dataframe is much slower in newest xarray version rilllydi 20794996 closed 0     4 2022-09-23T19:36:28Z 2023-10-14T20:37:40Z 2023-10-14T20:37:40Z NONE      

What is your issue?

Converting an xarray dataset to a pandas dataframe has become much slower in the newest xarray version.

I want to read in very large netcdf files, extract a slice, and convert the slice to a pandas dataframe. For an input size of 2 GB, xarray version 0.21.0 takes 3 seconds, whereas version 2022.6.0 takes 44 seconds. See the table below for more tests with increasing dataset size.

| Number of NetCDF input files in xarray dataset (~1 GB per file) | 2 | 5 | 10 | 15 | 20 | 30 | 40 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Older xarray version 0.21.0 | 0:03 | 0:02 | 0:04 | 0:06 | 0:09 | 0:13 | 0:17 |
| Newer xarray version 2022.6.0 | 0:44 | 1:30 | 2:46 | 4:01 | 5:23 | 7:56 | 10:29 |

Here is my code:

```python
# Read in a list of netcdf files and combine into a single dataset.
with xr.open_mfdataset(infile_list, combine='by_coords') as ds:

    # Extract the data for a single location (the nearest grid point) using the provided coordinates (lat/lon).
    ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest')

    # Convert xarray dataset to a pandas dataframe.
    # This is now the slow part since the xarray library was updated.
    df = ds_slice.to_dataframe()
```

The netcdf files I am reading in are about 1 GB each, containing daily weather data for the entire CONUS. There is 1 file per year, so if I read in 2 files, the dimensions are (lon: 1386, lat: 585, day: 731, crs: 1) with coordinates of lon, lat, day, and crs. They include 8 float data variables.
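One comparison worth making (a sketch, not a confirmed fix): pull the small slice into memory before converting, so that to_dataframe() operates on plain numpy arrays rather than lazy dask arrays.

```python
# Sketch: load the single-point slice eagerly, then convert.
ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest')
df = ds_slice.load().to_dataframe()
```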

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7075/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1943355490 I_kwDOAMm_X85z1UBi 8308 Different plotting reaults compared to matplotlib zxdawn 30388627 closed 0     4 2023-10-14T15:54:32Z 2023-10-14T20:02:16Z 2023-10-14T20:02:16Z NONE      

What happened?

I got different results when I tried to plot the 2D data in test.npy.zip using matplotlib and xarray.

matplotlib

xarray

What did you expect to happen?

Same plot.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

test = np.load('test.npy')

plt.imshow(test, vmin=0, vmax=200)
plt.colorbar()

xr.DataArray(test).plot.imshow(vmin=0, vmax=200)
```
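One quick check (an assumption about the cause, not a confirmed diagnosis): matplotlib's imshow defaults to origin='upper', while xarray draws against ascending coordinate values, so the two renderings can differ simply in vertical orientation. Forcing the same orientation rules that out:

```python
# Sketch: render the xarray version with a downward y axis to match
# matplotlib's default origin='upper'.
xr.DataArray(test).plot.imshow(vmin=0, vmax=200, yincrease=False)
```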

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ] python-bits: 64 OS: Darwin OS-release: 22.3.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.9.0 pandas: 2.1.1 numpy: 1.26.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.0 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8308/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1821467933 I_kwDOAMm_X85skWUd 8021 Specify chunks in bytes mrocklin 306380 open 0     4 2023-07-26T02:29:43Z 2023-10-06T10:09:33Z   MEMBER      

Is your feature request related to a problem?

I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an open_zarr call and then providing the right chunks= argument. I'll admit, though, that I wouldn't mind giving Xarray a value like "1 GiB" and having it use that when determining "auto" chunk sizes.

Dask array does this in two ways. We can provide a value in chunks like the following:

```python
x = da.random.random(..., chunks="1 GiB")
```

We can also refer to a value in the Dask config:

```python
In [1]: import dask

In [2]: dask.config.get("array.chunk-size")
Out[2]: '128MiB'
```

This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂
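For anyone who wants this behavior today, a rough sketch of the manual route described above (target_bytes, the store path, and the variable name are illustrative, and this is not an xarray API):

```python
# Sketch: turn a byte budget into a chunk length along one dimension,
# then pass explicit chunks to xarray.
import xarray as xr

target_bytes = 1 << 30                   # "1 GiB"
ds = xr.open_zarr("store.zarr")          # hypothetical store
var = ds["foo"]                          # hypothetical variable
other = 1
for d in var.dims[1:]:
    other *= var.sizes[d]                # elements per step of the first dim
chunk_len = max(1, min(var.sizes[var.dims[0]],
                       target_bytes // (var.dtype.itemsize * other)))
ds = ds.chunk({var.dims[0]: chunk_len})
```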

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8021/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1169750048 I_kwDOAMm_X85FuPgg 6360 Multidimensional `interpolate_na()` iuryt 5797727 open 0     4 2022-03-15T14:27:46Z 2023-09-28T11:51:20Z   NONE      

Is your feature request related to a problem?

I think that having a way to run a multidimensional interpolation for filling missing values would be awesome.

The code snippet below creates some data and shows the problem I am having now. If the data has some orientation, we can't simply interpolate each dimension separately.

```python
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

n = 30
x = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['x'])
y = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['y'])
z = np.sin(x) * xr.ones_like(y)

mask = xr.DataArray(np.random.randint(0, 1+1, (n, n)).astype('bool'), dims=['x', 'y'])

kw = dict(add_colorbar=False)

fig, ax = plt.subplots(1, 3, figsize=(11, 3))
z.plot(ax=ax[0], **kw)
z.where(mask).plot(ax=ax[1], **kw)
z.where(mask).interpolate_na('x').plot(ax=ax[2], **kw)
```

I tried to use advanced interpolation for that, but it doesn't look like the best solution.

```python
zs = z.where(mask).stack(k=['x', 'y'])
zs = zs.where(np.isnan(zs), drop=True)
xi, yi = zs.k.x.drop('k'), zs.k.y.drop('k')
zi = z.interp(x=xi, y=yi)

fig, ax = plt.subplots()
z.where(mask).plot(ax=ax, **kw)
ax.scatter(xi, yi, c=zi, **kw, linewidth=1, edgecolor='k')
```
returns

Describe the solution you'd like

Simply z.interpolate_na(['x','y'])

Describe alternatives you've considered

I could extract the data to numpy and interpolate using scipy.interpolate.griddata, but this is not the way xarray should work.
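For completeness, a sketch of that scipy fallback (reusing x, y, z, and mask from the example above):

```python
# Sketch: fill the masked holes with scipy's unstructured interpolation,
# then wrap the result back into a DataArray.
from scipy.interpolate import griddata

zm = z.where(mask)
xx, yy = np.meshgrid(x.values, y.values, indexing='ij')
valid = ~np.isnan(zm.values)
filled = griddata((xx[valid], yy[valid]), zm.values[valid], (xx, yy), method='linear')
z_filled = zm.copy(data=filled)
```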

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6360/reactions",
    "total_count": 11,
    "+1": 9,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 issue
1905824568 I_kwDOAMm_X85xmJM4 8221 Frequent doc build timeout / OOM max-sixty 5635139 open 0     4 2023-09-20T23:02:37Z 2023-09-21T03:50:07Z   MEMBER      

What is your issue?

I'm frequently seeing Command killed due to timeout or excessive memory consumption in the doc build.

It fails after 1552 seconds; since that's not a round number, it might be the memory rather than a timeout?

It follows writing output... [ 90%] generated/xarray.core.rolling.DatasetRolling.max, which I wouldn't have thought was a particularly memory-intensive part of the build?

Here's an example: https://readthedocs.org/projects/xray/builds/21983708/

Any thoughts on what might be going on?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8221/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1326238990 I_kwDOAMm_X85PDM0O 6870 `rolling_exp` loses coords max-sixty 5635139 closed 0     4 2022-08-02T18:27:44Z 2023-09-19T01:13:23Z 2023-09-19T01:13:23Z MEMBER      

What happened?

We lose the time coord here — Dimensions without coordinates: time:

```python
ds = xr.tutorial.load_dataset("air_temperature")
ds.rolling_exp(time=5).mean()

<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Dimensions without coordinates: time
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.4 296.1 295.7
```

(I realize I wrote this, I didn't think this used to happen, but either it always did or I didn't write good enough tests... mea culpa)

What did you expect to happen?

We keep the time coords, like we do for normal rolling:

```python
In [2]: ds.rolling(time=5).mean()
Out[2]:
<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
```
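Until this is fixed, a possible workaround (a sketch, not the eventual fix) is to copy the coordinate back afterwards:

```python
# Sketch: restore the time coordinate that rolling_exp dropped.
out = ds.rolling_exp(time=5).mean().assign_coords(time=ds.time)
```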

Minimal Complete Verifiable Example

Python (as above)

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.13 (main, May 24 2022, 21:13:51) [Clang 13.1.6 (clang-1316.0.21.2)] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: arm64 processor: arm byteorder: little LC_ALL: en_US.UTF-8 LANG: None LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.6.0 pandas: 1.4.3 numpy: 1.21.6 scipy: 1.8.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.12.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.12.0 distributed: 2021.12.0 matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: 0.2.1 fsspec: 2021.11.1 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 62.3.2 pip: 22.1.2 conda: None pytest: 7.1.2 IPython: 8.4.0 sphinx: 4.3.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6870/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
598991028 MDU6SXNzdWU1OTg5OTEwMjg= 3967 Support static type analysis eric-czech 6130352 closed 0     4 2020-04-13T16:34:43Z 2023-09-17T19:43:32Z 2023-09-17T19:43:31Z NONE      

As a related discussion to https://github.com/pydata/xarray/issues/3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis.

In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able to enforce that the names and dtypes associated with both data variables and coordinates meet certain constraints.

@keewis mentioned an example of this in https://github.com/pydata/xarray/issues/3959#issuecomment-612076605 where it might be possible to use something like a TypedDict to constrain variable/coord names and array dtypes, but this won't work with TypedDict as it's currently implemented. Another possibility could be generics, and I took a stab at that in https://github.com/pydata/xarray/issues/3959#issuecomment-612513722 (though this would certainly be more intrusive).

An example of where this would be useful is in adding extensions through accessors:

```python
@xr.register_dataset_accessor('ext')
class ExtAccessor:
    def __init__(self, ds):
        self.ds = ds

    def is_zero(self):
        return self.ds['data'] == 0

ds = xr.Dataset(dict(DATA=xr.DataArray([0.0])))

# I'd like to catch that "data" was misspelled as "DATA" and that
# this particular method shouldn't be run against floats prior to runtime
ds.ext.is_zero()
```
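As a rough illustration of the kind of check being asked for (a hand-rolled sketch, not an xarray API): routing access through typed properties lets mypy catch a misspelled name at call sites, although the string lookup inside stays unchecked.

```python
# Sketch: a typed facade over a Dataset; a typo like `w.DATA` is a mypy
# error, while the runtime string lookup is only checked when executed.
import xarray as xr

class Wind:
    def __init__(self, ds: xr.Dataset) -> None:
        self._ds = ds

    @property
    def data(self) -> xr.DataArray:
        return self._ds['data']

w = Wind(xr.Dataset(dict(data=xr.DataArray([0.0]))))
w.data  # ok; `w.DATA` would be flagged statically
```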

I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too.

There is a related conversation about doing something like this for Pandas DataFrames at https://github.com/python/typing/issues/28#issuecomment-351284520, so that might be helpful context for possibilities with TypedDict.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3967/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
561921094 MDU6SXNzdWU1NjE5MjEwOTQ= 3762 xarray groupby/map fails to parallelize bjcosta 6491058 closed 1     4 2020-02-07T23:20:59Z 2023-09-15T15:52:42Z 2023-09-15T15:52:41Z NONE      

MCVE Code Sample

```python
import sys
import math
import logging
import dask
import dask.distributed
import xarray
import numpy

logger = logging.getLogger('main')

if __name__ == '__main__':
    logging.basicConfig(
        stream=sys.stdout,
        format='%(asctime)s %(levelname)-8s %(message)s',
        level=logging.INFO,
        datefmt='%Y-%m-%d %H:%M:%S')

logger.info('Starting dask client')
client = dask.distributed.Client()

SIZE = 100000
SONAR_BINS = 2000
time = range(0, SIZE)
upper_limit = numpy.random.randint(0, 10, (SIZE))
lower_limit = numpy.random.randint(20, 30, (SIZE))
sonar_data = numpy.random.randint(0, 255, (SIZE, SONAR_BINS))

channel = xarray.Dataset({
        'upper_limit': (['time'], upper_limit, {'units': 'depth meters'}),
        'lower_limit': (['time'],  lower_limit, {'units': 'depth meters'}),
        'data': (['time', 'depth_bin'], sonar_data, {'units': 'amplitude'}),
    },
    coords={
        'depth_bin': (['depth_bin'], range(0,SONAR_BINS)),
        'time': (['time'], time)
    })

logger.info('get overall min/max radar range we want to normalize to called the adjusted range')
adjusted_min, adjusted_max = channel.upper_limit.min().values.item(), channel.lower_limit.max().values.item()
adjusted_min = math.floor(adjusted_min)
adjusted_max = math.ceil(adjusted_max)
logger.info('adjusted_min: %s, adjusted_max: %s', adjusted_min, adjusted_max)

bin_count = len(channel.depth_bin)
logger.info('bin_count: %s', bin_count)

adjusted_depth_per_bin = (adjusted_max - adjusted_min) / bin_count
logger.info('adjusted_depth_per_bin: %s', adjusted_depth_per_bin)

adjusted_bin_depths = [adjusted_min + (j * adjusted_depth_per_bin) for j in range(0, bin_count)]
logger.info('adjusted_bin_depths[0]: %s ... [-1]: %s', adjusted_bin_depths[0], adjusted_bin_depths[-1])

def Interp(ds):
    # Ideally instead of using interp we will use some kind of downsampling and shift
    # this doesnt exist in xarray though and interp is good enough for the moment

    # I just added this to debug
    t = ds.time.values.item()
    if (t % 100) == 0:
        total = len(channel.time)
        perc = 100.0 * t / total
        logger.info('%s : %s of %s', perc, t, total)

    unadjusted_depth_amplitudes = ds.data
    unadjusted_min = ds.upper_limit.values.item()
    unadjusted_max = ds.lower_limit.values.item()
    unadjusted_depth_per_bin = (unadjusted_max - unadjusted_min) / bin_count

    index_mapping = [((adjusted_min + (bin * adjusted_depth_per_bin)) - unadjusted_min) / unadjusted_depth_per_bin for bin in range(0, bin_count)]
    adjusted_depth_amplitudes = unadjusted_depth_amplitudes.interp(coords={'depth_bin':index_mapping}, method='linear', assume_sorted=True)
    adjusted_depth_amplitudes = adjusted_depth_amplitudes.rename({'depth_bin':'depth'}).assign_coords({'depth':adjusted_bin_depths})

    #logger.info('%s, \n\tunadjusted_depth_amplitudes.values:%s\n\tunadjusted_min:%s\n\tunadjusted_max:%s\n\tunadjusted_depth_per_bin:%s\n\tindex_mapping:%s\n\tadjusted_depth_amplitudes:%s\n\tadjusted_depth_amplitudes.values:%s\n\n', ds, unadjusted_depth_amplitudes.values, unadjusted_min, unadjusted_max, unadjusted_depth_per_bin, index_mapping, adjusted_depth_amplitudes, adjusted_depth_amplitudes.values)
    return adjusted_depth_amplitudes

# Lets split into chunks so could be performed in parallel
# This doesnt work to parallelize and only slows it down a lot
#logger.info('chunk')
#channel = channel.chunk({'time':100})

logger.info('groupby')
g = channel.groupby('time')

logger.info('do interp')
normalized_depth_data = g.map(Interp)

logger.info('done')

```

Expected Output

I am fairly new to xarray, but I feel this example could be executed much better than xarray currently does. From what I can tell, each map call of the custom function above should be parallelizable. I imagined that, in the backend, xarray would chunk the data and run it in parallel on dask. However, I find it is very slow even in the single-threaded case, and it also doesn't seem to parallelize.

It takes roughly 5 ms per map call on my hardware when I don't include the chunk call, and 70 ms with the chunk call you can find in the code.

Problem Description

The single-threaded performance is very slow, and it also fails to parallelize the computation across the cores on my machine.

If you are after more background on what I am trying to do, I also asked an SO question about how to reorganize the code to improve performance. I feel, though, that the current behavior is a performance bug (assuming I didn't do something completely wrong in the code).

https://stackoverflow.com/questions/60103317/can-the-performance-of-using-xarray-groupby-map-be-improved
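For what it's worth, a sketch of a vectorized alternative (reusing names from the example above; an illustration of xarray's advanced interpolation, not a verified drop-in). Building the full index array once and making a single interp call avoids one Python call per timestep:

```python
# Sketch: one vectorized interp over all timesteps instead of groupby/map.
unadjusted_depth_per_bin = (channel.lower_limit - channel.upper_limit) / bin_count
target = xarray.DataArray(adjusted_bin_depths, dims='depth')
index_mapping = (target - channel.upper_limit) / unadjusted_depth_per_bin
normalized = channel.data.interp(depth_bin=index_mapping,
                                 method='linear', assume_sorted=True)
```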

Output of xr.show_versions()

# Paste the output here xr.show_versions() here xarray.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.14.1 pandas: 0.25.3 numpy: 1.17.3 scipy: 1.3.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.2 cfgrib: None iris: None bottleneck: None dask: 2.9.1 distributed: 2.9.1 matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 44.0.0.post20200102 pip: 19.3.1 conda: None pytest: None IPython: 7.11.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3762/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1473152374 I_kwDOAMm_X85XzoV2 7348 Using entry_points to register dataset and dataarray accessors? nbren12 1386642 open 0     4 2022-12-02T16:48:42Z 2023-09-14T19:53:46Z   CONTRIBUTOR      

Is your feature request related to a problem?

External libraries often use the dataset/dataarray accessor pattern (e.g. metpy). These accessors are not available until the external package where the registration occurs has been imported. This means scripts using these accessors must include a seemingly unused import that linters will complain about, e.g.:

```python
import metpy  # linter complains here

# some data
ds: xr.Dataset = ...

ds.metpy....
```

Describe the solution you'd like

Use importlib entry points to register accessors so that registration is handled automatically. This is currently enabled for array backends, but not for accessors (e.g. metpy's setup.cfg).
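A sketch of what the xarray side could look like (the entry-point group name "xarray.accessors" is invented here; only the backends group exists today):

```python
# Hypothetical discovery hook: importing each advertised module runs its
# @register_dataset_accessor decorators as a side effect.
from importlib.metadata import entry_points

def load_accessor_plugins() -> None:
    for ep in entry_points(group="xarray.accessors"):
        ep.load()
```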

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7348/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
1098241812 I_kwDOAMm_X85BddcU 6149 [Bug]: `numpy` `DeprecationWarning` with `DType` and `xr.testing.assert_all_close()` + Dask tomvothecoder 25624127 closed 0     4 2022-01-10T18:34:27Z 2023-09-13T20:06:59Z 2023-09-13T20:06:58Z CONTRIBUTOR      

What happened?

A numpy DeprecationWarning regarding DType is output when using xr.testing.assert_allclose() to compare two chunked Datasets. The warning does not appear when comparing two non-chunked datasets.

What did you expect to happen?

The warning should not appear.

Minimal Complete Verifiable Example

```python
class TestTemporalAvg:
    class TestTimeseries:
        @pytest.fixture(autouse=True)
        def setup(self):
            self.ds: xr.Dataset = generate_dataset(cf_compliant=True, has_bounds=True)

    # No warning with this test
    def test_weighted_annual_avg(self):
        ds = self.ds.copy()

        result = ds.temporal.temporal_avg("timeseries", "year", data_var="ts")
        expected = ds.copy()
        expected["ts"] = xr.DataArray(
            name="ts",
            data=np.ones((2, 4, 4)),
            coords={
                "lat": self.ds.lat,
                "lon": self.ds.lon,
                "year": pd.MultiIndex.from_tuples(
                    [(2000,), (2001,)],
                ),
            },
            dims=["year", "lat", "lon"],
            attrs={
                "operation": "temporal_avg",
                "mode": "timeseries",
                "freq": "year",
                "groupby": "year",
                "weighted": "True",
                "centered_time": "True",
            },
        )

        # For some reason, there is a floating point difference between both
        # for ts so we have to use floating point comparison
        xr.testing.assert_allclose(result, expected)
        assert result.ts.attrs == expected.ts.attrs

    # Warning with this test
    @requires_dask
    def test_weighted_annual_avg_with_chunking(self):
        ds = self.ds.copy().chunk({"time": 2})

        result = ds.temporal.temporal_avg("timeseries", "year", data_var="ts")
        expected = ds.copy()
        expected["ts"] = xr.DataArray(
            name="ts",
            data=np.ones((2, 4, 4)),
            coords={
                "lat": ds.lat,
                "lon": ds.lon,
                "year": pd.MultiIndex.from_tuples(
                    [(2000,), (2001,)],
                ),
            },
            dims=["year", "lat", "lon"],
            attrs={
                "operation": "temporal_avg",
                "mode": "timeseries",
                "freq": "year",
                "groupby": "year",
                "weighted": "True",
                "centered_time": "True",
            },
        )

        # For some reason, there is a floating point difference between both
        # for ts so we have to use floating point comparison
        xr.testing.assert_allclose(result, expected)
        assert result.ts.attrs == expected.ts.attrs

```

Relevant log output

```python
DeprecationWarning: The `dtype` and `signature` arguments to ufuncs only select the
general DType and not details such as the byte order or time unit (with rare
exceptions see release notes). To avoid this warning please use the scalar types
`np.float64`, or string notation. In rare cases where the time unit was preserved,
either cast the inputs or provide an output array. In the future NumPy may
transition to allow providing `dtype=` to denote the outputs `dtype` as well.
(Deprecated NumPy 1.21)
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
```
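In the meantime, the noise can be contained with pytest's standard warning filter (a workaround sketch, not a fix for the underlying cast):

```python
# Sketch: silence just this numpy deprecation on the affected test.
import pytest

@pytest.mark.filterwarnings(
    "ignore:The `dtype` and `signature` arguments to ufuncs:DeprecationWarning"
)
def test_weighted_annual_avg_with_chunking():
    ...
```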

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.45.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1

xarray: 0.20.1 pandas: 1.3.4 numpy: 1.21.4 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.11.2 distributed: 2021.11.2 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2021.11.1 cupy: None pint: None sparse: None setuptools: 59.6.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.30.1 sphinx: 4.3.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6149/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1075765204 I_kwDOAMm_X85AHt_U 6055 Unexpected type conversion in variables with _FillValue jp-dark 24235303 closed 0     4 2021-12-09T16:26:54Z 2023-09-13T12:40:14Z 2023-09-13T12:40:13Z CONTRIBUTOR      

What happened: When opening a dataset containing an int16 variable that has the _FillValue attribute set, the variable is converted from int16 to float32. This was originally reported to the TileDB-CF-Py Git repo that contains a TileDB backend for xarray. See TileDB-CF-Py issue #117.

What you expected to happen: I would expect the type to remain the same when applying the _FillValue.

Minimal Complete Verifiable Example:

Original example from TileDB-CF-Py issue #117 using the TileDB backend:

```python
import tiledb
import xarray as xr
import numpy as np

index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)

with tiledb.open('dense_array0', 'w') as A:
    A[:] = np.array([5, 6, 7, 8], dtype=np.int16)

ds = xr.open_dataset('dense_array0', engine='tiledb')
ds['var'].dtype
```

NetCDF example with the same behavior:

```python
import netCDF4
import xarray as xr
import numpy as np

filename = 'temp_file.nc'
with netCDF4.Dataset(filename, mode="w") as group:
    group.createDimension("index", 4)
    var = group.createVariable("var", np.int16, ("index",), fill_value=-1)
    var[:] = np.array([5, 6, 7, 8], dtype=np.int16)
dataset = xr.open_dataset(filename)
dataset["var"].dtype
```

Anything else we need to know?:
* I was able to verify the type conversion from int16 to float32 occurs in the conventions.decode_cf_variables call in the open_dataset method of StoreBackendEntrypoint.
* I was able to verify the conversion does not happen if mask_and_scale=False.
* Note that TileDB automatically sets a fill value for all dense numerical arrays, so we are always setting the _FillValue attribute for variables from the TileDB backend.
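For context, the promotion is the standard CF mask-and-scale path: values matching _FillValue are replaced with NaN, which has no integer representation, so the variable is upcast to float. A sketch of keeping the on-disk dtype (the second bullet above), at the cost of keeping raw fill values:

```python
# Sketch: skip mask-and-scale so the int16 dtype survives; fill values
# then stay as -1 instead of becoming NaN.
dataset = xr.open_dataset(filename, mask_and_scale=False)
dataset["var"].dtype  # dtype('int16')
```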

Environment: I was able to reproduce this with both xarray 0.19.0 and 0.20.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6055/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  completed xarray 13221727 issue
514672231 MDU6SXNzdWU1MTQ2NzIyMzE= 3466 RuntimeError: NetCDF: DAP failure b-kode 47066389 closed 1     4 2019-10-30T13:32:34Z 2023-09-12T16:00:57Z 2023-09-12T16:00:57Z NONE      

Hi all,

I am interested in extracting specific point and variable information from the GEOS-CF product, accessible via OPeNDAP.

Loading the data seems to work fine, and I can do some processing for my specific needs. Ideally, I would like to convert this selection to a dataframe or, if needed, store it as an intermediate file that I can read from again.

Yet when doing so, I get the following error: RuntimeError: NetCDF: DAP failure

I am not sure what is causing this. Perhaps I chunk the data in the wrong (inefficient) way? Or there is an error with the GEOS netcdf files? Or ...

Below is a working code snippet.

```python
import xarray as xr

idir_geos = 'https://opendap.nccs.nasa.gov/dods/gmao/geos-cf/assim/chm_tavg_1hr_g1440x721_v1'

def preprocess(ds):
    '''Rename variables and select the relevant ones. Remove lev.'''
    ds = ds.rename({'pm25_rh35_gcc': 'PM2.5', 'no': 'NO', 'no2': 'NO2',
                    'o3': 'O3', 'so2': 'SO2', 'co': 'CO'})
    ds = ds[['PM2.5', 'NO', 'NO2', 'O3', 'SO2', 'CO']]
    ds = ds.squeeze('lev')
    return ds

ds = xr.open_mfdataset([idir_geos], preprocess=preprocess, combine='by_coords')

lat = 51.25
lon = 4.25
pol = 'O3'
ds_sel = ds.sel(lat=lat, lon=lon, method='nearest')[pol]

df_sel = ds_sel.to_dataframe().drop(['lat', 'lon'], axis=1)

ds_sel.to_netcdf('test.nc')  # Runtime error
```

Traceback error:

Traceback (most recent call last): File "/home/demuzmp4/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-2-fccd11da2246>", line 57, in <module> df_sel = ds_sel.to_dataframe().drop(['lat','lon'],axis=1) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py", line 4285, in to_dataframe return self.to_dataframe(self.dims) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py", line 4273, in _to_dataframe for k in columns File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py", line 4273, in <listcomp> for k in columns File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py", line 437, in values return _as_array_or_item(self._data) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py", line 250, in _as_array_or_item data = np.asarray(data) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/usr/lib/python3/dist-packages/dask/array/core.py", line 1138, in __array__ x = self.compute() File "/usr/lib/python3/dist-packages/dask/base.py", line 135, in compute (result,) = compute(self, traverse=False, kwargs) File "/usr/lib/python3/dist-packages/dask/base.py", line 333, in compute results = get(dsk, keys, kwargs) File "/usr/lib/python3/dist-packages/dask/threaded.py", line 75, in get pack_exception=pack_exception, *kwargs) File "/usr/lib/python3/dist-packages/dask/local.py", line 521, in get_async raise_exception(exc, tb) File "/usr/lib/python3/dist-packages/dask/compatibility.py", line 60, in reraise raise exc File "/usr/lib/python3/dist-packages/dask/local.py", line 290, in execute_task result = _execute_task(task, data) File "/usr/lib/python3/dist-packages/dask/local.py", line 271, in _execute_task return func(args2) File "/usr/lib/python3/dist-packages/dask/array/core.py", line 72, in getter c = np.asarray(c) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py", line 490, in array return np.asarray(self.array, dtype=dtype) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py", line 652, in array return np.asarray(self.array, dtype=dtype) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py", line 556, in array return np.asarray(array[self.key], dtype=None) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py", line 73, in array return self.func(self.array) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py", line 142, in _apply_mask data = np.asarray(data, dtype=dtype) File "/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File 
"/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py", line 556, in array return np.asarray(array[self.key], dtype=None) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 72, in getitem key, self.shape, indexing.IndexingSupport.OUTER, self.getitem File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py", line 836, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 84, in _getitem array = getitem(original_array, key) File "/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/common.py", line 54, in robust_getitem return array[key] File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.getitem File "netCDF4/_netCDF4.pyx", line 5352, in netCDF4._netCDF4.Variable._get File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: DAP failure

More info on my xarray installation:

commit: None python: 3.6.9 (default, Jul 3 2019, 07:38:46) [GCC 8.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_GB.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.3 xarray: 0.14.0 pandas: 0.25.2 numpy: 1.17.3 scipy: 1.3.1 netCDF4: 1.5.3 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.28 cfgrib: None iris: None bottleneck: 1.2.1 dask: 0.16.0 distributed: None matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.4.0 pip: 9.0.1 conda: None pytest: 5.2.1 IPython: 7.3.0 sphinx: 1.8.4

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3466/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1339921253 I_kwDOAMm_X85P3ZNl 6919 Parallel read with MPI mengaldo 8100801 closed 0     4 2022-08-16T07:19:14Z 2023-09-12T15:16:32Z 2023-09-12T15:16:31Z NONE      

Is your feature request related to a problem?

Is it possible to somehow extend xarray to use MPI I/O?

Describe the solution you'd like

We would need to know the offset from where the actual data starts within the file. Is there a way of retrieving that? Disclaimer: I am not an expert on the NetCDF format, so apologies if the question is trivial!
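For what it's worth, a pointer rather than an xarray feature: for NetCDF4 files, the underlying HDF5 layer already supports MPI I/O, which h5py exposes when built against parallel HDF5 (the file and variable names below are hypothetical):

```python
# Sketch: collective parallel read outside xarray, assuming an h5py build
# with the mpio driver and mpi4py available.
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
with h5py.File("data.nc", "r", driver="mpio", comm=comm) as f:
    block = f["temperature"][comm.rank]  # each rank reads its own slice
```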

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6919/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1861335844 I_kwDOAMm_X85u8bsk 8096 Errors when saving PyObject coordinates krokosik 38408316 closed 0     4 2023-08-22T12:14:53Z 2023-09-06T11:44:41Z 2023-09-06T11:44:41Z CONTRIBUTOR      

What happened?

Hi, I'm trying to create a DataArray with coordinates that are tuples and potentially even higher-dimensional objects. The way I did it is to create an empty numpy array with dtype=object and then insert my tuples into it. This doesn't throw an error when creating a DataArray (as opposed to using a 2D ndarray or a list of lists). However, when trying to save it to zarr or netcdf, I get an error saying ValueError: setting an array element with a sequence.

What did you expect to happen?

I want to be able to save and load such coordinates without errors. Maybe there is a cleaner way to do it than the object dtype ndarray?

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

n = 5
x = np.empty(n, dtype=object)
for i in range(n):
    x[i] = (i, i)
xr.DataArray(np.arange(n), dims=("x"), coords={"x": x}).to_zarr("test")
```
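On the "cleaner way" question: one pattern that avoids dtype=object entirely (a sketch; the auxiliary dimension name "component" is invented here) is to store the tuples as a 2D integer coordinate:

```python
# Sketch: represent each (i, i) pair along an extra "component" dimension.
import numpy as np
import xarray as xr

n = 5
pairs = np.array([(i, i) for i in range(n)])  # shape (n, 2), plain ints
ds = xr.Dataset(
    {"values": ("x", np.arange(n))},
    coords={"x_pair": (("x", "component"), pairs)},
)
ds.to_zarr("test2", mode="w")
```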

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
File c:\Users\Wiktor\AppData\Local\pypoetry\Cache\virtualenvs\spin1-JGuolXDk-py3.11\Lib\site-packages\xarray\core\dataarray.py:4014, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   4010 else:
   4011     # No problems with the name - so we're fine!
   4012     dataset = self.to_dataset()
-> 4014 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads :(
   4015     dataset,
   4016     path,
   4017     mode=mode,
   4018     format=format,
   4019     group=group,
   4020     engine=engine,
   4021     encoding=encoding,
   4022     unlimited_dims=unlimited_dims,
...
    101 result = np.empty(data.shape, dtype)
--> 102 result[...] = data
    103 return result

ValueError: setting an array element with a sequence.
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 183 Stepping 1, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('Polish_Poland', '1250') libhdf5: None libnetcdf: None xarray: 2023.8.0 pandas: 2.0.3 numpy: 1.25.2 scipy: 1.11.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.0 cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.7.2 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.0.0 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.14.0 sphinx: 7.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8096/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1870484988 I_kwDOAMm_X85vfVX8 8120 `open_mfdataset` exits while sending a "Segmentation fault" error kasra-keshavarz 50383939 closed 0     4 2023-08-28T20:51:23Z 2023-09-01T15:43:08Z 2023-09-01T15:43:08Z NONE      

What is your issue?

I try to open about 10 files, each ~5 MB, as a test case, using xarray's open_mfdataset method with the parallel=True option; however, it throws a "Segmentation fault" error as follows:

```python
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:              (time: 744, rlat: 140, rlon: 105)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
    lon                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
    lat                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
  * rlon                 (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
  * rlat                 (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
    rotated_pole         (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    RDRS_v2.1_P_UVC_10m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_A_PR0_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    product:      RDRS_v2.1
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...
    history:      Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1...
    NCO:          netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...

In [4]: type(ds)
Out[4]: xarray.core.dataset.Dataset

In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True)
[gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
[gra-login3:25527] *** Process received signal ***
[gra-login3:25527] Signal: Segmentation fault (11)
[gra-login3:25527] Signal code:  (128)
[gra-login3:25527] Failing at address: (nil)
Segmentation fault

```
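One variable worth isolating (an assumption to test, not a diagnosis): parallel=True opens the files inside dask tasks, and some HDF5/netCDF builds are not thread-safe. Pinning the scheduler narrows it down:

```python
# Sketch: the same open, but with single-threaded task execution.
import dask
import xarray as xr

with dask.config.set(scheduler="synchronous"):
    ds = xr.open_mfdataset('./ab_models_198001*.nc',
                           chunks={'time': 10}, parallel=True)
```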

Here is the version of xarray:

```python In [5]: xr.show_versions() /home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/init.py:36: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.88.1.el7.x86_64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.9.0

xarray: 2023.7.0 pandas: 1.4.0 numpy: 1.21.2 scipy: 1.8.0 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.0 distributed: 2023.8.0 matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 60.2.0 pip: 23.2.1 conda: None pytest: 7.4.0 mypy: None IPython: 8.10.0 sphinx: None ```

I'm working on an HPC, so if a list of the modules I have loaded helps, here it is:

```console
$ module list

Currently Loaded Modules: 1) CCconfig 5) gcccore/.9.3.0 (H) 9) libfabric/1.10.1 13) ipykernel/2023a 17) sqlite/3.38.5 21) postgresql/12.4 (t) 25) gdal/3.5.1 (geo) 29) udunits/2.2.28 (t) 33) cdo/2.2.1 (geo) 2) gentoo/2020 (S) 6) imkl/2020.1.217 (math) 10) openmpi/4.0.3 (m) 14) scipy-stack/2023a (math) 18) jasper/2.0.16 (vis) 22) freexl/1.0.5 (t) 26) geos/3.10.2 (geo) 30) libaec/1.0.6 34) mpi4py/3.1.3 (t) 3) StdEnv/2020 (S) 7) gcc/9.3.0 (t) 11) libffi/3.3 15) hdf5/1.10.6 (io) 19) libgeotiff-proj901/1.7.1 23) librttopo-proj9/1.1.0 27) proj/9.0.1 (geo) 31) eccodes/2.25.0 (geo) 35) netcdf-fortran/4.5.2 (io) 4) mii/1.1.2 8) ucx/1.8.0 12) python/3.10.2 (t) 16) netcdf/4.7.4 (io) 20) cfitsio/4.1.0 (vis) 24) libspatialite-proj901/5.0.1 28) expat/2.4.1 (t) 32) yaxt/0.9.0 (t) 36) libspatialindex/1.8.5 (phys)

Where: S: Module is Sticky, requires --force to unload or purge m: MPI implementations / Implémentations MPI math: Mathematical libraries / Bibliothèques mathématiques io: Input/output software / Logiciel d'écriture/lecture t: Tools for development / Outils de développement vis: Visualisation software / Logiciels de visualisation geo: Geography libraries/apps / Logiciels de géographie phys: Physics libraries/apps / Logiciels de physique H: Hidden Module ```

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8120/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1611701140 I_kwDOAMm_X85gEJuU 7588 xr.merge with compat="minimal" returns corrupted Dataset and causes __len__ to return wrong and possibly negative values. Metamess 2466330 closed 0     4 2023-03-06T15:47:40Z 2023-08-30T09:14:19Z 2023-08-30T07:57:37Z CONTRIBUTOR      

What happened?

When merging multiple datasets with the compat="minimal" option, coordinates whose variables are dropped due to incompatibility are still kept in the dataset's _coord_names. I believe the cause of this originates on line 752 of merge_core, where the coordinate names are based on the datasets in coerced, which are not affected by the dropping of (coordinate) variables/indexes in the merge_collected function.

This is directly related to the bug described in issue 7405. As seen there, one result is that a dropped coordinate still evaluates as being contained in the resulting dataset's coords. The effects of this bug are more widespread, which this issue attempts to dive into.

At least one other (perhaps more severe) result of this bug is connected to the fact that the __len__ function of a DataVariable is implemented as follows: return len(self._dataset._variables) - len(self._dataset._coord_names)

If a coordinate was dropped as a result of the merge, it is no longer part of the _variables, but still listed in the _coord_names, and as such the result of len() will be off by 1 for each such coordinate. This also means that the result of len() can become negative, which causes python to raise ValueError: __len__() should return >= 0.

One instance where this causes immediate errors is when trying to print the resulting dataset. As part of the __repr__ of a Dataset, a boolean evaluation of the DataVariable is performed (if mapping: in xarray/core/formatting.py in _mapping_repr), calling __len__ to check the truth value and triggering the ValueError.

While this is undoubtedly only one of many places where the incorrect __len__ causes issues, it is a rather pressing one as it even stops one from inspecting the Dataset in the most common way (printing it). The ValueError it produces is also very hard to trace back to the actual cause, likely completely throwing users off from fixing their code.

What did you expect to happen?

To get a Dataset with the correct _coord_names property, and in no circumstance whatsoever to get a Dataset which reports a negative length

Minimal Complete Verifiable Example

```Python
import xarray as xr
ds1 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 4})
ds2 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 5})

# If the result is not captured in res, this will cause a ValueError as the
# interpreter attempts to print the result
res = xr.merge([ds1, ds2], compat="minimal")

res.coords
# Coordinates:
#   * foo      (foo) int64 1 2 3

res._coord_names
# {'foo', 'bar'}

# As shown in issue #7405. Note "bar" is not printed in res.coords, revealing
# an interesting disconnect in the behaviors of different functions targeting
# a dataset's coordinates
"bar" in res.coords
# True

res
# ValueError: __len__() should return >= 0
```
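Spelling out the off-by-one with the res from above (private attributes used purely for illustration):

```python
# Sketch: one dropped coordinate leaves a dangling name behind.
len(res._variables)    # 1 -- only "foo" survived the merge
len(res._coord_names)  # 2 -- {"foo", "bar"} is still recorded
# len(res.data_vars) would compute 1 - 2 = -1, so Python raises
# "ValueError: __len__() should return >= 0" when the mapping is evaluated.
```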

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
>>> import xarray as xr
>>> ds1 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 4})
>>> ds2 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 5})
>>> res = xr.merge([ds1, ds2], compat="minimal")
>>> res.coords
Coordinates:
  * foo      (foo) int64 1 2 3
>>> res._coord_names
{'bar', 'foo'}
>>> "bar" in res.coords
True
>>> res
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/redacted/.venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 2116, in __repr__
    return formatting.dataset_repr(self)
  File "/usr/lib/python3.10/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/redacted/.venv/lib/python3.10/site-packages/xarray/core/formatting.py", line 673, in dataset_repr
    summary.append(data_vars_repr(ds.data_vars, col_width=col_width, max_rows=max_rows))
  File "/home/redacted/.lvenv/lib/python3.10/site-packages/xarray/core/formatting.py", line 357, in _mapping_repr
    if mapping:
ValueError: __len__() should return >= 0
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.10.16.3-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.2.0 pandas: 1.5.1 numpy: 1.24.2 scipy: 1.10.0 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.13.6 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.10.3 iris: None bottleneck: 1.3.6 dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2023.1.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 59.6.0 pip: 23.0.1 conda: None pytest: 7.2.1 mypy: 1.0.1 IPython: 7.34.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7588/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1858062203 I_kwDOAMm_X85uv8d7 8090 DataArrayResampleAggregations break with _flox_reduce where source DataArray has a discontinuous time dimension ollie-bell 56110893 open 0     4 2023-08-20T09:48:42Z 2023-08-24T04:20:32Z   NONE      

What happened?

When resampling a DataArray with a discontinuity in the time dimension, the resample object contains placeholder groups for the missing times in between the present times.

This seems to break the flox reductions any, count, and all, which complain about a fill_value of None. See the example provided below.

What did you expect to happen?

The result should be computed successfully in the same way that it is without using flox.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

dates = (("1980-12-01", "1990-11-30"), ("2000-12-01", "2010-11-30"))
times = [xr.cftime_range(*d, freq="D", calendar="360_day") for d in dates]

da = xr.concat(
    [xr.DataArray(np.random.rand(len(t)), coords={"time": t}, dims="time") for t in times],
    dim="time",
)

da = da.chunk(time=360)

with xr.set_options(use_flox=True):
    # FAILS - discontinuous time dimension before resample
    (da > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - continuous time dimension before resample
    (da.sel(time=slice(*dates[0])) > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - compute chunks before resample
    (da > 0.5).compute().resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=False):
    # SUCCEEDS - don't use flox
    (da > 0.5).resample(time="AS-DEC").any(dim="time")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

ValueError                                Traceback (most recent call last)
Cell In[60], line 1
----> 1 (da > 0.5).resample(time="AS-DEC").any(dim="time")

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/_aggregations.py:7029, in DataArrayResampleAggregations.any(self, dim, keep_attrs, **kwargs)
   6960 """
   6961 Reduce this DataArray's data by applying any along some dimension(s).
   (...)
   7022   * time     (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
   7023 """
   7024 if (
   7025     flox_available
   7026     and OPTIONS["use_flox"]
   7027     and contains_only_chunked_or_numpy(self._obj)
   7028 ):
-> 7029     return self._flox_reduce(
   7030         func="any",
   7031         dim=dim,
   7032         # fill_value=fill_value,
   7033         keep_attrs=keep_attrs,
   7034         **kwargs,
   7035     )
   7036 else:
   7037     return self.reduce(
   7038         duck_array_ops.array_any,
   7039         dim=dim,
   7040         keep_attrs=keep_attrs,
   7041         **kwargs,
   7042     )

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/resample.py:57, in Resample._flox_reduce(self, dim, keep_attrs, **kwargs)
     51 def _flox_reduce(
     52     self,
     53     dim: Dims,
     54     keep_attrs: bool | None = None,
     55     **kwargs,
     56 ) -> T_Xarray:
---> 57     result = super()._flox_reduce(dim=dim, keep_attrs=keep_attrs, **kwargs)
     58     result = result.rename({RESAMPLE_DIM: self._group_dim})
     59     return result

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/groupby.py:1018, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
   1015     kwargs.setdefault("min_count", 1)
   1017 output_index = grouper.full_index
-> 1018 result = xarray_reduce(
   1019     obj.drop_vars(non_numeric.keys()),
   1020     self._codes,
   1021     dim=parsed_dim,
   1022     # pass RangeIndex as a hint to flox that by is already factorized
   1023     expected_groups=(pd.RangeIndex(len(output_index)),),
   1024     isbin=False,
   1025     keep_attrs=keep_attrs,
   1026     **kwargs,
   1027 )
   1029 # we did end up reducing over dimension(s) that are
   1030 # in the grouped variable
   1031 group_dims = grouper.group.dims

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:408, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, fill_value, dtype, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    406 output_core_dims = [d for d in input_core_dims[0] if d not in dim_tuple]
    407 output_core_dims.extend(group_names)
--> 408 actual = xr.apply_ufunc(
    409     wrapper,
    410     ds_broad.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
    411     *by_da,
    412     input_core_dims=input_core_dims,
    413     # for xarray's test_groupby_duplicate_coordinate_labels
    414     exclude_dims=set(dim_tuple),
    415     output_core_dims=[output_core_dims],
    416     dask="allowed",
    417     dask_gufunc_kwargs=dict(
    418         output_sizes=group_sizes, output_dtypes=[dtype] if dtype is not None else None
    419     ),
    420     keep_attrs=keep_attrs,
    421     kwargs={
    422         "func": func,
    423         "axis": axis,
    424         "sort": sort,
    425         "fill_value": fill_value,
    426         "method": method,
    427         "min_count": min_count,
    428         "skipna": skipna,
    429         "engine": engine,
    430         "reindex": reindex,
    431         "expected_groups": tuple(expected_groups),
    432         "isbin": isbins,
    433         "finalize_kwargs": finalize_kwargs,
    434         "dtype": dtype,
    435         "core_dims": input_core_dims,
    436     },
    437 )
    439 # restore non-dim coord variables without the core dimension
    440 # TODO: shouldn't apply_ufunc handle this?
    441 for var in set(ds_broad._coord_names) - set(ds_broad._indexes) - set(ds_broad.dims):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:1185, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1183 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
   1184 elif any(is_dict_like(a) for a in args):
-> 1185     return apply_dataset_vfunc(
   1186         variables_vfunc,
   1187         *args,
   1188         signature=signature,
   1189         join=join,
   1190         exclude_dims=exclude_dims,
   1191         dataset_join=dataset_join,
   1192         fill_value=dataset_fill_value,
   1193         keep_attrs=keep_attrs,
   1194     )
   1195 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1196 elif any(isinstance(a, DataArray) for a in args):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:469, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
    464 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
    465     args, signature, exclude_dims, combine_attrs=keep_attrs
    466 )
    467 args = tuple(getattr(arg, "data_vars", arg) for arg in args)
--> 469 result_vars = apply_dict_of_variables_vfunc(
    470     func, *args, signature=signature, join=dataset_join, fill_value=fill_value
    471 )
    473 out: Dataset | tuple[Dataset, ...]
    474 if signature.num_outputs > 1:

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:411, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, *args)
    409 result_vars = {}
    410 for name, variable_args in zip(names, grouped_by_name):
--> 411     result_vars[name] = func(*variable_args)
    413 if signature.num_outputs > 1:
    414     return _unpack_dict_tuples(result_vars, signature.num_outputs)

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:761, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    756 if vectorize:
    757     func = _vectorize(
    758         func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    759     )
--> 761 result_data = func(*input_data)
    763 if signature.num_outputs == 1:
    764     result_data = (result_data,)

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:379, in xarray_reduce.<locals>.wrapper(array, func, skipna, core_dims, *by, **kwargs)
    376     offset = min(array)
    377     array = datetime_to_numeric(array, offset, datetime_unit="us")
--> 379 result, groups = groupby_reduce(array, *by, func=func, **kwargs)
    381 # Output of count has an int dtype.
    382 if requires_numeric and func != "count":

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:2011, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, dtype, min_count, method, engine, reindex, finalize_kwargs, *by)
   2005     groups = (groups[0][sorted_idx],)
   2007 if factorize_early:
   2008     # nan group labels are factorized to -1, and preserved
   2009     # now we get rid of them by reindexing
   2010     # This also handles bins with no data
-> 2011     result = reindex_(
   2012         result, from_=groups[0], to=expected_groups, fill_value=fill_value
   2013     ).reshape(result.shape[:-1] + grp_shape)
   2014     groups = final_groups
   2016 if is_bool_array and (_is_minmax_reduction(func) or _is_first_last_reduction(func)):

File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:428, in reindex_(array, from_, to, fill_value, axis, promote)
    426 if any(idx == -1):
    427     if fill_value is None:
--> 428         raise ValueError("Filling is required. fill_value cannot be None.")
    429     indexer[axis] = idx == -1
    430 # This allows us to match xarray's type promotion rules

ValueError: Filling is required. fill_value cannot be None.
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ] python-bits: 64 OS: Darwin OS-release: 22.5.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.14.1 libnetcdf: 4.9.2 xarray: 2023.7.0 pandas: 1.5.3 numpy: 1.24.4 scipy: 1.11.1 netCDF4: 1.6.4 pydap: installed h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: 2.16.0 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: 3.6.1 bottleneck: 1.3.7 dask: 2023.8.1 distributed: 2023.8.1 matplotlib: 3.7.2 cartopy: 0.22.0 seaborn: 0.12.2 numbagg: 0.2.2 fsspec: 2023.6.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: 0.7.2 numpy_groupies: 0.9.22 setuptools: 68.1.2 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.14.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8090/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1325665237 I_kwDOAMm_X85PBAvV 6866 Confusing terminologies and some errors in the official documentation v-liuwei 49091585 closed 0     4 2022-08-02T10:48:07Z 2023-08-23T14:20:23Z 2023-08-23T14:20:23Z NONE      

What happened?

To note, I'm using the stable version (2022.6.0).

First, I'm confused that both dimension coordinate/non-dimension coordinate and index coordinate/non-index coordinate appear in the documentation (search to see), but they seem to be the same thing.

Second, I found that there are some errors in the documentation:

  • It says that "The index associated with dimension name x can be retrieved by arr.indexes[x]. By construction, len(arr.dims) == len(arr.indexes)", which is inconsistent with actual behavior. See example code below: ```python In [0]: import xarray as xr, numpy as np In [1]: arr = xr.DataArray(np.zeros((2, 3)), dims=['x', 'y'], coords={'x': ['a', 'b']}) In [2]: assert len(arr.dims) == len(arr.indexes), f"{len(arr.dims)=}, {len(arr.indexes)=}"

AssertionError Traceback (most recent call last) <ipython-input-202-f217d18e6979> in <module> ----> 1 assert len(arr.dims) == len(arr.indexes), f"{len(arr.dims)=}, {len(arr.indexes)=}"

AssertionError: len(arr.dims)=2, len(arr.indexes)=1 In [3]: arr.indexes Out[3]: Indexes: x: Index(['a', 'b'], dtype='object', name='x') It seems that `arr.indexes` only returns indexes of dimensions that have coordinates. However, it's possible to get the index of dimension `y` through `get_index()`:python In [4]: arr.get_index('y') Out[4]: RangeIndex(start=0, stop=3, step=1, name='y') ```

  • It says that: (see link)

For convenience multi-index levels are directly accessible as "virtual" or "derived" coordinates (marked by - when printing a dataset or data array):

```python
In [77]: mda["band"]
Out[77]:
<xarray.DataArray 'band' (spec: 4)>
array(['R', 'R', 'V', 'V'], dtype=object)
Coordinates:
  * spec     (spec) object MultiIndex
  * band     (spec) object 'R' 'R' 'V' 'V'
  * wn       (spec) float64 0.1 0.2 0.7 0.9

In [78]: mda.wn
Out[78]:
<xarray.DataArray 'wn' (spec: 4)>
array([0.1, 0.2, 0.7, 0.9])
Coordinates:
  * spec     (spec) object MultiIndex
  * band     (spec) object 'R' 'R' 'V' 'V'
  * wn       (spec) float64 0.1 0.2 0.7 0.9
```

As you can see, even in the given example code offered by the official docs, all the "virtual" coordinates are marked as `*` instead of `-`, which is a little bit confusing when handling multi-index coordinates in my experience.

May I have missed something? Thanks in advance for the reply.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.10.102.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.6.0 pandas: 1.4.3 numpy: 1.23.1 scipy: 1.3.3 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 45.2.0 pip: 22.2.1 conda: None pytest: None IPython: 7.13.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6866/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
979316661 MDU6SXNzdWU5NzkzMTY2NjE= 5738 Flexible indexes: how to handle possible dimension vs. coordinate name conflicts? benbovy 4160723 closed 0     4 2021-08-25T15:31:39Z 2023-08-23T13:28:41Z 2023-08-23T13:28:40Z MEMBER      

Another thing that I've noticed while working on #5692.

Currently it is not possible to have a Dataset with a same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with sel or unstack). See #2299.
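For reference, a minimal sketch of the conflict described above (my own construction; the exact error message varies across versions, but a ValueError about conflicting level / dimension names is expected):

```python
import pandas as pd
import xarray as xr

# Level name "x" deliberately collides with the dimension name "x";
# per the behavior described above, xarray rejects this at construction.
midx = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [0, 1, 0, 1]], names=("x", "y")
)
ds = xr.Dataset({"var": ("x", range(4))}, coords={"x": midx})  # raises ValueError
```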

I'm wondering how we should handle this in the context of flexible / custom indexes:

A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in sel or stack?

B. Introduce some tag in xarray.Index so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming)

C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?

D. Eventually revert #2353 and let users taking care of potential conflicts.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5738/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
448082431 MDU6SXNzdWU0NDgwODI0MzE= 2986 How to add a custom indexer. fbriol 397386 closed 0     4 2019-05-24T09:56:25Z 2023-08-23T12:24:21Z 2023-08-23T12:24:20Z CONTRIBUTOR      

Hello,

I have written a set of indexers for 1D, 2D and 3D geodetic and Cartesian data (up to 5 dimensions for Cartesian data).

I used the Boost/C++ library to write the multidimensional data search algorithm. This tree (R*Tree) is impressive for its performance. It can be built in a few seconds with several million points and made requests for a few seconds with several million points.

```python
import numpy as np

# Install it with conda, if you want, only for python3.7:
# conda install pyindex -c fbriol
import pyindex.core as core

lon = np.random.uniform(-180.0, 180.0, 2048 * 4096)
lat = np.random.uniform(-90.0, 90.0, 2048 * 4096)

# You can not set an altitude if it is not necessary.
alt = np.random.uniform(-10000, 100000, 2048 * 4096)

# WGS system used
system = core.geodetic.System()

# RTree
tree = core.geodetic.RTree(system)
%timeit tree.packing(np.asarray((lon, lat, alt)).T)
# 3.84 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

coordinates = np.asarray((
    np.random.uniform(-180.0, 180.0, 10000),
    np.random.uniform(-90.0, 90.0, 10000),
    np.random.uniform(-10000, 100000, 10000))).T
%timeit tree.query(coordinates)
# 18 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

I'm trying to use these indexes with Xarray, but I didn't quite understand how to interface with xarray.

Is there anyone who could explain to me how to write my own indexer to test these indexers with xarray? Thank you in advance.
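Lacking a custom-index hook, one pragmatic pattern (a sketch only, using scipy's cKDTree as a stand-in for the R*Tree above) is to run the spatial query outside xarray and map the resulting positions back with isel:

```python
import numpy as np
import xarray as xr
from scipy.spatial import cKDTree

# Scattered points with lon/lat attached as non-index coordinates.
npoints = 10_000
ds = xr.Dataset(
    {"temperature": ("points", np.random.rand(npoints))},
    coords={
        "lon": ("points", np.random.uniform(-180.0, 180.0, npoints)),
        "lat": ("points", np.random.uniform(-90.0, 90.0, npoints)),
    },
)

# Query the external tree for nearest neighbours, then select by position.
tree = cKDTree(np.column_stack([ds.lon.values, ds.lat.values]))
_, idx = tree.query([[5.0, 45.0], [-120.0, 30.0]])
nearest = ds.isel(points=idx)
```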

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2986/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1603957501 I_kwDOAMm_X85fmnL9 7573 Add optional min versions to conda-forge recipe (`run_constrained`) dcherian 2448579 closed 0     4 2023-02-28T23:12:15Z 2023-08-21T16:12:34Z 2023-08-21T16:12:21Z MEMBER      

Is your feature request related to a problem?

I opened this PR to add minimum versions for our optional dependencies: https://github.com/conda-forge/xarray-feedstock/pull/84/files to prevent issues like #7467

I think we'd need a policy to choose which ones to list. Here's the current list:

```yaml
run_constrained:
  - bottleneck >=1.3
  - cartopy >=0.20
  - cftime >=1.5
  - dask-core >=2022.1
  - distributed >=2022.1
  - flox >=0.5
  - h5netcdf >=0.13
  - h5py >=3.6
  - hdf5 >=1.12
  - iris >=3.1
  - matplotlib-base >=3.5
  - nc-time-axis >=1.4
  - netcdf4 >=1.5.7
  - numba >=0.55
  - pint >=0.18
  - scipy >=1.7
  - seaborn >=0.11
  - sparse >=0.13
  - toolz >=0.11
  - zarr >=2.10
```

Some examples to think about:

1. iris seems like a bad one to force. It seems like people might use Iris and Xarray independently and Xarray shouldn't force a minimum version.
2. For backends, I arbitrarily kept netcdf4, h5netcdf and zarr.
3. It seems like we should keep array types: so dask, sparse, pint.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7573/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1845132891 I_kwDOAMm_X85t-n5b 8062 Dataset.chunk() does not overwrite encoding["chunks"] Metamess 2466330 open 0     4 2023-08-10T12:54:12Z 2023-08-14T18:23:36Z   CONTRIBUTOR      

What happened?

When using the chunk function to change the chunk sizes of a Dataset (or DataArray, which uses the Dataset implementation of chunk), the chunk sizes of the Dask arrays are changed, but the "chunks" entry of the encoding attributes are not changed accordingly. This causes the raising of a NotImplementedError when attempting to write the Dataset to a zarr (and presumably other formats as well).

Looking at the implementation of chunk, every variable is rechunked using the _maybe_chunk function, which actually has the parameter overwrite_encoded_chunks to control just this behavior. However, it is an optional parameter which defaults to False, and the call in chunk does not provide a value for this parameter, nor does it offer the caller to influence it (by having an overwrite_encoded_chunks parameter itself, for example).

I do not know why this default value was chosen as False, or what could break if it was changed to True, but looking at the documentation, it seems the opposite of the intended effect. From the documentation of to_zarr:

Zarr chunks are determined in the following way: From the chunks attribute in each variable’s encoding (can be set via Dataset.chunk).

Which is exactly what it does not do.

What did you expect to happen?

I would expect the "chunks" entry of the encoding attribute to be changed to reflect the new chunking scheme.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# Create a test Dataset with dimensions x and y, each of size 100, and a chunksize of 50
ds_original = xr.Dataset({"my_var": (["x", "y"], np.random.randn(100, 100))})

# Since 'chunk' does not work, manually set encoding
ds_original.my_var.encoding["chunks"] = (50, 50)

# To best showcase the real-life example, write it to file and read it back again.
# The same could be achieved by just calling .chunk() with chunksizes of 25,
# but this feels more 'complete'
filepath = "~/chunk_test.zarr"
ds_original.to_zarr(filepath)
ds = xr.open_zarr(filepath)

# Check the chunksizes and "chunks" encoding
print(ds.my_var.chunks)
# >>> ((50, 50), (50, 50))
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Rechunk the Dataset
ds = ds.chunk({"x": 25, "y": 25})

# The chunksizes have changed
print(ds.my_var.chunks)
# >>> ((25, 25, 25, 25), (25, 25, 25, 25))

# But the encoding value remains the same
print(ds.my_var.encoding["chunks"])
# >>> (50, 50)

# Attempting to write this back to zarr raises an error
ds.to_zarr("~/chunk_test_rechunked.zarr")
# NotImplementedError: Specified zarr chunks encoding['chunks']=(50, 50) for
# variable named 'my_var' would overlap multiple dask chunks
# ((25, 25, 25, 25), (25, 25, 25, 25)). Writing this array in parallel with
# dask could lead to corrupted data. Consider either rechunking using chunk(),
# deleting or modifying encoding['chunks'], or specify safe_chunks=False.
```
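As a workaround sketch (given the behaviour above), clearing the stale encoding before writing sidesteps the error, since to_zarr then derives the chunks from the variable's current dask chunking:

```python
# Workaround sketch: drop the stale chunk encoding so to_zarr falls back to
# the variable's current dask chunks.
del ds.my_var.encoding["chunks"]
ds.to_zarr("~/chunk_test_rechunked.zarr")
```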

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.10.16.3-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.7 libnetcdf: 4.8.1 xarray: 2023.7.0 pandas: 1.5.3 numpy: 1.24.2 scipy: 1.10.0 netCDF4: 1.5.8 pydap: None h5netcdf: 0.12.0 h5py: 3.6.0 Nio: None zarr: 2.14.1 cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.6 dask: 2022.01.0+dfsg distributed: 2022.01.0+ds.1 matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: 2023.1.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 59.6.0 pip: 23.2.1 conda: None pytest: 7.2.2 mypy: 1.1.1 IPython: 7.31.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8062/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
1845508562 I_kwDOAMm_X85uADnS 8065 .mfdataset fail to open a kerchunked zarr file from an object-store bucket pl-marasco 22492773 closed 0     4 2023-08-10T16:22:05Z 2023-08-14T14:18:17Z 2023-08-14T14:13:58Z NONE      

What happened?

Trying to open a kerchunk .json through open_mfdataset, a ValueError is raised.

What did you expect to happen?

It should open a Dataset as described below:

```
<xarray.Dataset>
Dimensions:  (lat: 15680, lon: 40320, time: 36)
Coordinates:
  * lat      (lat) float64 80.0 79.99 79.98 79.97 ... -59.97 -59.98 -59.99
  * lon      (lon) float64 -180.0 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
  * time     (time) float64 nan 1.0 2.0 3.0 4.0 5.0 ... 31.0 32.0 33.0 34.0 35.0
Data variables:
    crs      object ...
    max      (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
    mean     (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
    median   (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
    min      (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
    nobs     (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
    stdev    (time, lat, lon) float32 dask.array<chunksize=(1, 1207, 3102), meta=np.ndarray>
Attributes: (12/19)
    Conventions:          CF-1.6
    archive_facility:     VITO
    copyright:            Copernicus Service information 2021
    history:              2021-03-01 - Processing line NDVI LTS
    identifier:           urn:cgls:global:ndvi_stats_all:NDVI-LTS_1999-2019-0...
    institution:          VITO NV
    ...                   ...
    references:           https://land.copernicus.eu/global/products/ndvi
    sensor:               VEGETATION-1, VEGETATION-2, VEGETATION
    source:               Derived from EO satellite imagery
    time_coverage_end:    2019-12-31T23:59:59Z
    time_coverage_start:  1999-01-01T00:00:00Z
    title:                Normalized Difference Vegetation Index: Long Term S...
```

Minimal Complete Verifiable Example

```python
import xarray as xr

catalogue = "https://object-store.cloud.muni.cz/swift/v1/foss4g-catalogue/c_gls_NDVI-LTS_1999-2019.json"
LTS = xr.open_mfdataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "storage_options": {"fo": catalogue},
        "consolidated": False,
    },
)
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
ValueError: Cannot specify both fs and storage_options
```

Anything else we need to know?

Seems to be related to zarr's version: if tested with <= 2.12 it works but with the latest versions > 2.12 it doesn't.
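Edit: in the meantime, one workaround sketch (assuming fsspec's reference filesystem; parameter names per fsspec, not verified against every zarr version) is to build the mapper explicitly and hand it to open_dataset:

```python
import fsspec
import xarray as xr

# Workaround sketch: construct the reference filesystem ourselves instead of
# routing storage_options through the "reference://" URL.
catalogue = "https://object-store.cloud.muni.cz/swift/v1/foss4g-catalogue/c_gls_NDVI-LTS_1999-2019.json"
fs = fsspec.filesystem("reference", fo=catalogue)
LTS = xr.open_dataset(fs.get_mapper(""), engine="zarr", consolidated=False)
```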

Environment

xarray version 2023.7.0 zarr >2.12
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8065/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1817880272 I_kwDOAMm_X85sWqbQ 8013 np.cumproduct deprecated quantsnus 25102059 closed 0     4 2023-07-24T08:11:01Z 2023-07-31T16:46:00Z 2023-07-31T16:46:00Z CONTRIBUTOR      

What is your issue?

Since numpy version 1.25.0 np.cumproduct is deprecated in favor of np.cumprod.

The coordinates to_index() method still uses it https://github.com/pydata/xarray/blob/971be103d6376d6572d1f12d32526f12f07ae2c7/xarray/core/coordinates.py#L144 which results in an unnecessary DeprecationWarning.
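The fix is a one-token change, since np.cumprod has long been the equivalent spelling:

```python
import numpy as np

# np.cumproduct emits a DeprecationWarning on numpy >= 1.25;
# np.cumprod is the drop-in replacement with identical semantics.
np.cumprod([1, 2, 3])  # array([1, 2, 6])
```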

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8013/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1789989152 I_kwDOAMm_X85qsREg 7962 Better chunk manager error dcherian 2448579 closed 0     4 2023-07-05T17:27:25Z 2023-07-24T22:26:14Z 2023-07-24T22:26:13Z MEMBER      

What happened?

I just ran into this error in an environment without dask.

```
TypeError: Could not find a Chunk Manager which recognises type <class 'dask.array.core.Array'>
```

I think we could easily recommend that the user install a package that provides dask by looking at type(array).__name__. This would make the message a lot friendlier.
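A sketch of what that lookup could look like (the helper name and hint table are hypothetical, not existing xarray API):

```python
def _chunkmanager_hint(data: object) -> str:
    # Hypothetical helper: map the array's top-level module to an install hint.
    hints = {
        "dask": "Try installing dask to handle dask arrays.",
        "cubed": "Try installing cubed to handle cubed arrays.",
    }
    top_module = type(data).__module__.partition(".")[0]
    return hints.get(top_module, "")

# e.g. raise TypeError(
#     f"Could not find a Chunk Manager which recognises type {type(data)}. "
#     + _chunkmanager_hint(data)
# )
```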

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7962/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1752520008 I_kwDOAMm_X85odVVI 7907 `plot.scatter(hue_style="discrete")` does nothing mgunyho 20118130 closed 0     4 2023-06-12T11:21:33Z 2023-07-13T23:17:49Z 2023-07-13T23:17:49Z CONTRIBUTOR      

What happened?

I was trying to do a scatterplot of my data with one dimension determining the color. The dimension has only a few values so I used hue_style="discrete" to have a different color for each value. However, the resulting scatterplot has a continuous colorbar, which is the same as when I pass hue_style="continuous":

What did you expect to happen?

The colorbar should have discrete colors. I was also expecting the colors to be from the default matplotlib color palette, C0, C1, etc, when there's less than 10 items, like this:

Although the examples in the documentation show the discrete case also using viridis.

What I was really expecting is a plot like one would get by passing add_colorbar=False, add_legend=True:

But that may be a bit too automagical.

Minimal Complete Verifiable Example

```Python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

x = xr.DataArray(
    np.random.default_rng().random((10, 3)),
    coords=[
        ("idx", np.linspace(0, 1, 10)),
        ("color", [1, 2, 3]),
    ]
)
y = x + np.random.default_rng().random(x.shape)

ds = xr.Dataset({
    "x": x,
    "y": y,
})

# the output is the same regardless of hue_style="discrete" or "continuous" or just leaving it out
ds.plot.scatter(x="x", y="y", hue="color", hue_style="discrete", ax=plt.figure().gca())
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This is the code for the "expected" plot:

```python
from matplotlib.colors import ListedColormap

ds.plot.scatter(
    x="x", y="y", hue="color", hue_style="discrete", ax=plt.figure().gca(),
    # these lines added in addition to the MVCE
    cmap=ListedColormap(["C0", "C1", "C2"]),
    vmin=0.5, vmax=3.5,
    cbar_kwargs=dict(ticks=ds.color.data),
)

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.14.0-1059-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 2023.1.0 pandas: 1.4.3 numpy: 1.23.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 44.0.0 pip: 20.0.2 conda: None pytest: None mypy: None IPython: 8.12.2 sphinx: None

I also tried this on main at 3459e6fa, the behavior is the same.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7907/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1775657305 I_kwDOAMm_X85p1mFZ 7945 engine='cfgrib' no longer an option in xr.open_dataset() but works anyway parsellsx 74011857 closed 0     4 2023-06-26T21:32:01Z 2023-06-27T00:06:27Z 2023-06-26T21:37:05Z NONE      

What is your issue?

Looking at the documentation for xr.open_dataset(), the "engine" argument to that function is listed as accepting one of 7 different engines (or None), but the "cfgrib" engine is not among them. Looking at older versions of the documentation, I see that "cfgrib" was delisted starting with v2023.04.0 (it's still present in v2023.03.0).

In what I think is a related issue, this tutorial on reading in ERA5 GRIB files with the "engine='cfgrib'" option on xr.load_dataset() gives a ValueError in documentation versions starting with v2023.04.0 and going through v2023.05.0 and 'stable' due to the unrecognized engine 'cfgrib', although it seems to have been fixed for v2023.06.0 and 'latest'.

Given both of the above, I was surprised to find that using xr.open_dataset() on a GRIB file with engine='cfgrib' does work for me using xarray v2023.05.0. To me it seems that the documentation for xr.open_dataset() should be edited to include the 'cfgrib' option again, but I'd like to get an opinion from someone more familiar with xarray.
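For checking what is actually registered at runtime (independent of the prose docs), xarray can list the installed backends:

```python
import xarray as xr

# Lists every registered backend entrypoint in the current environment;
# "cfgrib" shows up here once the cfgrib package (which now ships its own
# xarray plugin) is installed, even though the docstring no longer names it.
print(xr.backends.list_engines())
```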

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7945/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1718143526 I_kwDOAMm_X85maMom 7854 Freezing Issue When Accessing Precipitation Values with xarray yanivgolds 118670091 closed 0     4 2023-05-20T11:30:54Z 2023-06-26T15:33:19Z 2023-06-26T15:33:19Z NONE      

What is your issue?

I am encountering a freezing issue in my project that utilizes xarray when trying to access precipitation values for a specific longitude-latitude position over a time period. This issue occurs on the slurm system but is not reproduced on my Jupyter Notebook setup. As a result, whenever I attempt to run the project, the job freezes. I would greatly appreciate your assistance in determining the cause of this problem.

Below is a figure showing the result from Jupyter Notebook (this works):

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7854/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1691902604 I_kwDOAMm_X85k2GKM 7805 [FR] add support for rss and rss button to xarray blog danieltomasz 7980381 closed 0     4 2023-05-02T07:15:12Z 2023-06-21T21:10:32Z 2023-06-21T21:10:32Z NONE      

Is your feature request related to a problem?

An easy way to subscribe to news from the xarray blog.

Describe the solution you'd like

Support for publishing news and a button to subscribe to the RSS feed from the blog (alongside the Twitter icon, etc.).

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7805/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1760733017 I_kwDOAMm_X85o8qdZ 7924 Migrate from nbsphinx to myst, myst-nb dcherian 2448579 open 0     4 2023-06-16T14:17:41Z 2023-06-20T22:07:42Z   MEMBER      

Is your feature request related to a problem?

I think we should switch to MyST markdown for our docs. I've been using MyST markdown and MyST-NB in docs in other projects and it works quite well.

Advantages: 1. We get HTML reprs in the docs (example) which is a big improvement. (#6620) 2. I think many find markdown a lot easier to write than RST

There's a tool to migrate RST to MyST (RTD's migration guide).
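On the Sphinx side the switch is a small config change; a minimal sketch, assuming myst-nb's documented extension name and leaving the rest of conf.py untouched:

```python
# doc/conf.py - sketch of swapping nbsphinx for MyST-NB
extensions = [
    # "nbsphinx",  # removed
    "myst_nb",     # parses both MyST markdown (.md) and notebooks (.ipynb)
    # ... other extensions unchanged
]
```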

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7924/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1722614979 I_kwDOAMm_X85mrQTD 7870 Name collision with Pulsar Timing package 'PINT' vhaasteren 3092444 closed 0     4 2023-05-23T18:54:18Z 2023-05-26T16:19:37Z 2023-05-26T16:19:37Z CONTRIBUTOR      

What is your issue?

In the astrophysics community of pulsar timers, there is an analysis package called PINT. PINT is widely used in that community. As you can see on their github, they have been aware of the name collision and on pip/conda the package is available as pint-pulsar. This has not been a problem so far, because most if not all astrophysicists use the great astropy to keep track of units where necessary.

However, Bayesian modeling through PyMC is becoming more and more popular, meaning that arviz and xarray are now getting installed alongside pint-pulsar, giving obvious issues.

A very simple workaround would be to change line 37 in https://github.com/pydata/xarray/blob/main/xarray/core/pycompat.py to something like:

```python
except (ImportError, AttributeError):
```

This means that pint-pulsar would still get imported through import_module(mod), the AttributeError gets caught, and all should be well. It fits the design of duck-typing, since the package doesn't quack like pint should. Would xarray be willing to accommodate the pulsar timing community this way? As you are all aware, changing the name of a package that is integral in projects with many dependencies is kind of painful.
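For concreteness, a sketch of the guarded import with the proposed change (structure paraphrased from the description, not the verbatim xarray source):

```python
from importlib import import_module

def duck_array_type(mod: str, attr: str):
    # Sketch: pint-pulsar imports fine under the name "pint" but lacks the
    # attributes real pint exposes, so catching AttributeError alongside
    # ImportError lets xarray skip it gracefully.
    try:
        module = import_module(mod)
        return (getattr(module, attr),)
    except (ImportError, AttributeError):  # AttributeError is the addition
        return ()
```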

EDIT: fixed typo

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7870/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1160309381 I_kwDOAMm_X85FKOqF 6335 ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. morestart 35556811 closed 0     4 2022-03-05T10:26:49Z 2023-05-12T14:09:52Z 2022-03-05T10:28:29Z NONE      

What is your issue?

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. Consider explicitly selecting one of the installed engines via the engine parameter, or installing additional IO dependencies, see:

but I installed netCDF4 using pip install netCDF4
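A quick sanity check (a generic diagnostic, not xarray-specific) is to confirm that the package landed in the interpreter you are actually running:

```python
import sys
print(sys.executable)   # the Python that must match the one pip installed into

import netCDF4          # ImportError here means pip targeted a different environment
print(netCDF4.__version__)
```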

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6335/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1517575123 I_kwDOAMm_X85adFvT 7409 Implement `DataArray.to_dask_dataframe()` gcaria 44147817 closed 0     4 2023-01-03T15:44:11Z 2023-04-28T15:09:31Z 2023-04-28T15:09:31Z CONTRIBUTOR      

Is your feature request related to a problem?

It'd be nice to go from a chunked DataArray to a dask dataframe object directly.

Describe the solution you'd like

I think something along these lines should work (although a less convoluted way might exist):

```python
from typing import Union

import dask.array as dka
import dask.dataframe as dkd
import xarray as xr

def to_dask(da: xr.DataArray) -> Union[dkd.Series, dkd.DataFrame]:

    if da.data.ndim > 2:
        raise ValueError(f"Can only convert 1D and 2D DataArrays, found {da.data.ndim} dimensions")

    indexes = [da.get_index(dim) for dim in da.dims]
    darr_index = dka.from_array(indexes[0], chunks=da.data.chunks[0])
    columns = [da.name] if da.data.ndim == 1 else indexes[1]
    ddf = dkd.from_dask_array(da.data, columns=columns)
    ddf[indexes[0].name] = darr_index
    return ddf.set_index(indexes[0].name).squeeze()
```
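A usage sketch for the helper above (untested; the shapes and chunking are my assumptions):

```python
import numpy as np

# 2-D case: the first dimension becomes the dask index,
# the second dimension supplies the column labels.
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("t", "col"),
    coords={"t": range(4), "col": ["a", "b", "c"]},
    name="vals",
).chunk({"t": 2})
ddf = to_dask(da)  # intended: a dask DataFrame with columns a, b, c, indexed by t
```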

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7409/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1652227927 I_kwDOAMm_X85iev9X 7713 `Variable/IndexVariable` do not accept a tuple for data. zoj613 44142765 closed 0     4 2023-04-03T14:50:58Z 2023-04-28T14:26:37Z 2023-04-28T14:26:37Z NONE      

What happened?

It appears that Variable and IndexVariable do not accept a tuple for the data parameter even though the docstring suggests it should be able to accept array_like objects (tuple falls under this type of object, right?).

What did you expect to happen?

Successful instantiation of a Variable/IndexVariable object, but instead a ValueError exception is raised.

Minimal Complete Verifiable Example

```Python
import xarray as xr

xr.Variable(data=(2, 3, 45), dims="day")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Python ValueError: dimensions ('day',) must have the same length as the number of data dimensions, ndim=0

Anything else we need to know?

This error seems to be triggered by the self._parse_dimensions(dims) call inside the Variable class. This problem does not happen if I use a list. But I find it strange that the array_like data specifically needs to be a certain type of object for the call to work. Maybe if it has to be a list then the docstring should reflect that.
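For what it's worth, this matches Variable treating a bare tuple as a 0-d scalar (I believe as_compatible_data wraps tuples into a 0-d object array); converting first is a reliable workaround:

```python
import numpy as np
import xarray as xr

# Workarounds that behave as the docstring suggests:
xr.Variable(data=[2, 3, 45], dims="day")            # a list works
xr.Variable(data=np.array((2, 3, 45)), dims="day")  # explicit conversion works
```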

Environment

``` INSTALLED VERSIONS ------------------ commit: None python: 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 6.1.21-1-lts machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2023.1.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.2 pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2023.3.2 distributed: 2023.3.2 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: 0.14.0 flox: None numpy_groupies: None setuptools: 67.6.1 pip: 23.0.1 conda: None pytest: 7.2.2 mypy: 1.1.1 IPython: 8.12.0 sphinx: None ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7713/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
575939446 MDU6SXNzdWU1NzU5Mzk0NDY= 3830 Documentation request: add examples for carrying out "ncecat" in xarray lukelbd 19657652 open 0     4 2020-03-05T01:58:17Z 2023-04-13T20:06:20Z   NONE      

In climate science, a very common task involves concatenating NetCDF files with identical variables, dimensions, and coordinates along a brand new "ensemble member" or "record" dimension. With the NetCDF Operators, this is accomplished using ncecat.

MCVE Code Sample

Currently, it seems the correct way to do this in xarray is with xarray.combine_nested as follows:

```python
import xarray as xr

files = ['member1.nc', 'member2.nc', ...]
ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='record',
)
```

Problem Description

While this works, there does not seem to be any mention of this use case in the combine_nested or open_mfdataset docs... and using combine='nested' to concatenate along a brand new dimension feels quite unintuitive to me.

It would be nice to have examples in combine_nested and/or open_mfdataset with this special usage or mention the possibility of creating brand new dimensions with concat_dim. For example:

```python
In [1]: import xarray as xr
   ...: datasets = [
   ...:     xr.Dataset({'temp': (('x', 'y'), np.random.rand(10, 20))})
   ...:     for i in range(3)
   ...: ]
   ...: xr.combine_nested(datasets, concat_dim='record')
Out[1]:
<xarray.Dataset>
Dimensions:  (record: 3, x: 10, y: 20)
Dimensions without coordinates: record, x, y
Data variables:
    temp     (record, x, y) float64 0.32 0.4897 0.2659 ... 0.3485 0.0251 0.399
```

Output of xr.show_versions()

n/a

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3830/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1659786592 I_kwDOAMm_X85i7lVg 7742 About save char into netcdf ChristmasZCY 61818189 closed 0     4 2023-04-09T07:49:50Z 2023-04-11T06:36:27Z 2023-04-11T06:36:27Z NONE      

What is your issue?

When I want to save char data into netCDF, it produces a new dimension. However, when I read this netCDF file with xarray, I can't find anything that uses this dimension.
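A sketch of what I believe is happening (my reading of xarray's character-array encoding, not verified against this exact file): fixed-width byte strings are written as arrays of single characters with an extra stringN dimension, which xarray joins back on read, so no variable appears to use it:

```python
import numpy as np
import xarray as xr

# Fixed-width bytes get split into single chars on write, adding a
# "string2" dimension in the file; on read xarray joins them back,
# so nothing references that dimension anymore.
ds = xr.Dataset({"name": ("x", np.array([b"ab", b"cd"], dtype="S2"))})
ds.to_netcdf("chars.nc")
print(xr.open_dataset("chars.nc"))
```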

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7742/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1419825696 I_kwDOAMm_X85UoNIg 7199 Deprecate cfgrib backend headtr1ck 43316012 closed 0     4 2022-10-23T15:09:14Z 2023-03-29T15:19:53Z 2023-03-29T15:19:53Z COLLABORATOR      

What is your issue?

Since cfgrib 0.9.9 (04/2021) it comes with its own xarray backend plugin (looks mainly like a copy of our internal version). We should deprecate our internal plugin.

The deprecation is complicated since we usually bind the minimum version to a minor step, but cfgrib seems to have been on 0.9 for 4 years already. Maybe an exception like for netCDF4?

Anyway, if we decide to leave it as it is for now, this ticket is just a reminder to remove it someday :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7199/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1620573171 I_kwDOAMm_X85gl_vz 7617 The documentation contains some non-descriptive link texts. remigathoni 51911758 closed 0     4 2023-03-13T00:34:09Z 2023-03-27T21:37:21Z 2023-03-27T21:37:20Z CONTRIBUTOR      

What is your issue?

I've been going through the docs and noticed some links could be more descriptive.

Here are a few examples with options on how we could rewrite them:

- See the user guide for more. -> Check out the indexing section in the user guide for a detailed explanation.
- For more, see the Xarray documentation. -> See the documentation on automatic alignment to learn more.
- This tutorial notebook also covers alignment and broadcasting (highly recommended) -> You can also check out this tutorial notebook on alignment and broadcasting (highly recommended).
- For more see the user guide, the gallery, and the tutorial material. -> For more information, check out the following resources:
  * The plotting documentation in the user guide.
  * The visualization gallery.
  * The plotting and visualization tutorial materials.

With more specific link texts, you get a clearer idea of what to expect when you click on the link which improves the reading experience. It also makes the links more accessible.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7617/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
928381010 MDU6SXNzdWU5MjgzODEwMTA= 5515 NetCDF: Attempting netcdf-4 operation on netcdf-3 file mickaellalande 20254164 open 0     4 2021-06-23T15:23:55Z 2023-03-27T21:07:32Z   CONTRIBUTOR      

I'm trying to open MODIS .hdf files, but I get the error : NetCDF: Attempting netcdf-4 operation on netcdf-3 file. Does anyone knows how to open that files? (https://nsidc.org/data/MOD10C1)

```python
import xarray as xr
xr.open_dataset('MOD10C1.A2000055.061.2020037182124.hdf')
# RuntimeError: NetCDF: Attempting netcdf-4 operation on netcdf-3 file
```

I already opened hdf files from another product without any issue... (https://nsidc.org/data/MOD10CM)

Here are two examples, with one that works and the other one that causes the issue: MODIS.zip

Thanks in advance for your help!
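Edit: one workaround sketch, in case it helps others. MOD10C1 is an HDF4-EOS file, so reading a named subdataset through rioxarray/GDAL avoids the netCDF engine entirely (the subdataset string below is illustrative; list the real ones with gdalinfo):

```python
import rioxarray

# Illustrative GDAL subdataset path; the exact grid/field names come from
# `gdalinfo MOD10C1.A2000055.061.2020037182124.hdf`.
da = rioxarray.open_rasterio(
    'HDF4_EOS:EOS_GRID:"MOD10C1.A2000055.061.2020037182124.hdf":MOD_CMG_Snow_5km:Day_CMG_Snow_Cover'
)
```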

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:25:15) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: fr_FR.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.0 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.5 cfgrib: 0.9.8.5 iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.2.0 cartopy: 0.17.0 seaborn: None numbagg: None pint: None setuptools: 49.2.0.post20200712 pip: 20.2 conda: None pytest: 6.0.0 IPython: 7.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5515/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1338173609 I_kwDOAMm_X85Pwuip 6914 plt.imshow() vs xarray_dataset.plot.imshow() not rendering correctly | Potential Bug melioristic 32569566 closed 0     4 2022-08-14T08:40:56Z 2023-03-22T20:46:23Z 2023-03-22T20:46:23Z NONE      

What is your issue?

I have 2d data which I want to visualise. The visuals look completely different if I use plt.imshow() vs xarray_dataset.plot.imshow(). There are mainly two issues:

- First, the array is flipped. (I think this is manageable but inconsistent.)
- Secondly, the plots don't look correct. This can be best illustrated by the figures themselves.

For example this is the xarray code I am using.

```python
day_data.plot.imshow(cmap="Blues", vmin=1, vmax=100)
plt.show()
```

And this is the image that I get.

Secondly, when I use matplotlib to plot the values:

```python
plt.imshow(day_data.values, vmin=1, vmax=100, cmap='Blues')
plt.show()
```

I get this plot.

Since it is discharge data, I would expect to see the second plot. Can someone tell me what the issue is here?
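A guess at the cause (an assumption, not verified on this file): xarray orients imshow from the coordinate values, while bare plt.imshow always draws row 0 at the top, so with a descending y coordinate the two disagree. xarray's yincrease flag controls the orientation explicitly:

```python
# Sketch: force xarray to draw with y decreasing from top to bottom,
# matching plt.imshow's default row ordering.
day_data.plot.imshow(cmap="Blues", vmin=1, vmax=100, yincrease=False)
plt.show()
```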

P.S.

This is what day_data looks like.

```
<xarray.DataArray 'dis06' (y: 950, x: 1000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    time        datetime64[ns] 2019-10-24T06:00:00
    step        timedelta64[ns] 06:00:00
    surface     float64 0.0
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  datetime64[ns] 2019-10-24T12:00:00
Attributes:
    GRIB_paramId:                    240023
    GRIB_dataType:                   sfo
    GRIB_numberOfPoints:             950000
    GRIB_typeOfLevel:                surface
    GRIB_stepUnits:                  1
    GRIB_stepType:                   avg
    GRIB_gridType:                   lambert_azimuthal_equal_area
    GRIB_NV:                         0
    GRIB_cfName:                     unknown
    GRIB_cfVarName:                  dis06
    GRIB_gridDefinitionDescription:  Lambert azimuthal equal area projection
    GRIB_missingValue:               9999
    GRIB_name:                       Mean discharge in the last 6 hours
    GRIB_shortName:                  dis06
    GRIB_units:                      m**3 s**-1
    long_name:                       Mean discharge in the last 6 hours
    units:                           m**3 s**-1
    standard_name:                   unknown
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6914/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1499473190 I_kwDOAMm_X85ZYCUm 7385 Unexpected NaNs in broadcast dopplershift 221526 open 0     4 2022-12-16T02:42:44Z 2023-03-14T20:43:00Z   CONTRIBUTOR      

What happened?

When running the broadcast in the sample code, I end up with nan in the output when there are not any in the original source array. While I know the construction is really odd (this came from user-submitted code), I'm shocked that it resulted in nans the resulting broadcasted data and honestly assumed MetPy's code was doing something dumb for quite awhile. I would have expected (regardless of the nature of the coordinates) that the result for broad_a be [[1, 2], [1, 2]].

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

levs = np.array([100000, 85000])
a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array()
b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array()

broad_a, broad_b = xr.broadcast(a, b)
print(broad_a)
```
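A note on what seems to be happening (my reading, not a confirmed diagnosis): each to_array() result carries a length-1 "variable" coordinate (['a'] vs ['b']), so broadcast outer-aligns on it and fills the non-matching half with NaN. The alignment step alone reproduces it:

```python
# Sketch: the NaNs appear at the alignment step, before any broadcasting.
aligned_a, aligned_b = xr.align(a, b, join="outer")
print(aligned_a)  # variable = ['a', 'b'], with NaN for the 'b' row
```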

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Python <xarray.DataArray (variable: 2, lev: 2)> array([[ 1., 2.], [nan, nan]]) Coordinates: * lev (lev) int64 100000 85000 * variable (variable) object 'a' 'b'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.23.5 scipy: 1.9.3 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.13.3 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.10.3 iris: None bottleneck: 1.3.5 dask: 2022.6.1 distributed: 2022.6.1 matplotlib: 3.6.2 cartopy: 0.21.0 seaborn: None numbagg: None fsspec: 2022.11.0 cupy: None pint: 0.20.1 sparse: None flox: None numpy_groupies: None setuptools: 65.5.1 pip: 22.3.1 conda: None pytest: 7.2.0 mypy: 0.991 IPython: 8.7.0 sphinx: 5.3.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7385/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
706507153 MDU6SXNzdWU3MDY1MDcxNTM= 4449 Did copy(deep=True) break with 0.16.1? blaylockbk 6249613 closed 0     4 2020-09-22T15:59:41Z 2023-03-12T21:08:42Z 2023-03-12T21:08:42Z NONE      

What happened: I have a script that downloads a file, reads and copies it to memory with ds.copy(deep=True), and then removes the downloaded file from disk. In 0.16.1, I get an error "No such file or directory" when I try to read the data from the deep-copied Dataset as if the Dataset was not actually copied into memory.

What you expected to happen: In 0.16.0 and earlier, the variable data is available (ds.varName.data) after it is copied into memory even after the original file was removed. But this doesn't work anymore in 0.16.1.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import os
import urllib.request

# Get sample NetCDF file
url = 'https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc'
FILE = 'tos_O1_2001-2002.nc'
urllib.request.urlretrieve(url, FILE)

# Open the NetCDF file
ds1 = xr.open_dataset(FILE)

# Make a copy of the Dataset
ds2 = ds1.copy(deep=True)

# and close the original
ds1.close()

# remove the NetCDF file
os.remove(FILE)

# Read the copied dataset
ds2
```

Anything else we need to know?:

Output for xarray v0.16.0

Output for xarray v0.16.1:

```
FileNotFoundError: [Errno 2] No such file or directory: ...tos_O1_2001-2002.nc'
```

Environment:

Output of <tt>xr.show_versions()</tt> for xarray 0.16.0 INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: English_United States.1252 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.16 setuptools: 49.6.0.post20200917 pip: 20.2.3 conda: None pytest: None IPython: 7.18.1 sphinx: None
Output of <tt>xr.show_versions()</tt> for xarray 0.16.1 INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: English_United States.1252 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.16 setuptools: 49.6.0.post20200917 pip: 20.2.3 conda: None pytest: None IPython: 7.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4449/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1598266728 I_kwDOAMm_X85fQ51o 7556 broken documentation link arfriedman 76110149 closed 0     4 2023-02-24T09:37:57Z 2023-03-12T18:02:59Z 2023-03-12T18:02:59Z CONTRIBUTOR      

What is your issue?

Hi,

I found this broken link at the bottom of the Datetime Indexing subsection in the User Guide.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7556/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1468838643 I_kwDOAMm_X85XjLLz 7336 Instability when calculating standard deviation ShihengDuan 26401994 closed 0     4 2022-11-29T23:33:55Z 2023-03-10T20:32:51Z 2023-03-10T20:32:50Z NONE      

What happened?

I noticed that for some large values (not really that large) and lots of samples, data.std() yields different values than np.std(data). This seems to be related to the magnitude. See the code here:

```python
nino34_tas_picontrol_detrend = nino34_tas_picontrol - 298
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)

std_dev = nino34_tas_picontrol.std()
print(std_dev.data)

nino34_tas_picontrol_detrend = nino34_tas_picontrol - 10
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)
```

and the results are:

```
1.4448999166488647
24.911161422729492
20.054718017578125
```

So I guess this is related to the magnitude, but I am not sure. Has anyone seen a similar issue?
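A plausible cause, not confirmed here: with bottleneck installed (bottleneck 1.3.5 appears in the environment below), xarray may dispatch std to bottleneck's single-pass kernel, and in float32 a running E[x²] − E[x]² computation cancels catastrophically when the mean is large relative to the spread. A minimal sketch with synthetic stand-in data (the nino34 series itself is not reproduced here):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Synthetic stand-in: a million float32 values near 298 with spread ~1.5
da = xr.DataArray((298 + 1.5 * rng.standard_normal(1_000_000)).astype("float32"))

print(da.std().data)  # can be inaccurate if bottleneck handles the float32 reduction

with xr.set_options(use_bottleneck=False):  # fall back to numpy's pairwise summation
    print(da.std().data)

print(da.astype("float64").std().data)  # promoting the dtype also stabilizes the result
```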

What did you expect to happen?

Adding or subtracting a constant should not change the standard deviation. (A screenshot showing what the data look like accompanied the original report.)

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.71.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.6.0 pandas: 1.4.4 numpy: 1.22.3 scipy: 1.8.1 netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: 2022.9.0 distributed: 2022.9.0 matplotlib: 3.5.2 cartopy: 0.21.0 seaborn: None numbagg: None fsspec: 2022.10.0 cupy: None pint: None sparse: 0.13.0 flox: None numpy_groupies: None setuptools: 65.5.0 pip: 22.2.2 conda: None pytest: None IPython: 8.6.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7336/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1588461863 I_kwDOAMm_X85ergEn 7539 Concat doesn't concatenate dimension coordinates along new dims TomNicholas 35968931 open 0     4 2023-02-16T22:32:33Z 2023-02-21T19:07:48Z   MEMBER      

What is your issue?

xr.concat doesn't concatenate dimension coordinates along new dimensions, which leads to pretty unintuitive behavior.

Take this example (motivated by https://github.com/pydata/xarray/discussions/7532#discussioncomment-4988792):

```python
segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4, 2),
        dims=["time", "cols"],
        coords=dict(time=('time', time), cols=["col1", "col2"]),
    )
    segments.append(da)
```

```python
In [86]: segments
Out[86]:
[<xarray.DataArray (time: 4, cols: 2)>
 array([[-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]])
 Coordinates:
   * time     (time) float64 0.1048 0.168 0.869 0.9432
   * cols     (cols) <U4 'col1' 'col2',
 <xarray.DataArray (time: 4, cols: 2)>
 array([[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728]])
 Coordinates:
   * time     (time) float64 0.03627 0.09754 0.2434 0.592
   * cols     (cols) <U4 'col1' 'col2']
```

```python
In [85]: xr.concat(segments, dim='new')
Out[85]:
<xarray.DataArray (new: 2, time: 8, cols: 2)>
array([[[        nan,         nan],
        [        nan,         nan],
        [-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [        nan,         nan],
        [        nan,         nan],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]],

       [[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [        nan,         nan],
        [        nan,         nan],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728],
        [        nan,         nan],
        [        nan,         nan]]])
Coordinates:
  * time     (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432
  * cols     (cols) <U4 'col1' 'col2'
Dimensions without coordinates: new
```

I would have expected to get a result of size {new: 2, time: 4, cols: 2}. That would be intuitive, because the default is coords='different', and that would be the result of concatenating the time coordinates (which have different values) and just propagating the cols coordinate (as the copies have the same values).

Instead what happened is that xr.concat treats the dimension coordinates as indexes to align, and defaults to an outer join. This auto-alignment behaviour has been discussed at length before; I'm just trying to point out another place in which it's problematic.
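For anyone who just wants the intuitively sized result, one workaround (a suggestion added here, not something proposed in the issue) is to opt out of alignment with join="override", which reuses the first object's index and simply stacks the data, at the cost of silently discarding the other segments' time values:

```python
import numpy as np
import xarray as xr

segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4, 2),
        dims=["time", "cols"],
        coords=dict(time=("time", time), cols=["col1", "col2"]),
    )
    segments.append(da)

# join="override" skips index alignment: the result keeps the first
# segment's time coordinate and concatenates along the new dimension.
result = xr.concat(segments, dim="new", join="override")
print(result.sizes)  # {'new': 2, 'time': 4, 'cols': 2}
```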

This is briefly mentioned in the concat docstring under coords='all' (“all”: All coordinate variables will be concatenated, except those corresponding to other dimensions.), but it's not even mentioned under coords='different'.

I don't really know what I would prefer to happen with the coordinates. I guess I would have wanted it to create a time coordinate of size {new: 2, time: 4, cols: 2}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?

At the very least we should make this a lot clearer in the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7539/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1470583016 I_kwDOAMm_X85Xp1Do 7340 xr.corr produces incorrect output for complex arrays mattragoza 7647340 closed 0     4 2022-12-01T03:00:09Z 2023-02-14T16:38:29Z 2023-02-14T16:38:29Z NONE      

What happened?

I create a DataArray full of complex numbers, and I compute the correlation of the DataArray with itself.

What did you expect to happen?

The absolute value of the correlation coefficient should be equal to 1, up to numerical precision. However, this is not the case. The returned correlation coefficient is around 0.26 and changes depending on the number of values in the array.
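For reference, |corr(x, x)| = 1 holds for complex data only if the covariance conjugates one argument, i.e. cov(x, y) = E[(x − x̄) · conj(y − ȳ)]; a formula without the conjugate breaks the identity. A minimal NumPy sketch of the conjugated definition (complex_corr is a hypothetical helper written for illustration, not xarray API):

```python
import numpy as np

def complex_corr(x, y):
    # Conjugate one argument in the covariance so corr(x, x) has modulus 1.
    xd = x - x.mean()
    yd = y - y.mean()
    cov = (xd * np.conj(yd)).mean()
    sx = np.sqrt((xd * np.conj(xd)).real.mean())
    sy = np.sqrt((yd * np.conj(yd)).real.mean())
    return cov / (sx * sy)

rng = np.random.default_rng(0)
x = rng.standard_normal(50) + 1j * rng.standard_normal(50)
print(abs(complex_corr(x, x)))  # 1.0 up to floating-point error
```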

Minimal Complete Verifiable Example

```Python
import numpy as np  # needed for np.abs / np.isclose below; missing from the original snippet
import xarray as xr

array = xr.DataArray([
    -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j,
    -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j,
    0.00000000e+00+0.00000000e+00j, -2.42585590e-02+1.42052459e-02j,
    -5.53404148e-03+4.60188062e-03j, -4.68829482e-03+4.90179019e-03j,
    -7.02331258e-03+8.75908673e-03j, -1.31233383e-01+1.86572484e-01j,
    -4.05137401e-03+6.59972035e-03j, -4.20701822e-03+7.29813816e-03j,
    -3.56487231e-03+6.51759430e-03j, -3.68077200e-03+7.04388575e-03j,
    -8.16459981e-02+1.70084145e-01j, -5.11737898e-03+1.98164995e-02j,
    6.72772914e-04-7.28110367e-05j, 2.13957504e-03-1.82525995e-03j,
    1.60369835e-03-1.54029189e-03j, 8.77788719e-02-8.45568854e-02j,
    1.04277417e-01-9.38854749e-02j, 7.58465696e-03-6.07906563e-03j,
    8.00776452e-03-5.70470615e-03j, 8.36166252e-03-5.14978313e-03j,
    0.00000000e+00+0.00000000e+00j, 0.00000000e+00+0.00000000e+00j,
    0.00000000e+00+0.00000000e+00j, 7.26422461e-03+4.40382166e-04j,
    4.01364547e-03+1.09269127e-03j, -1.99069471e-01-1.20355081e-01j,
    1.56511579e-01+2.59839758e-01j, 9.14046953e-04+5.42262898e-03j,
    -8.37800782e-04+5.67555708e-03j, -3.36561822e-03+7.50108018e-03j,
    -4.22682090e-03+5.36279242e-03j, 5.95438564e-02-3.48209841e-02j,
    -6.77184281e-03+2.10711488e-03j, -4.84293269e-03+3.78698499e-04j,
    -5.13547723e-03-6.86765713e-04j, 4.48392070e-01+1.54568226e-01j,
    -3.17412047e-01-2.35431216e-01j, -2.95731737e-03-3.39078899e-03j,
    -1.95111443e-03-3.77545168e-03j, -2.82719903e-04-1.61393513e-03j,
    7.20241467e-04-1.73515565e-03j, -1.96675563e-01-4.42259734e-02j,
    0.00000000e+00+0.00000000e+00j, 4.84813452e-03+7.60742077e-03j,
    6.31707602e-03+1.51808252e-02j, 2.99277774e-03+1.18667410e-02j,
    5.64640060e-04+1.58372118e-02j, -1.74137347e-03+1.70383706e-02j,
    -5.91398408e-03+2.30008930e-02j, -7.12027831e-03+1.87732435e-02j,
    9.30919156e-02-1.65255887e-01j, -2.09716130e-01+2.30490479e-01j,
    -1.80115101e-02+1.37248240e-02j, -1.85851718e-02+9.23420957e-03j,
    -1.88459965e-02+5.12854226e-03j, 1.09175874e+00-9.17875627e-02j,
    -1.63766142e-02-5.32431671e-03j, -1.24749963e-02-9.63714407e-03j,
    -7.58657222e-03-1.27728267e-02j, -1.99052439e-03-1.35879033e-02j,
    -5.70595470e-01+2.27742231e+00j, 1.24516564e-02-1.21867738e-02j,
    1.82174257e-02-8.67884733e-03j, 2.27204879e-02-3.77097224e-03j,
    2.66143091e-02+2.68683768e-03j, 1.06983372e+00+3.19301893e-01j,
    -6.86033738e-01-4.72910865e-01j, 3.00291320e-02+3.10297521e-02j,
    2.22880055e-02+3.45332319e-02j, 1.61724440e-02+4.04122368e-02j,
    9.78881043e-03+4.96053678e-02j, -6.51085120e-03+5.27227722e-02j,
    -1.76752380e-02+5.26095806e-02j, -3.81856382e-02+6.41735764e-02j,
    0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j
])
r = np.abs(xr.corr(array, array).item())
assert np.isclose(r, 1.0), r
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
The exact output I get for the self-contained example is:

AssertionError                            Traceback (most recent call last)
Cell In [44], line 46
      3 array = xr.DataArray([
      4     -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j,
      5     -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j,
   (...)
     43     0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j
     44 ])
     45 r = np.abs(xr.corr(array, array).item())
---> 46 assert np.isclose(r, 1.0), r

AssertionError: 0.2664911388214005
```

Anything else we need to know?

Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]

Xarray version is '2022.9.0'

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-193.28.1.el8_2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.9.0 pandas: 1.5.0 numpy: 1.23.3 scipy: 1.9.1 netCDF4: 1.6.0 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.11.0 distributed: None matplotlib: 3.6.2 cartopy: None seaborn: 0.12.1 numbagg: None fsspec: 2022.11.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.4.1 pip: 22.2.2 conda: None pytest: None IPython: 8.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7340/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);