id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2276352251,I_kwDOAMm_X86HrmD7,8994,Improving performance of open_datatree,35968931,open,0,,,4,2024-05-02T19:43:17Z,2024-05-03T15:25:33Z,,MEMBER,,,,"### What is your issue?
The implementation of `open_datatree` works, but is inefficient, because it calls `open_dataset` once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.
We discussed this in the [datatree meeting](https://github.com/pydata/xarray/issues/8747), and my understanding is that concretely we need to:
- [ ] Create an asv benchmark for `open_datatree`, probably involving first writing and then benchmarking the opening of a special netCDF file that has no data but lots of groups (a rough sketch of such a benchmark follows this list).
- [ ] Refactor the [`NetCDFDatastore`](https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L319) class to only create one `CachingFileManager` object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
- [ ] Refactor `NetCDF4BackendEntrypoint.open_datatree` to use an implementation that goes through `NetCDFDatastore` without calling the top-level `xr.open_dataset` again.
- [ ] Check that the performance of calling `xr.open_datatree` on a netCDF file has actually improved.
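A rough sketch of what such a benchmark could look like (illustrative only, not an existing benchmark; it assumes `open_datatree` is importable from `xarray` and uses `netCDF4` directly to create the groups):
```python
import netCDF4
import xarray as xr
# asv-style benchmark: write a netCDF file with many empty groups once,
# then time how long it takes to open it as a datatree.
class OpenDataTreeManyGroups:
    def setup(self):
        self.filepath = 'many_groups.nc'  # hypothetical benchmark file
        with netCDF4.Dataset(self.filepath, mode='w') as root:
            for i in range(100):
                root.createGroup(f'group_{i}')
    def time_open_datatree(self):
        xr.open_datatree(self.filepath)
```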
It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8994/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2163608564,I_kwDOAMm_X86A9gv0,8802,Error when using `apply_ufunc` with `datetime64` as output dtype,44147817,open,0,,,4,2024-03-01T15:09:57Z,2024-05-03T12:19:14Z,,CONTRIBUTOR,,,,"### What happened?
When using `apply_ufunc` with `datetime64[ns]` as the output dtype, the code throws an error about converting from specific units to generic datetime units.
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]
def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = ""time_cp""
    return xr.apply_ufunc(
        _fn,
        da,
        da.time,
        input_core_dims=[[""time""], [""time""]],
        output_core_dims=[[dim_out]],
        vectorize=True,
        dask=""parallelized"",
        output_dtypes=[""datetime64[ns]""],
        dask_gufunc_kwargs={""allow_rechunk"": True,
                            ""output_sizes"": {dim_out: 10}},
        exclude_dims=set((""time"",)),
    )
da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(x=range(5), y=range(5),
                time=np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
                              dtype='datetime64[ns]')),
).chunk(dict(x=2, y=2))
fn(da_fake.compute()).compute() # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
fn(da_fake).compute() # same errors as above
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[211], line 1
----> 1 fn(da_fake).compute()
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs)
1144 """"""Manually trigger loading of this array's data from disk or a
1145 remote source into memory and return a new array. The original is
1146 left unaltered.
(...)
1160 dask.compute
1161 """"""
1162 new = self.copy(deep=False)
-> 1163 return new.load(**kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs)
1119 def load(self, **kwargs) -> Self:
1120 """"""Manually trigger loading of this array's data from disk or a
1121 remote source into memory and return this array.
1122
(...)
1135 dask.compute
1136 """"""
-> 1137 ds = self._to_temp_dataset().load(**kwargs)
1138 new = self._from_temp_dataset(ds)
1139 self._variable = new._variable
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs)
850 chunkmanager = get_chunked_array_type(*lazy_data.values())
852 # evaluate all the chunked arrays simultaneously
--> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
855 for k, data in zip(lazy_data, evaluated_data):
856 self.variables[k].data = data
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
68 from dask.array import compute
---> 70 return compute(*data, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
625 postcomputes.append(x.__dask_postcompute__())
627 with shorten_traceback():
--> 628 results = schedule(dsk, keys, **kwargs)
630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
2369 self._init_stage_2(*args, **kwargs)
2370 return self
-> 2372 return self._call_as_normal(*args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
2362 vargs = [args[_i] for _i in inds]
2363 vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args)
2444 """"""Vectorized call to `func` over positional `args`.""""""
2445 if self.signature is not None:
-> 2446 res = self._vectorize_call_with_signature(func, args)
2447 elif not args:
2448 res = func()
File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args)
2502 outputs = _create_arrays(broadcast_shape, dim_sizes,
2503 output_core_dims, otypes, results)
2505 for output, result in zip(outputs, results):
-> 2506 output[index] = result
2508 if outputs is None:
2509 # did not call the function even once
2510 if otypes is None:
ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
```
### Anything else we need to know?
_No response_
### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8802/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2270275688,I_kwDOAMm_X86HUaho,8985,update `to_netcdf` docstring to list support for explicit CDF5 writes,9221710,open,0,,,4,2024-04-30T00:41:13Z,2024-04-30T20:48:46Z,,NONE,,,,"### Is your feature request related to a problem?
I cannot get to_netcdf() to write files in CDF5 format, as identified by the 'ncdump -k' command.
### Describe the solution you'd like
When I write a netcdf file using:
D.to_netcdf( filename )
then ask ncdump to tell me the kind of file I have,
ncdump -k filename
it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmosphere Model (CAM), as an initial condition for example. CAM will bomb when it tries to read it. After converting the file with this command:
nccopy -k cdf5 filename cdf5_filename
the file now works in CAM. Also, the command
ncdump -k cdf5_filename
returns 'cdf5'.
I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command.
### Describe alternatives you've considered
Writing netcdf-4 files from xarray and converting via
nccopy -k cdf5 filename cdf5_filename
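A minimal sketch of that workflow, assuming `nccopy` (from the netCDF utilities) is available on the PATH; the dataset and file names here are placeholders:
```python
import subprocess
import xarray as xr
ds = xr.Dataset({'x': ('points', [1.0, 2.0, 3.0])})  # placeholder dataset
ds.to_netcdf('filename.nc')  # writes netCDF-4 by default
# Convert to CDF5 with nccopy, as described above.
subprocess.run(['nccopy', '-k', 'cdf5', 'filename.nc', 'cdf5_filename.nc'], check=True)
```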
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8985/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1389295853,I_kwDOAMm_X85Szvjt,7099,Pass arbitrary options to sel(),4160723,open,0,,,4,2022-09-28T12:44:52Z,2024-04-30T00:44:18Z,,MEMBER,,,,"### Is your feature request related to a problem?
Currently `.sel()` accepts two options `method` and `tolerance`. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes.
It would be also useful for custom indexes to expose their own selection options, e.g.,
- index query optimization like the `dualtree` flag of [sklearn.neighbors.KDTree.query](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree.query)
- k-nearest neighbors selection with the creation of a new ""k"" dimension (+ coordinate / index) with user-defined name and size.
From #3223, it would be nice if we could also pass distinct options values per index.
What would be a good API for that?
### Describe the solution you'd like
Some ideas:
A. Allow passing a tuple `(labels, options_dict)` as indexer value
```python
ds.sel(x=([0, 2], {""method"": ""nearest""}), y=3)
```
B. Expose an `options` kwarg that would accept a nested dict
```python
ds.sel(x=[0, 2], y=3, options={""x"": {""method"": ""nearest""}})
```
Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.
Any other ideas? Some sort of context manager? Some `Index` specific API?
### Describe alternatives you've considered
The API proposed in #3223 would look great if `method` and `tolerance` were the only accepted options, but less so for arbitrary options.
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7099/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
481761508,MDU6SXNzdWU0ODE3NjE1MDg=,3223,Feature request for multiple tolerance values when using nearest method and sel(),1117224,open,0,,,4,2019-08-16T19:53:31Z,2024-04-29T23:21:04Z,,NONE,,,,"
```python
import xarray as xr
import numpy as np
import pandas as pd
# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120,-60)
ds.coords['lat'] = np.arange(30,50)
ds.coords['time'] = pd.date_range('2018-01-01','2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size,ds.lon.size,ds.time.size)), dims=['lat','lon','time'])
target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]
# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')
# Adding tolerance for lat long, but also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)
# Ideally tolerance could accept a dictionary but currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance={'lat':0.5, 'lon':0.5, 'time':np.timedelta64(1,'D')})
```
#### Expected Output
A dataset with nearest values to tolerances on each dim.
#### Problem Description
I would like to add the ability of tolerance to accept a dictionary for multiple tolerance values for different dimensions. Before I try implementing it, I wanted to 1) check it doesn't already exist or someone isn't working on it, and 2) get suggestions for how to proceed.
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.11.3
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.3.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3223/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2259316341,I_kwDOAMm_X86Gqm51,8965,Support concurrent loading of variables,2448579,open,0,,,4,2024-04-23T16:41:24Z,2024-04-29T22:21:51Z,,MEMBER,,,,"### Is your feature request related to a problem?
Today, if users want to load multiple variables in a DataArray or Dataset concurrently, they *have* to use dask.
It struck me that it'd be pretty easy for `.load` to gain an `executor` kwarg that accepts anything that follows the [`concurrent.futures` executor](https://docs.python.org/3/library/concurrent.futures.html) interface, and parallelize this loop.
https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857
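A rough sketch of the idea (purely illustrative: the `executor` keyword does not exist today, and this helper reaches into private attributes such as `Variable._in_memory`):
```python
from concurrent.futures import ThreadPoolExecutor
import xarray as xr
def load_concurrently(ds: xr.Dataset, executor) -> xr.Dataset:
    # Mirror Dataset.load(): find the not-yet-loaded variables, but fetch
    # their values through the executor instead of a serial loop.
    lazy = {k: v for k, v in ds.variables.items() if not v._in_memory}
    futures = {k: executor.submit(lambda var=v: var.values) for k, v in lazy.items()}
    for k, future in futures.items():
        ds.variables[k].data = future.result()
    return ds
ds = xr.open_dataset('example.nc')  # hypothetical non-dask dataset
with ThreadPoolExecutor(max_workers=4) as ex:
    load_concurrently(ds, ex)
```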
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8965/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1250939008,I_kwDOAMm_X85Kj9CA,6646,`dim` vs `dims`,5635139,closed,0,,,4,2022-05-27T16:15:02Z,2024-04-29T18:24:56Z,2024-04-29T18:24:56Z,MEMBER,,,,"### What is your issue?
I've recently been hit with this when experimenting with `xr.dot` and `xr.corr` — `xr.dot` takes `dims`, and `xr.cov` takes `dim`. Because they each take multiple arrays as positional args, kwargs are more conventional.
Should we standardize on one of these?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6646/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1024011835,I_kwDOAMm_X849CS47,5857,"Incorrect results when using xarray.ufuncs.angle(..., deg=True)",1119116,closed,0,,,4,2021-10-12T16:24:11Z,2024-04-28T20:58:55Z,2024-04-28T20:58:54Z,NONE,,,,"
**What happened**:
The `xarray.ufuncs.angle` function is broken. According to the docstring, one may use the option `deg=True` to get the result in degrees instead of radians (consistent with the `numpy.angle` function). Yet the results show that this is not the case. Moreover, specifying `deg=True` or `deg=False` leads to the same result, with the values in radians.
**What you expected to happen**:
To have the result of `xarray.ufuncs.angle` converted to degrees when option `deg=True` is specified.
**Minimal Complete Verifiable Example**:
```python
# Put your MCVE code here
import numpy as np
import xarray as xr
ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})
Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd))
D = xr.ufuncs.angle(Z, deg=True) # YIELDS INCORRECT RESULTS
if not np.allclose(ds.wd, (D % 360)):
    print(f""Issue with angle operation: {D.values%360} instead of {ds.wd.values}""
          + f""\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!"")
D = xr.ufuncs.degrees(xr.ufuncs.angle(Z))  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f""Issue with angle operation: {D%360} instead of {ds.wd}""
          + f""\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!"")
D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True})  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f""Issue with angle operation: {D%360} instead of {ds.wd}""
          + f""\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!"")
```
**Anything else we need to know?**:
Though `xarray.ufuncs` emits a deprecation warning stating that the numpy equivalent may be used, this is not true for `numpy.angle`. Example:
```python
import numpy as np
import xarray as xr
ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})
Z = np.exp(1j * np.radians(ds.wd))
print(Z)
print(f""Is Z an XArray? {isinstance(Z, xr.DataArray)}"")
D = np.angle(ds.wd, deg=True)
print(D)
print(f""Is D an XArray? {isinstance(D, xr.DataArray)}"")
```
If this code is run, the result of `numpy.angle(xarray.DataArray)` is not a DataArray object, contrary to other numpy operations (for all versions of xarray I've used). Hence `xarray.ufuncs.angle` would be a great option, if it were not for the current problem.
**Environment**:
No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (the most recent).
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-18-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.utf8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 0.19.0
pandas: 1.2.3
numpy: 1.20.2
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: 4.10.3
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5857/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2224036575,I_kwDOAMm_X86EkBrf,8905,Variable doesn't have an .expand_dims method,35968931,closed,0,,,4,2024-04-03T22:19:10Z,2024-04-28T19:54:08Z,2024-04-28T19:54:08Z,MEMBER,,,,"### Is your feature request related to a problem?
`DataArray` and `Dataset` have an `.expand_dims` method, but it looks like `Variable` doesn't.
### Describe the solution you'd like
Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes.
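In the meantime, a possible partial workaround may be `Variable.set_dims`, which broadcasts to new dimensions without creating coordinates or indexes; a minimal sketch:
```python
import numpy as np
import xarray as xr
v = xr.Variable(dims=('x',), data=np.arange(3))
# set_dims inserts the missing dimension (length 1 here) without touching
# coordinates or indexes, which is roughly what expand_dims would do.
v_expanded = v.set_dims(('y', 'x'))
print(v_expanded.dims, v_expanded.shape)  # ('y', 'x') (1, 3)
```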
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8905/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
590630281,MDU6SXNzdWU1OTA2MzAyODE=,3921,issues discovered by the all-but-dask CI,14808389,closed,0,,,4,2020-03-30T22:08:46Z,2024-04-25T14:48:15Z,2024-02-10T02:57:34Z,MEMBER,,,,"After adding the `py38-all-but-dask` CI in #3919, it discovered a few backend issues:
- `zarr`:
- [x] `open_zarr` with `chunks=""auto""` always tries to chunk, even if `dask` is not available (fixed in #3919)
- [x] `ZarrArrayWrapper.__getitem__` incorrectly passes the indexer's `tuple` attribute to `_arrayize_vectorized_indexer` (this only happens if `dask` is not available) (fixed in #3919)
- [x] slice indexers with negative steps get transformed incorrectly if `dask` is not available https://github.com/pydata/xarray/pull/8674
- `rasterio`:
- ~calling `pickle.dumps` on a `Dataset` object returned by `open_rasterio` fails because a non-serializable lock was used (if `dask` is installed, a serializable lock is used instead)~","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3921/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2243685081,I_kwDOAMm_X86Fu-rZ,8945,netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory,11130776,closed,0,,,4,2024-04-15T13:26:08Z,2024-04-23T21:49:28Z,2024-04-23T15:33:36Z,NONE,,,,"### What is your issue?
Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory).
Here is a minimum working example:
```python
import numpy as np
import pandas as pd
import xarray as xr
times = 100
nlat = 200
nlon = 300
fp = xr.Dataset({""fp"": ([""time"", ""lat"", ""lon""], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))}, coords={""time"": pd.date_range(start=""2019-01-01T02:00:00"", periods=times, freq=""1H""), ""lat"": np.arange(nlat), ""lon"": np.arange(nlon)})
flux = xr.Dataset({""flux"": ([""time"", ""lat"", ""lon""], np.arange(nlat * nlon).reshape(1, nlat, nlon))}, coords={""time"": [pd.to_datetime(""2019-01-01"")], ""lat"": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat), ""lon"": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon)})
fp.to_netcdf(""combine_datasets_tests/fp.nc"")
flux.to_netcdf(""combine_datasets_tests/flux.nc"")
fp1 = xr.open_dataset(""combine_datasets_tests/fp.nc"")
flux1 = xr.open_dataset(""combine_datasets_tests/flux.nc"")
```
Then
```
flux1 = flux1.reindex_like(fp1, method=""ffill"", tolerance=None)
```
takes over a minute, while
```
flux1 = flux1.load().reindex_like(fp1, method=""ffill"", tolerance=None)
```
is almost instantaneous (timeit says 91ms, including opening the dataset... I'm not sure if caching is influencing this).
Profiling the ""reindex without load"" cell:
```
804936 function calls (804622 primitive calls) in 93.285 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 92.211 92.211 93.191 93.191 {built-in method _operator.getitem}
1 0.289 0.289 0.980 0.980 utils.py:81(_StartCountStride)
6 0.239 0.040 0.613 0.102 shape_base.py:267(apply_along_axis)
72656 0.109 0.000 0.109 0.000 utils.py:429()
72656 0.085 0.000 0.136 0.000 utils.py:430()
72661 0.051 0.000 0.051 0.000 {built-in method numpy.arange}
145318 0.048 0.000 0.115 0.000 shape_base.py:370()
2 0.045 0.023 0.046 0.023 indexing.py:1334(__getitem__)
6 0.044 0.007 0.044 0.007 numeric.py:136(ones)
145318 0.044 0.000 0.067 0.000 index_tricks.py:690(__next__)
14 0.033 0.002 0.033 0.002 {built-in method numpy.empty}
145333/145325 0.023 0.000 0.023 0.000 {built-in method builtins.next}
1 0.020 0.020 93.275 93.275 duck_array_ops.py:317(where)
21 0.018 0.001 0.018 0.001 {method 'astype' of 'numpy.ndarray' objects}
145330 0.013 0.000 0.013 0.000 {built-in method numpy.asanyarray}
1 0.002 0.002 0.002 0.002 {built-in method _functools.reduce}
1 0.002 0.002 93.279 93.279 variable.py:821(_getitem_with_mask)
18 0.001 0.000 0.001 0.000 {built-in method numpy.zeros}
1 0.000 0.000 0.000 0.000 file_manager.py:226(close)
```
The `getitem` call at the top is from `xarray.backends.netCDF4_.py`, line 114. Because of the jittered coordinates in `flux`, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680.
In my venv, netCDF4 was installed from a wheel with the following versions:
```
netcdf4-python version: 1.6.5
HDF5 lib version: 1.12.2
netcdf lib version: 4.9.3-development
```
This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3.
I will try to investigate more and hopefully simplify the example. (Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8945/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1664193419,I_kwDOAMm_X85jMZOL,7748,diff('non existing dimension') does not raise exception,4441338,open,0,,,4,2023-04-12T09:29:58Z,2024-04-21T22:31:37Z,,NONE,,,,"### What happened?
Calling xr.DataArray.diff with a non-existing dimension does not raise an exception.
### What did you expect to happen?
An exception to be raised.
### Minimal Complete Verifiable Example
```Python
import xarray as xr; import numpy as np; xr.DataArray(np.arange(10),dims=('a',)).diff('b')
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-21-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.6.9
numpy_groupies: 0.9.20
setuptools: 67.6.0
pip: 23.0.1
conda: 23.1.0
pytest: 7.2.2
mypy: 1.1.1
IPython: 8.11.0
sphinx: 6.1.3
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7748/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2237228079,I_kwDOAMm_X86FWWQv,8927,"Use a neutral format to have lossless interface with JSON, scipp, Astropy, pandas",92333742,open,0,,,4,2024-04-11T08:50:34Z,2024-04-12T14:25:35Z,,NONE,,,,"### Is your feature request related to a problem?
Each tool has a specific structure for processing multidimensional data with the following consequences:
- interfaces dedicated to each tool,
- partially processed data,
- no unified representation of data structures
### Describe the solution you'd like
The proposed format (see [jupyter notebook](https://nbviewer.org/github/loco-philippe/ntv-numpy/blob/main/example/example_ntv_numpy.ipynb), [github repository](https://github.com/loco-philippe/ntv-numpy/blob/main/README.md), [PyPI package](https://pypi.org/project/ntv-numpy/) ) is based on the following principles:
- neutral format available for tabular or multidimensional tools (e.g. Numpy, pandas, xarray, scipp, astropy),
- taking into account a wide variety of data types as defined in [NTV](https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html) format,
- high interoperability: reversible (lossless round-trip) interface with tabular or multidimensional tools,
- reversible and compact JSON format,
- Ease of sharing and exchanging multidimensional and tabular data,
### Describe alternatives you've considered
_No response_
### Additional context
https://github.com/numpy/numpy/issues/12481#issuecomment-2049179803
https://github.com/astropy/astropy/issues/16286
https://github.com/scipp/scipp/issues/3422","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8927/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1959816045,I_kwDOAMm_X8500Gtt,8368,"to_netcdf: Unexpected drop of ""units"" attribute of attached ""bounds""",15173535,open,0,,,4,2023-10-24T18:15:05Z,2024-04-09T11:11:20Z,,NONE,,,,"### What happened?
When writing a Dataset to netCDF, any DataArrays that are linked as bounds through another variable's `attrs['bounds']` entry have their 'units' attribute (specifically) dropped in the written netCDF file.
See the example below.
### What did you expect to happen?
Units attribute to be written to the netcdf file.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr
# Create a new Dataset
ds = xr.Dataset()
# Add the x variable, Specify 'x_bnds' as bounds, defined later.
ds['x'] = xr.DataArray(np.arange(10), dims='x', attrs={'units':'m', 'bounds':'x_bnds'})
# Bounds require an extra dimension equal to number of vertices.
ds['nv'] = xr.DataArray(np.r_[0, 1], dims='nv')
# Add the actual bounding values for variable x.
ds['x_bnds'] = xr.DataArray(np.squeeze(np.dstack([np.arange(10)-0.5, np.arange(10)+0.5])),
dims=['x', 'nv'],
attrs={'test':4, 'units':'m', })
print('Units is attached to the bounds in the dataset before writing', 'units' in ds['x_bnds'].attrs)
# Write to netcdf file
ds.to_netcdf('tmp.nc', format='netcdf4', engine='netcdf4')
# Open the dataset and check x_bnds attrs. units is dropped.
new = xr.open_dataset('tmp.nc')
print(new['x_bnds'].attrs)
# Confirm that units were never written to the file.
!h5dump -d /x_bnds tmp.nc
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.3
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: 7.2.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8368/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2230680765,I_kwDOAMm_X86E9Xy9,8919,Using the xarray.Dataset.where() function takes up a lot of memory,69391863,closed,0,,,4,2024-04-08T09:15:49Z,2024-04-09T02:45:09Z,2024-04-09T02:45:08Z,NONE,,,,"### What is your issue?
My python script was killed because it took up too much memory. After checking, I found that the problem is the ds.where() function.
The original netCDF file opened from the hard disk takes up about 10 MB of storage, but when I mask the data that doesn't match the latitude and longitude bounds, the variable **ds** takes up a dozen GB of memory. When I deleted this variable using `del ds`, the memory occupied by the script immediately returned to normal.
```python
# Open this netcdf file.
ds = xr.open_dataset(track)
# If longitude range is [-180, 180], then convert to [0, 360].
if np.any(ds[var_lon] < 0):
    ds[var_lon] = ds[var_lon] % 360
# Extract data by longitude and latitude.
ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1]) &
              (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3]))
# Select data by range and value of some variables.
for key, value in range_select.items():
    ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1]))
for key, value in value_select.items():
    ds = ds.where(ds[key].isin(value))
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8919/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2228373305,I_kwDOAMm_X86E0kc5,8915,"Weird behavior of DataSet.where(... , drop=True)",22961670,closed,0,,,4,2024-04-05T16:03:05Z,2024-04-08T09:32:48Z,2024-04-08T09:32:48Z,NONE,,,,"### What happened?
I work with an aircraft emission dataset that is freely available online: [emission dataset](https://zenodo.org/records/10818082)
During my calculations I eventually convert the `Dataset` to a `DataFrame`. My motivation is to avoid unnecessary rows in the DataFrame. While doing some calculations, my code returned unexpected results. I eventually narrowed it down to a `Dataset.where(..., drop=True)` call I added along the way, which introduces differences in the data. Here are two examples:
**Example 1:** Along some dimensions data points vanished if `drop=True`

**Example 2:** For other dimensions (these?) data points appeared elsewhere if `drop=True`

### What did you expect to happen?
I expect for my calculations to return the same results, regardless of whether drop=True is active or not.
### Minimal Complete Verifiable Example
```Python
!wget ""https://zenodo.org/records/10818082/files/Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc""
import matplotlib.pyplot as plt
import xarray as xr
nc_file = xr.open_dataset('Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc')
fig, axs = plt.subplots(1,2,figsize=(10,4))
nc_file.H2O.where(nc_file.H2O!=0, drop=True).sum(('lon','time')).plot.contour(x='lat',ax=axs[0])
axs[0].set_xlim(-50,90)
axs[0].set_title('With drop=True')
nc_file.H2O.where(nc_file.H2O!=0, drop=False).sum(('lon','time')).plot.contour(x='lat',ax=axs[1])
axs[1].set_xlim(-50,90)
axs[1].set_title('With drop=False')
plt.tight_layout()
plt.show()
fig, axs = plt.subplots(1,2,figsize=(10,4))
nc_file.H2O.where(nc_file.H2O!=0, drop=True).sum(('lat','time')).plot.contour(x='lon',ax=axs[0])
axs[0].set_title('With drop=True')
nc_file.H2O.where(nc_file.H2O!=0, drop=False).sum(('lat','time')).plot.contour(x='lon',ax=axs[1])
axs[1].set_title('With drop=False')
plt.tight_layout()
plt.show()
```
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'ISO8859-1')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2022.11.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: None
IPython: 8.10.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8915/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2206243581,I_kwDOAMm_X86DgJr9,8876,Possible race condition when appending to an existing zarr,157591329,closed,0,,,4,2024-03-25T16:59:52Z,2024-04-03T15:23:14Z,2024-03-29T14:35:52Z,NONE,,,,"### What happened?
When appending to an existing zarr along a dimension (`to_zarr(..., mode='a', append_dim=""x"" ,..)`), if the dask chunking of the dataset to append does not align with the chunking of the existing zarr, the resulting _consolidated_ zarr store may have `NaN`s instead of the actual values it is supposed to have.
### What did you expect to happen?
We would expect appending to zarr to have the same behaviour as concatenating the datasets _in memory_ (using `concat`) and writing the whole result to a new zarr store in one go.
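For reference, a rough sketch of that in-memory equivalent (same data as in the example below, written in a single call):
```python
import xarray as xr
ds1 = xr.Dataset({'a': ('x', [1., 1.])}, coords={'x': [1, 2]})
ds2 = xr.Dataset({'a': ('x', [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]})
combined = xr.concat([ds1, ds2], dim='x')
combined.to_zarr('combined.zarr', mode='w')  # one write, no separate append step
```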
### Minimal Complete Verifiable Example
```Python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile
ds1 = xr.Dataset({""a"": (""x"", [1., 1.])}, coords={'x': [1, 2]}).chunk({""x"": 3})
ds2 = xr.Dataset({""a"": (""x"", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({""x"": 3})
with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):  # The issue happens only when: threads_per_worker > 1
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode=""w"")  # write first dataset
            ds2.to_zarr(store, mode=""a"", append_dim=""x"")  # append second dataset
            rez = xr.open_zarr(store).compute()  # open consolidated dataset
            nb_values = rez.a.count().item(0)  # count non NaN values
            if nb_values != 6:
                print(""found NaNs:"")
                print(rez.to_dataframe())
                break
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
/tmp/tmptg_pe6ox
/tmp/tmpm7ncmuxd
/tmp/tmpiqcgoiw2
/tmp/tmppma1ieo7
/tmp/tmpw5vi4cf0
/tmp/tmp1rmgwju0
/tmp/tmpm6tfswzi
found NaNs:
a
x
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 NaN
```
### Anything else we need to know?
The example code snippet provided here reproduces the issue.
Since the issue occurs randomly, the example loops a few times and stops when the issue occurs.
In the example, when `ds1` is first written, since it only contains 2 values along the `x` dimension, the resulting zarr store has the chunking `{'x': 2}`, even though we called `.chunk({""x"": 3})`.
Side note: this behaviour in itself is not problematic in this case, but the fact that the chunking is _silently_ changed made this issue harder to spot.
However, when we try to append the second dataset `ds2`, which contains 4 values, the `.chunk({""x"": 3})` at the beginning splits the dask array into 2 **dask chunks**, but in a way that does not align with the **zarr chunks**.
Zarr chunks:
+ chunk1 : `x: [1; 2]`
+ chunk2 : `x: [3; 4]`
+ chunk3 : `x: [5; 6]`
Dask chunks for `ds2`:
+ chunk A: `x: [3; 4; 5]`
+ chunk B: `x: [6]`
Both **dask** chunks A and B are supposed to write to **zarr** chunk3
And depending on who writes first, we can end up with NaN on `x = 5` or `x = 6` instead of actual values.
The issue obviously happens only when dask tasks are run in parallel.
Using `safe_chunks = True` when calling `to_zarr` does not seem to help.
We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer?).
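For illustration, one possible mitigation (not verified as a general fix) is to rechunk the appended dataset so its dask chunks line up with the chunks the zarr store actually has, using the `ds2` and `store` from the example above:
```python
# The store created from ds1 ended up with zarr chunks of size 2 along x,
# so align the dask chunks of ds2 with that before appending.
zarr_chunk_size = 2
ds2_aligned = ds2.chunk({'x': zarr_chunk_size})
ds2_aligned.to_zarr(store, mode='a', append_dim='x')
```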
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8876/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2211106929,I_kwDOAMm_X86DytBx,8882,"to_zarr silently loses data when using append_dim, if chunks are different to zarr store",140395181,closed,0,,,4,2024-03-27T15:27:02Z,2024-03-29T14:35:51Z,2024-03-29T14:35:51Z,NONE,,,,"### What happened?
When writing a chunked DataArray to an existing zarr store, appending along an existing dimension of the store, I have found that some data are not written if there are multiple array chunks to one zarr chunk.
I appreciate it is probably bad practice to have different chunk sizes in my DataArray and the zarr store, but I think it's a realistic scenario that needs to be caught.
This may be related to / the same underlying issue as #8371. Perhaps the checks mentioned in https://github.com/pydata/xarray/issues/8371#issuecomment-1814589157 are somehow getting bypassed? Using zarr's ThreadSynchronizer is the only way I have found to ensure that all the data gets written.
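A sketch of that ThreadSynchronizer workaround, assuming `to_zarr` forwards the `synchronizer` argument to zarr (using the `da2` from the example below):
```python
import zarr
sync = zarr.ThreadSynchronizer()
# Serialize writes to the same zarr chunk from multiple threads.
da2.to_zarr('foo.zarr', append_dim='time', mode='a', synchronizer=sync)
```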
### What did you expect to happen?
I expected that either
- to_zarr would recognise the different chunk sizes, and re-chunk or wait for all the chunks to be written
- or an error would be raised, given that this results in loss of data in an unpredictable way
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from matplotlib import pyplot as plt
x_coords = np.arange(10)
y_coords = np.arange(10)
t_coords = np.array([np.datetime64('2020-01-01').astype('datetime64[ns]')])
data = np.ones((10,10))
for i in range(4):
    plt.subplot(1, 4, i + 1)
    da = xr.DataArray(data.reshape((-1, 10, 10)),
                      dims=['time', 'x', 'y'],
                      coords={'x': x_coords, 'y': y_coords, 'time': t_coords},
                      ).chunk({'x': 5, 'y': 5, 'time': 1}).rename('foo')
    da.to_zarr('foo.zarr', mode='w')
    new_time = np.array([np.datetime64('2021-01-01').astype('datetime64[ns]')])
    da2 = xr.DataArray(data.reshape((-1, 10, 10)),
                       dims=['time', 'x', 'y'],
                       coords={'x': x_coords, 'y': y_coords, 'time': new_time},
                       ).chunk({'x': 1, 'y': 1, 'time': 1}).rename('foo')
    da2.to_zarr('foo.zarr', append_dim='time', mode='a')
    plt.imshow(xr.open_zarr('foo.zarr').isel(time=-1).foo.values)
```
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
Output from the plots above:

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1041-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: 0.15.1
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: 24.1.2
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8882/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
935607748,MDU6SXNzdWU5MzU2MDc3NDg=,5563,Decoding non-utf-8 encoded strings with the h5netcdf engine,11391714,closed,0,,,4,2021-07-02T09:49:58Z,2024-03-26T15:08:41Z,2024-03-26T15:08:41Z,NONE,,,,"**What happened**:
Trying to load a netCDF file-like (`io.BytesIO` object) with attribute strings in non-utf-8 encoding with the `h5netcdf` engine leads to `UnicodeDecodeError`.
**What you expected to happen**:
Loading the same file, albeit persisted to disk, with the `netcdf4` engine works fine; however, since the `netcdf4` engine doesn't support file-like objects, I ran into this issue.
**Traceback**:
Traceback (most recent call last):
File """", line 1, in
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py"", line 242, in load_dataset
with open_dataset(filename_or_obj, **kwargs) as ds:
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py"", line 496, in open_dataset
backend_ds = backend.open_dataset(
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 384, in open_dataset
ds = store_entrypoint.open_dataset(
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py"", line 22, in open_dataset
vars, attrs = store.load()
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py"", line 126, in load
attributes = FrozenDict(self.get_attrs())
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 234, in get_attrs
return FrozenDict(_read_attributes(self.ds))
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 75, in _read_attributes
v = maybe_decode_bytes(v)
File ""/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py"", line 63, in maybe_decode_bytes
return txt.decode(""utf-8"")
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import netCDF4
title = b'\xc3'
f = netCDF4.Dataset('test.nc', 'w')
f.title = title
f.close()
xr.load_dataset(""test.nc"", engine=""h5netcdf"")
```
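For illustration only, a sketch of what a more tolerant attribute decoder could look like; this is not how xarray or h5netcdf actually handle it, and the latin-1 fallback is just an assumption:
```python
def maybe_decode_bytes_tolerant(txt):
    # Try UTF-8 first, then fall back to latin-1 so non-UTF-8 attributes
    # load as some string instead of raising UnicodeDecodeError.
    if isinstance(txt, bytes):
        try:
            return txt.decode('utf-8')
        except UnicodeDecodeError:
            return txt.decode('latin-1')
    return txt
print(maybe_decode_bytes_tolerant(b'\xc3'))  # a replacement string rather than an exception
```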
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.1
pandas: 1.2.4
numpy: 1.20.3
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: 6.2.4
IPython: 7.25.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5563/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2117248281,I_kwDOAMm_X85-MqUZ,8704,Currently no way to create a Coordinates object without indexes for 1D variables,35968931,closed,0,,,4,2024-02-04T18:30:18Z,2024-03-26T13:50:16Z,2024-03-26T13:50:15Z,MEMBER,,,,"### What happened?
The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on `main`, meaning that I think there is currently no way to create an `xr.Coordinates` object without 1D variables being coerced to indexes. This means there is also no way to create a `Dataset` object without its 1D coordinate variables being converted to `IndexVariable`s and coerced to indexes.
### What did you expect to happen?
I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.
```python
xr.Coordinates({'x': ('x', uarr)}, indexes={})
```
where `uarr` is an un-indexable array-like.
### Minimal Complete Verifiable Example
```Python
import numpy as np
from typing import Self  # or typing_extensions on Python < 3.11
class UnindexableArrayAPI:
    ...
class UnindexableArray:
    """"""
    Presents like an N-dimensional array but doesn't support changes of any kind,
    nor can it be coerced into a np.ndarray or pd.Index.
    """"""
    _shape: tuple[int, ...]
    _dtype: np.dtype
    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI
    @property
    def dtype(self) -> np.dtype:
        return self._dtype
    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape
    @property
    def ndim(self) -> int:
        return len(self.shape)
    @property
    def size(self) -> int:
        return np.prod(self.shape)
    @property
    def T(self) -> Self:
        raise NotImplementedError()
    def __repr__(self) -> str:
        return f""UnindexableArray(shape={self.shape}, dtype={self.dtype})""
    def _repr_inline_(self, max_width):
        """"""
        Format to a single line with at most max_width characters. Used by xarray.
        """"""
        return self.__repr__()
    def __getitem__(self, key, /) -> Self:
        """"""
        Only supports extremely limited indexing.
        I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
        """"""
        from xarray.core.indexing import BasicIndexer
        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()
    def __array__(self) -> np.ndarray:
        raise NotImplementedError(""UnindexableArrays can't be converted into numpy arrays or pandas Index objects"")
```
```python
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))
xr.Variable(data=uarr, dims=['x']) # works fine
xr.Coordinates({'x': ('x', uarr)}, indexes={}) # works in xarray v2023.08.0
```
but in versions after that it triggers the NotImplementedError in `__array__`:
```python
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})
File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
299 variables = {}
300 for name, data in coords.items():
--> 301 var = as_variable(data, name=name)
302 if var.dims == (name,) and indexes is None:
303 index, index_vars = create_default_index_implicit(var, list(coords))
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
152 raise TypeError(
153 f""Variable {name!r}: unable to convert object into a variable without an ""
154 f""explicit list of dimensions: {obj!r}""
155 )
157 if name is not None and name in obj.dims and obj.ndim == 1:
158 # automatically convert the Variable into an Index
--> 159 obj = obj.to_index_variable()
161 return obj
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
570 def to_index_variable(self) -> IndexVariable:
571 """"""Return this variable as an xarray.IndexVariable""""""
--> 572 return IndexVariable(
573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
574 )
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2640 # Unlike in Variable, always eagerly load values into memory
2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642 self._data = PandasIndexingAdapter(self._data)
File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
1479 from xarray.core.indexes import safe_cast_to_index
-> 1481 self.array = safe_cast_to_index(array)
1483 if dtype is None:
1484 self._dtype = get_valid_numpy_dtype(array)
File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 ""`pandas.Index` does not support the `float16` dtype.""
(...)
465 category=DeprecationWarning,
466 )
467 kwargs[""dtype""] = ""float64""
--> 469 index = pd.Index(np.asarray(array), **kwargs)
471 return _maybe_cast_to_cftimeindex(index)
Cell In[55], line 63, in UnindexableArray.__array__(self)
62 def __array__(self) -> np.ndarray:
---> 63 raise NotImplementedError(""UnindexableArrays can't be converted into numpy arrays or pandas Index objects"")
NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects
```
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
Context is #8699
### Environment
Versions described above
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8704/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
957918751,MDU6SXNzdWU5NTc5MTg3NTE=,5664,Interpolation behaviour inconsistent with numpy? ,7017525,open,0,,,4,2021-08-02T08:56:28Z,2024-03-12T01:15:46Z,,NONE,,,,"Hey all,
Running `dataset.interp(time=dataset.time)` fills values with `np.nan` if one of the neighbors is `np.nan`, **even when interpolation is not actually needed**.
Here is the sample code to reproduce the issue :
```python
import numpy as np
import xarray as xr
def test_crop_times_nan():
    ds = xr.Dataset(
        data_vars={
            ""some_variable"": (['x', 'time'], np.array([[np.nan, 0, 1]]))
        },
        coords={
            ""time"": np.array([0, 1, 2])
        },
    )
    result = ds.interp(time=ds.time)
    # result[""some_variable""].values == [nan, nan, 1.0]
    # whereas [nan, 0, 1.0] is EXPECTED
    xr.testing.assert_allclose(ds, result)
```
Please note that numpy does not have the same behavior :
```python
>>> import numpy as np
>>> np.interp([0,1,2], xp=[0,1,2], fp=[np.nan,0,1])
array([nan, 0., 1.])
```
Is that an intended behaviour for xarray?
If so, does this mean that I first have to check if an interpolation is needed instead of doing it no matter what (and use `reindex` instead of `interp` if it is not needed) ?
(this will be kind of tricky if interpolation is needed for some values but not others...)
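For the exact-match case, a minimal sketch of the `reindex` alternative mentioned above (same data as the test function):
```python
import numpy as np
import xarray as xr
ds = xr.Dataset({'some_variable': (['x', 'time'], np.array([[np.nan, 0, 1]]))},
                coords={'time': np.array([0, 1, 2])})
result = ds.reindex(time=ds.time)  # exact label match: [nan, 0., 1.] is preserved
```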
Thanks for your help ;)
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-7642-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.2
pandas: 1.2.4
numpy: 1.19.4
scipy: 1.6.0
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.0
distributed: 2021.01.0
matplotlib: 3.4.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.4.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.19.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5664/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2140090923,I_kwDOAMm_X85_jzIr,8759,Passing datasets with different group hierarchy to open_mfdataset ,111437410,closed,0,,,4,2024-02-17T13:31:18Z,2024-03-03T18:43:09Z,2024-03-03T10:53:34Z,NONE,,,,"### Is your feature request related to a problem?
When you want to open multiple datasets located at different nodes of the group hierarchy in an HDF file, you can't pass a list of group keys (`save_mfdataset` offers a 'groups' keyword; emphasis on the s). In addition, the 'files' keyword argument does not accept a datastore as a valid input.
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
One can, of course, `open_dataset` each group in a loop and combine afterwards (a sketch follows below). One possible fix is to modify the 'group' argument to accept a list of the same length as the paths list. Another could be changing the ""paths"" keyword to accept datastore or h5py objects. Both are trivial in my opinion; most of the code is already there in other functions (open_dataset, save_mfdataset).
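A rough sketch of that loop workaround, with a placeholder file name, group paths and engine (illustration only, not an existing API):
```python
import xarray as xr

groups = ['/group_a', '/group_b/child']  # placeholder group paths within one file

datasets = [
    xr.open_dataset('data.h5', group=group, engine='h5netcdf')
    for group in groups
]
combined = xr.merge(datasets)  # or xr.concat(datasets, dim=...) depending on the layout
```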
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8759/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2141899767,I_kwDOAMm_X85_qsv3,8769,Errors started appearing after release v2024.02.0,7112768,closed,0,,,4,2024-02-19T09:23:16Z,2024-02-22T04:54:06Z,2024-02-22T04:54:06Z,NONE,,,,"### What happened?
I started seeing errors in my CI after [latest xarray release](https://github.com/pydata/xarray/releases/tag/v2024.02.0). See, e.g.,
https://github.com/COSIMA/regional-mom6/actions/runs/7957078139/job/21719091616#step:7:226
After I added a [compat entry for xarray](https://github.com/COSIMA/regional-mom6/pull/98/commits/46ca91d2ac91ab57371f94108f18549aaa7040cf) to exclude the latest release, the error went away. See:
https://github.com/COSIMA/regional-mom6/actions/runs/7957192738
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
_No response_
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8769/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2142982259,I_kwDOAMm_X85_u1Bz,8771,Unable to use Xarray to work on RCM Dataset with xsar and safe_rcm by umr-lops,34626942,closed,0,,,4,2024-02-19T18:58:50Z,2024-02-20T05:29:33Z,2024-02-20T05:29:33Z,NONE,,,,"### What happened?
UMR-LOPS has introduced XSAR, a library to work with RCM datasets.
When working with the following code
```
import xsar
import geoviews as gv
import holoviews as hv
import geoviews.feature as gf
hv.extension('bokeh')
path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010')
meta = xsar.RcmMeta(name=path)
meta.dt
```
I am encountering the following error
```
ValueError Traceback (most recent call last)
in ()
1 #rs2meta = xsar.RadarSat2Meta(name=path)
----> 2 meta = xsar.RcmMeta(name=path)
14 frames
/usr/local/lib/python3.10/dist-packages/xsar/utils.py in wrapper(*args, **kwargs)
93 startrss = process.memory_info().rss
94 starttime = time.time()
---> 95 result = f(*args, **kwargs)
96 endtime = time.time()
97 if mem_monitor:
/usr/local/lib/python3.10/dist-packages/xsar/rcm_meta.py in __init__(self, name)
32 self.dt = api.open_rcm(name.split(':')[1])
33 else:
---> 34 self.dt = api.open_rcm(name)
35 if not name.startswith('RCM_DS:'):
36 name = 'RCM_DS:%s:' % name
/usr/local/lib/python3.10/dist-packages/safe_rcm/api.py in open_rcm(url, backend_kwargs, manifest_ignores, **dataset_kwargs)
95 )
96
---> 97 tree = read_product(mapper, ""metadata/product.xml"")
98
99 calibration_root = ""metadata/calibration""
/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in read_product(mapper, product_path)
272 }
273
--> 274 converted = valmap(
275 lambda x: execute(**x)(decoded),
276 layout,
/usr/local/lib/python3.10/dist-packages/toolz/dicttoolz.py in valmap(func, d, factory)
83 """"""
84 rv = factory()
---> 85 rv.update(zip(d.keys(), map(func, d.values())))
86 return rv
87
/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in (x)
273
274 converted = valmap(
--> 275 lambda x: execute(**x)(decoded),
276 layout,
277 )
/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
302 def __call__(self, *args, **kwargs):
303 try:
--> 304 return self._partial(*args, **kwargs)
305 except TypeError as exc:
306 if self._should_curry(args, kwargs, exc):
/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in execute(mapping, f, path)
29 subset = query(path, mapping)
30
---> 31 return compose_left(f, attach_path(path=path))(subset)
32
33
/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
485
486 def __call__(self, *args, **kwargs):
--> 487 ret = self.first(*args, **kwargs)
488 for f in self.funcs:
489 ret = f(ret)
/usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
487 ret = self.first(*args, **kwargs)
488 for f in self.funcs:
--> 489 ret = f(ret)
490 return ret
491
/usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in (obj)
126 ),
127 lambda obj: obj.set_index({""stacked"": [""pole"", ""pulse""]}),
--> 128 lambda obj: obj.unstack(""stacked""),
129 ),
130 },
/usr/local/lib/python3.10/dist-packages/xarray/util/deprecation_helpers.py in inner(*args, **kwargs)
113 return func(*args[:-n_extra_args], **kwargs)
114
--> 115 return func(*args, **kwargs)
116
117 return inner
/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse)
5576 )
5577 else:
-> 5578 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
5579 return result
5580
/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in _unstack_once(self, dim, index_and_vars, fill_value, sparse)
5395 indexes = {k: v for k, v in self._indexes.items() if k != dim}
5396
-> 5397 new_indexes, clean_index = index.unstack()
5398 indexes.update(new_indexes)
5399
/usr/local/lib/python3.10/dist-packages/xarray/core/indexes.py in unstack(self)
1019
1020 if not clean_index.is_unique:
-> 1021 raise ValueError(
1022 ""Cannot unstack MultiIndex containing duplicates. Make sure entries ""
1023 f""are unique, e.g., by calling ``.drop_duplicates('{self.dim}')``, ""
ValueError: Cannot unstack MultiIndex containing duplicates. Make sure entries are unique, e.g., by calling ``.drop_duplicates('stacked')``, before unstacking.
```
As you can see from the last sections of the trace, the issue is in xarray/dataset.py when the dataset is unstacked.
Any ideas why this is happening? The issue doesn't occur with RadarSat-2 or any other dataset, so is this an xarray problem or should I raise the issue at umr-lops?
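As an editorial aside, the same `ValueError` can be reproduced without xsar whenever the stacked index contains duplicate (pole, pulse) pairs, which suggests the upstream reader is producing duplicated entries rather than xarray misbehaving. A minimal sketch (all names illustrative):
```python
import xarray as xr

ds = xr.Dataset(
    {'v': ('stacked', [1, 2, 3])},
    coords={
        'pole': ('stacked', ['H', 'H', 'V']),
        'pulse': ('stacked', [0, 0, 1]),
    },
).set_index(stacked=['pole', 'pulse'])

# ('H', 0) appears twice, so unstacking raises the same error as in the trace above
ds.unstack('stacked')
```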
### What did you expect to happen?
The error shouldn't be there, and I should be able to view the data, as shown in the link below:
https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/examples/rcm.html
### Minimal Complete Verifiable Example
```Python
import xsar
import geoviews as gv
import holoviews as hv
import geoviews.feature as gf
hv.extension('bokeh')
path = xsar.get_test_file('RCM1_OK1050603_PK1050605_1_SC50MB_20200214_115905_HH_HV_Z010')
meta = xsar.RcmMeta(name=path)
meta.dt
```
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
commit: None
python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.1.58+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.25.2
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: 1.3.0
h5py: 3.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.4.4
mypy: None
IPython: 7.34.0
sphinx: 5.0.2
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn(""Setuptools is replacing distutils."")
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8771/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1912094632,I_kwDOAMm_X85x-D-o,8231,xr.concat concatenates along dimensions that it wasn't asked to,35968931,open,0,,,4,2023-09-25T18:50:29Z,2024-02-14T20:30:26Z,,MEMBER,,,,"### What happened?
Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).
```python
import xarray as xr
ds1 = xr.Dataset(
coords={
'x_center': ('x_center', [1, 2, 3]),
'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
},
)
ds2 = xr.Dataset(
coords={
'x_center': ('x_center', [4, 5, 6]),
'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
},
)
```
Calling `xr.concat` on these with `dim='x_center'` happily concatenates them
```python
xr.concat([ds1, ds2], dim='x_center')
```
```
Dimensions: (x_outer: 7, x_center: 6)
Coordinates:
* x_outer (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
* x_center (x_center) int64 1 2 3 4 5 6
Data variables:
*empty*
```
but notice that the returned result has been concatenated along *both* `x_center` and `x_outer`.
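As an editorial aside, the apparent concatenation along `x_outer` presumably comes from alignment rather than from concatenation itself: non-concatenated indexed dimensions are aligned with the default `join='outer'`, so the two disjoint `x_outer` indexes get unioned. Asking for an exact join surfaces the mismatch instead (sketch only):
```python
xr.concat([ds1, ds2], dim='x_center', join='exact')
# raises a ValueError because the two x_outer indexes are not equal
```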
### What did you expect to happen?
I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. `x_outer`).
What I expected to happen was that (since `coords='different'` by default) an attempt would be made to concatenate both variables along the `x_center` dimension, which would succeed for the `x_center` variable but fail for the `x_outer` variable. Indeed, if I name the variables differently so that they are no longer coordinate variables, then that is what happens:
```python
import xarray as xr
ds1 = xr.Dataset(
data_vars={
'a': ('x_center', [1, 2, 3]),
'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
},
)
ds2 = xr.Dataset(
data_vars={
'a': ('x_center', [4, 5, 6]),
'b': ('x_outer', [4.5, 5.5, 6.5]),
},
)
```
```python
xr.concat([ds1, ds2], dim='x_center', data_vars='different')
```
```
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}
```
### Minimal Complete Verifiable Example
_No response_
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
I was trying to create an example for which you would need the automatic combined concat/merge that happens within `xr.combine_by_coords`.
### Environment
xarray `2023.8.0`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8231/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1390228572,I_kwDOAMm_X85S3TRc,7104,Duplicate values on unstack,114576287,closed,0,,,4,2022-09-29T04:16:26Z,2024-02-13T09:48:37Z,2024-02-13T09:48:37Z,NONE,,,,"### What happened?
I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed.
### What did you expect to happen?
A warning or error would be raised to say, ""this isn't going to work"".
### Minimal Complete Verifiable Example
```Python
import datetime as dt
import xarray as xr
ds = xr.DataArray(
[[1, 2, 3], [4, 5, 6]],
dims=(""lat"", ""time""),
coords={""lat"": [-60, 60], ""time"": [dt.datetime(2010, 1, d) for d in range(1, 4)]},
name=""test"",
).to_dataset()
ds = (
ds.assign_coords(
{
""month"": ds[""time""].dt.month,
""year"": ds[""time""].dt.year,
}
)
.set_index(time=[""month"", ""year""])
)
ds = ds.unstack(""time"")
# the output only has 2 values, which isn't what I expected
ds[""test""].data
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that...
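As a side note (my addition, not part of the original report), the silent drop can be detected just before the `unstack` call by checking the stacked index for duplicates:
```python
idx = ds.indexes['time']       # the (month, year) MultiIndex created by set_index
print(idx.is_unique)           # False: (1, 2010) appears three times
print(idx[idx.duplicated()])   # the offending entries
```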
### Environment
INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7104/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2126375172,I_kwDOAMm_X85-vekE,8726,PRs requiring approval & merging main?,5635139,closed,0,,,4,2024-02-09T02:35:58Z,2024-02-09T18:23:52Z,2024-02-09T18:21:59Z,MEMBER,,,,"### What is your issue?
Sorry I haven't been on the calls at all recently (unfortunately the schedule is difficult for me). Maybe this was discussed there?
PRs now seem to require a separate approval prior to merging. Is there an upside to this? Is there any difference between those who can approve and those who can merge? Otherwise it just seems like more clicking.
PRs also now seem to require merging the latest main prior to merging? I get there's some theoretical value to this, because changes can semantically conflict with each other. But it's extremely rare that this actually happens (can we point to cases?), and it limits the immediacy & throughput of PRs. If the bad outcome does ever happen, we find out quickly when main tests fail and can revert.
(fwiw I wrote a few principles around this down a while ago [here](https://prql-lang.org/book/project/contributing/development.html#merges); those are much stronger than what I'm suggesting in this issue though)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8726/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2115049090,I_kwDOAMm_X85-ERaC,8694,Error while saving an altered dataset to NetCDF when loaded from a file,12544636,open,0,,,4,2024-02-02T14:18:03Z,2024-02-07T13:38:40Z,,NONE,,,,"### What happened?
When attempting to save an altered Xarray dataset to a NetCDF file using the `to_netcdf` method, an error occurs if the original dataset is loaded from a file. Specifically, this error does not occur when the dataset is created directly but only when it is loaded from a file.
### What did you expect to happen?
The altered Xarray dataset is saved as a NetCDF file using the `to_netcdf` method.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds = xr.Dataset(
data_vars=dict(
win_1=(""attempt"", [True, False, True, False, False, True]),
win_2=(""attempt"", [False, True, False, True, False, False]),
),
coords=dict(
attempt=[1, 2, 3, 4, 5, 6],
player_1=(""attempt"", [""paper"", ""paper"", ""scissors"", ""scissors"", ""paper"", ""paper""]),
player_2=(""attempt"", [""rock"", ""scissors"", ""paper"", ""rock"", ""paper"", ""rock""]),
)
)
ds.to_netcdf(""dataset.nc"")
ds_from_file = xr.load_dataset(""dataset.nc"")
ds_altered = ds_from_file.where(ds_from_file[""player_1""] == ""paper"", drop=True)
ds_altered.to_netcdf(""dataset_altered.nc"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
Traceback (most recent call last):
File ""example.py"", line 20, in
ds_altered.to_netcdf(""dataset_altered.nc"")
File "".../python3.9/site-packages/xarray/core/dataset.py"", line 2303, in to_netcdf
return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
File "".../python3.9/site-packages/xarray/backends/api.py"", line 1315, in to_netcdf
dump_to_store(
File "".../python3.9/site-packages/xarray/backends/api.py"", line 1362, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File "".../python3.9/site-packages/xarray/backends/common.py"", line 356, in store
self.set_variables(
File "".../python3.9/site-packages/xarray/backends/common.py"", line 398, in set_variables
writer.add(source, target)
File "".../python3.9/site-packages/xarray/backends/common.py"", line 243, in add
target[...] = source
File "".../python3.9/site-packages/xarray/backends/scipy_.py"", line 78, in __setitem__
data[key] = value
File "".../python3.9/site-packages/scipy/io/_netcdf.py"", line 1019, in __setitem__
self.data[index] = data
ValueError: could not broadcast input array from shape (4,5) into shape (4,8)
```
### Anything else we need to know?
**Findings:**
The issue is related to the encoding information of the dataset becoming invalid after filtering data with the `where` method. The `to_netcdf` method takes the available encoding information instead of considering the actual shape of the data.
In the provided example, the maximum length of the strings stored in ""player_1"" and ""player_2"" is originally 8 characters. However, after filtering with the `where` method, the maximum string length becomes 5 in ""player_1"" while it remains 8 in ""player_2"". But the encoding information of both variables still records a length of 8, in particular via the `char_dim_name` attribute.
**Workaround:**
A workaround to resolve this issue is to call the `drop_encoding` method on the dataset before saving it with `to_netcdf`. This action ensures that the encoding information is not available, and the `to_netcdf` method is forced to take the actual shapes of the data, preventing the broadcasting error.
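A minimal sketch of that workaround applied to the example above:
```python
# drop the stale encoding so string lengths are derived from the data actually written
ds_altered.drop_encoding().to_netcdf('dataset_altered.nc')
```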
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.14 (main, Aug 24 2023, 14:01:46)
[GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.3.1-060301-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8694/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
782440858,MDU6SXNzdWU3ODI0NDA4NTg=,4784,Opening a tiff with scale_factor/add_offset attrs then saving as zarr and opening causes a UFuncTypeError,53100696,closed,0,,,4,2021-01-08T22:45:21Z,2024-02-06T10:40:15Z,2024-02-06T10:40:14Z,NONE,,,,"
**What happened**:
When opening a geotiff that has `scale_factor` and `add_offset` metadata and then saving it as a zarr, the `scale_factor` and `add_offset` attributes are [loaded](https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/rasterio_.py#L280) and then saved as strings. When the resulting zarr is opened, xarray attempts to [apply](https://github.com/pydata/xarray/blob/569a4da18229aed391886ef768132f3d6d64fb30/xarray/coding/variables.py#L245) the `scale_factor` and `add_offset` attributes, but raises an exception because they are of type `str`:

    220         data *= scale_factor
    221     if add_offset is not None:
    222         data += add_offset
    UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('…

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1034-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.0
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.8
cfgrib: None
iris: None
bottleneck: None
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2112742578,I_kwDOAMm_X8597eSy,8693,reading netcdf with engine=scipy fails with a typeerror under certain conditions,32731672,open,0,,,4,2024-02-01T15:03:23Z,2024-02-05T09:35:51Z,,CONTRIBUTOR,,,,"### What happened?
Saving and then loading a netCDF file with engine=scipy produces an unexpected `TypeError` on read. The file seems to be corrupted.
### What did you expect to happen?
reading works just fine.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr
ds = xr.Dataset(
{
""values"": (
[""name"", ""time""],
np.array([[]], dtype=np.float32).T,
)
},
coords={""time"": [1], ""name"": []},
).expand_dims({""index"": [0]})
ds.to_netcdf(""file.nc"", engine=""scipy"")
_ = xr.open_dataset(""file.nc"", engine=""scipy"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
KeyError Traceback (most recent call last)
File .../python3.11/site-packages/xarray/backends/file_manag
er.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
210 try:
--> 211 file = self._cache[self._key]
212 except KeyError:
File .../python3.11/site-packages/xarray/backends/lru_cache.
py:56, in LRUCache.__getitem__(self, key)
55 with self._lock:
---> 56 value = self._cache[key]
57 self._cache.move_to_end(key)
KeyError: [, ('/home/eivind/Projects/ert/file.nc',),
'r', (('mmap', None), ('version', 2)), '264ec6b3-78b3-4766-bb41-7656d6a51962']
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
Cell In[1], line 18
4 ds = (
5 xr.Dataset(
6 {
(...)
15 .expand_dims({""index"": [0]})
16 )
17 ds.to_netcdf(""file.nc"", engine=""scipy"")
---> 18 _ = xr.open_dataset(""file.nc"", engine=""scipy"")
File .../python3.11/site-packages/xarray/backends/api.py:572
, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, d
ecode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked
_array_type, from_array_kwargs, backend_kwargs, **kwargs)
560 decoders = _resolve_decoders_kwargs(
561 decode_cf,
562 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
568 decode_coords=decode_coords,
569 )
571 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 572 backend_ds = backend.open_dataset(
573 filename_or_obj,
574 drop_variables=drop_variables,
575 **decoders,
576 **kwargs,
577 )
578 ds = _dataset_from_backend_dataset(
579 backend_ds,
580 filename_or_obj,
(...)
590 **kwargs,
591 )
592 return ds
File .../python3.11/site-packages/xarray/backends/scipy_.py:
315, in ScipyBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, con
cat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, mode, format, group, mm
ap, lock)
313 store_entrypoint = StoreBackendEntrypoint()
314 with close_on_error(store):
--> 315 ds = store_entrypoint.open_dataset(
316 store,
317 mask_and_scale=mask_and_scale,
318 decode_times=decode_times,
319 concat_characters=concat_characters,
320 decode_coords=decode_coords,
321 drop_variables=drop_variables,
322 use_cftime=use_cftime,
323 decode_timedelta=decode_timedelta,
324 )
325 return ds
File .../python3.11/site-packages/xarray/backends/store.py:4
3, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, conca
t_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
29 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs
30 self,
31 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
(...)
39 decode_timedelta=None,
40 ) -> Dataset:
41 assert isinstance(filename_or_obj, AbstractDataStore)
---> 43 vars, attrs = filename_or_obj.load()
44 encoding = filename_or_obj.get_encoding()
46 vars, attrs, coord_names = conventions.decode_cf_variables(
47 vars,
48 attrs,
(...)
55 decode_timedelta=decode_timedelta,
56 )
File .../python3.11/site-packages/xarray/backends/common.py:
210, in AbstractDataStore.load(self)
188 def load(self):
189 """"""
190 This loads the variables and attributes simultaneously.
191 A centralized loading function makes it easier to create
(...)
207 are requested, so care should be taken to make sure its fast.
208 """"""
209 variables = FrozenDict(
--> 210 (_decode_variable_name(k), v) for k, v in self.get_variables().items()
211 )
212 attributes = FrozenDict(self.get_attrs())
213 return variables, attributes
File .../python3.11/site-packages/xarray/backends/scipy_.py:
181, in ScipyDataStore.get_variables(self)
179 def get_variables(self):
180 return FrozenDict(
--> 181 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
182 )
File .../python3.11/site-packages/xarray/backends/scipy_.py:
170, in ScipyDataStore.ds(self)
168 @property
169 def ds(self):
--> 170 return self._manager.acquire()
File .../python3.11/site-packages/xarray/backends/file_manag
er.py:193, in CachingFileManager.acquire(self, needs_lock)
178 def acquire(self, needs_lock=True):
179 """"""Acquire a file object from the manager.
180
181 A new file is only opened if it has expired from the
(...)
191 An open file object, as returned by ``opener(*args, **kwargs)``.
192 """"""
--> 193 file, _ = self._acquire_with_cache_info(needs_lock)
194 return file
File .../python3.11/site-packages/xarray/backends/file_manag
er.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
215 kwargs = kwargs.copy()
216 kwargs[""mode""] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
218 if self._mode == ""w"":
219 # ensure file doesn't get overridden when opened again
220 self._mode = ""a""
File .../python3.11/site-packages/xarray/backends/scipy_.py:
109, in _open_scipy_netcdf(filename, mode, mmap, version)
106 filename = io.BytesIO(filename)
108 try:
--> 109 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
110 except TypeError as e: # netcdf3 message is obscure in this case
111 errmsg = e.args[0]
File .../python3.11/site-packages/scipy/io/_netcdf.py:278, i
n netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
275 self._attributes = {}
277 if mode in 'ra':
--> 278 self._read()
File .../python3.11/site-packages/scipy/io/_netcdf.py:607, i
n netcdf_file._read(self)
605 self._read_dim_array()
606 self._read_gatt_array()
--> 607 self._read_var_array()
File .../python3.11/site-packages/scipy/io/_netcdf.py:688, i
n netcdf_file._read_var_array(self)
685 data = None
686 else: # not a record variable
687 # Calculate size to avoid problems with vsize (above)
--> 688 a_size = reduce(mul, shape, 1) * size
689 if self.use_mmap:
690 data = self._mm_buf[begin_:begin_+a_size].view(dtype=dtype_)
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: 1.8.0
IPython: 8.17.2
sphinx: 7.2.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8693/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2111051033,I_kwDOAMm_X8591BUZ,8691,xarray.open_dataset with chunks={} returns a single chunk and not engine (h5netcdf) preferred chunks,15016780,closed,0,,,4,2024-01-31T22:04:02Z,2024-01-31T22:56:17Z,2024-01-31T22:56:17Z,NONE,,,,"### What happened?
When opening MUR SST netCDF files from S3, xarray.open_dataset(file, engine=""h5netcdf"", chunks={}) returns a single chunk, whereas the h5netcdf library reports a chunk shape of (1, 1023, 2047).
A notebook version of the code below includes the output: https://gist.github.com/abarciauskas-bgse/9366e04d2af09b79c9de466f6c1d3b90
### What did you expect to happen?
I thought the chunks={} option would return the same chunks (1, 1023, 2047) exposed by the h5netcdf engine.
### Minimal Complete Verifiable Example
```Python
#!/usr/bin/env python
# coding: utf-8
# This notebook looks at how xarray and h5netcdf return different chunks.
import pandas as pd
import h5netcdf
import s3fs
import xarray as xr
dates = [
d.to_pydatetime().strftime('%Y%m%d')
for d in pd.date_range('2023-02-01', '2023-03-01', freq='D')
]
SHORT_NAME = 'MUR-JPL-L4-GLOB-v4.1'
s3_fs = s3fs.S3FileSystem(anon=False)
var = 'analysed_sst'
def make_filename(time):
base_url = f's3://podaac-ops-cumulus-protected/{SHORT_NAME}/'
# example file: ""/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc""
return f'{base_url}{time}090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'
s3_urls = [make_filename(d) for d in dates]
def print_chunk_shape(s3_url):
try:
# Open the dataset using xarray
file = s3_fs.open(s3_url)
dataset = xr.open_dataset(file, engine='h5netcdf', chunks={})
# Print chunk shapes for each variable in the dataset
print(f""\nChunk shapes for {s3_url}:"")
if dataset[var].chunks is not None:
print(f""xarray open_dataset chunks for {var}: {dataset[var].chunks}"")
else:
print(f""xarray open_dataset chunks for {var}: Not chunked"")
with h5netcdf.File(file, 'r') as file:
dataset = file[var]
# Check if the dataset is chunked
if dataset.chunks:
print(f""h5netcdf chunks for {var}:"", dataset.chunks)
else:
print(f""h5netcdf dataset is not chunked."")
except Exception as e:
print(f""Failed to process {s3_url}: {e}"")
[print_chunk_shape(s3_url) for s3_url in s3_urls]
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.198-187.748.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.6.1
distributed: 2023.6.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.0.0
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8691/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2104267494,I_kwDOAMm_X859bJLm,8677,Add rolling.rank() same as pandas,39230130,open,0,,,4,2024-01-28T17:27:21Z,2024-01-29T19:50:20Z,,NONE,,,,"### Is your feature request related to a problem?
Dear xarray maintainers,
I would like to express my heartfelt gratitude for the significant optimizations your xarray library has brought to my project. Xarray combines the speed of numpy with the highly customizable parameters of pandas. The extensive parameters in the ``rolling`` module have allowed me to achieve functionality similar to pandas more efficiently.
I am wondering if it would be possible to incorporate a ranking method for rolling windows, including the ability to specify parameters such as ``pct``, similar to the pandas ``rolling.rank`` function. Your consideration of this feature would be greatly appreciated.
Once again, thank you for your contributions!
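In the meantime, a rough workaround sketch (my illustration, not an existing xarray API) is to emulate a trailing rolling rank via `rolling(...).construct()`; note that tie handling here corresponds to pandas' `method='max'` rather than the default `'average'`:
```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(10), dims='time')

win = da.rolling(time=3).construct('window')         # trailing windows, NaN-padded at the start
current = win.isel(window=-1)                        # the value whose in-window rank we want
rank = (win <= current).sum('window')                # count of window values <= current
pct = (rank / 3).where(win.notnull().all('window'))  # roughly pandas rolling(3).rank(pct=True)
```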

### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8677/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1716228662,I_kwDOAMm_X85mS5I2,7848,Compatibility with the Array API standard ,35968931,open,0,,,4,2023-05-18T20:34:43Z,2024-01-25T04:03:42Z,,MEMBER,,,,"### What is your issue?
**Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other.**
We've already had
- #6804
- #7067
- #7847
and there will likely be many others.
---
I suspect this might require changes to the standard as well as to xarray - in particular see [this list](https://github.com/data-apis/array-api/issues/187) of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ):
- `np.clip`
- `np.diff`
- `np.pad`
- `np.repeat`
- ~`np.take`~
- ~`np.tile`~","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7848/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2079089277,I_kwDOAMm_X8577GJ9,8607,allow computing just a small number of variables,14808389,open,0,,,4,2024-01-12T15:21:27Z,2024-01-12T20:20:29Z,,MEMBER,,,,"### Is your feature request related to a problem?
I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that.
### Describe the solution you'd like
I'd imagine something like
```python
ds.compute(variables=variable_names)
```
but I'm undecided on whether that's a good idea (it might make `.compute` more complex?)
### Describe alternatives you've considered
So far I've been using something like
```python
ds.assign_coords({k: (lambda ds, k=k: ds[k].compute()) for k in variable_names})  # k=k avoids late binding of k in the comprehension
ds.pipe(lambda ds: ds.merge(ds[variable_names].compute()))
```
but both are not easy to type / understand (though having `.merge` take a callable would make this much easier). Also, the first option computes variables separately, which may not be ideal?
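For completeness, a tiny helper along these lines (hypothetical, not an existing xarray method) that wraps the patterns above while computing the requested variables together:
```python
def compute_variables(ds, names):
    # eagerly compute only the named (coordinate) variables, then attach them back
    computed = ds[list(names)].compute()
    return ds.assign_coords({name: computed[name] for name in names})
```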
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2073024461,I_kwDOAMm_X857j9fN,8602,`DataArray.mean()` and `Dataset.mean()` fail with `sparse==0.15.0`,46072231,closed,0,,,4,2024-01-09T19:27:47Z,2024-01-10T14:44:57Z,2024-01-10T14:44:57Z,NONE,,,,"### What happened?
The following script leads to an error:
```
import numpy as np
import xarray as xr
from sparse import GCXS
x = np.random.negative_binomial(1, 0.5, size=(100, 100))
array = xr.DataArray(GCXS.from_numpy(x))
array.mean()
```
```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 array.mean()
File ~/.../python3.11/site-packages/xarray/core/_aggregations.py:1663, in DataArrayAggregations.mean(self, dim, skipna, keep_attrs, **kwargs)
1588 def mean(
1589 self,
1590 dim: Dims = None,
(...)
1594 **kwargs: Any,
1595 ) -> Self:
1596 """"""
1597 Reduce this DataArray's data by applying ``mean`` along some dimension(s).
1598
(...)
1661 array(nan)
1662 """"""
-> 1663 return self.reduce(
1664 duck_array_ops.mean,
1665 dim=dim,
1666 skipna=skipna,
1667 keep_attrs=keep_attrs,
1668 **kwargs,
1669 )
File ~/.../python3.11/site-packages/xarray/core/dataarray.py:3776, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
3732 def reduce(
3733 self,
3734 func: Callable[..., Any],
(...)
3740 **kwargs: Any,
3741 ) -> Self:
3742 """"""Reduce this array by applying `func` along some dimension(s).
3743
3744 Parameters
(...)
3773 summarized data and the indicated dimension(s) removed.
3774 """"""
-> 3776 var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
3777 return self._replace_maybe_drop_dims(var)
File ~/.../python3.11/site-packages/xarray/core/variable.py:1756, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
1749 keep_attrs_ = (
1750 _get_keep_attrs(default=False) if keep_attrs is None else keep_attrs
1751 )
1753 # Noe that the call order for Variable.mean is
1754 # Variable.mean -> NamedArray.mean -> Variable.reduce
1755 # -> NamedArray.reduce
-> 1756 result = super().reduce(
1757 func=func, dim=dim, axis=axis, keepdims=keepdims, **kwargs
1758 )
1760 # return Variable always to support IndexVariable
1761 return Variable(
1762 result.dims, result._data, attrs=result._attrs if keep_attrs_ else None
1763 )
File ~/.../python3.11/site-packages/xarray/namedarray/core.py:772, in NamedArray.reduce(self, func, dim, axis, keepdims, **kwargs)
770 data = func(self.data, axis=axis, **kwargs)
771 else:
--> 772 data = func(self.data, **kwargs)
774 if getattr(data, ""shape"", ()) == self.shape:
775 dims = self.dims
File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:637, in mean(array, axis, skipna, **kwargs)
635 return _to_pytimedelta(mean_timedeltas, unit=""us"") + offset
636 else:
--> 637 return _mean(array, axis=axis, skipna=skipna, **kwargs)
File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:399, in _create_nan_agg_method..f(values, axis, skipna, **kwargs)
396 kwargs.pop(""min_count"", None)
398 xp = get_array_namespace(values)
--> 399 func = getattr(xp, name)
401 try:
402 with warnings.catch_warnings():
AttributeError: module 'sparse' has no attribute 'mean'
```
### What did you expect to happen?
The reproducible script runs without error with `sparse==0.14.0`.
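As an interim workaround (my note, not from the report), densifying before the reduction sidesteps the dispatch to the `sparse` namespace; alternatively, pin `sparse==0.14.0`:
```python
dense = array.copy(data=array.data.todense())  # back to a plain numpy-backed DataArray
dense.mean()
```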
### Minimal Complete Verifiable Example
_No response_
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: None
xarray: 2023.12.0
pandas: 1.5.3
numpy: 1.24.4
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.12.0
distributed: 2023.12.0
matplotlib: 3.8.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.12.0
cupy: None
pint: None
sparse: 0.15.0
flox: None
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8602/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2041076267,I_kwDOAMm_X855qFor,8551,Make _obj_repr public,12115839,closed,0,,,4,2023-12-14T07:19:16Z,2023-12-21T16:00:52Z,2023-12-21T16:00:52Z,NONE,,,,"### What is your issue?
We are using https://github.com/pydata/xarray/blob/2971994ef1dd67f44fe59e846c62b47e1e5b240b/xarray/core/formatting_html.py#L278
in the html representation of `AreaDefinitions` in https://github.com/pytroll/pyresample and don't like to import private functions.
Would it be OK to make `_obj_repr` public?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8551/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
2027147099,I_kwDOAMm_X854089b,8523,"tree-reduce the combine for `open_mfdataset(..., parallel=True, combine=""nested"")`",2448579,open,0,,,4,2023-12-05T21:24:51Z,2023-12-18T19:32:39Z,,MEMBER,,,,"### Is your feature request related to a problem?
When `parallel=True` and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the ""head node"" for the combine.
Instead we can tree-reduce the combine ([example](https://gist.github.com/dcherian/345c81c69c3587873a89b49c949d1561)) by switching to `dask.bag` instead of `dask.delayed` and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node.
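For illustration only, a very rough sketch of the shape of such a tree reduction, written with `dask.delayed` rather than `dask.bag`, with placeholder file paths and `dim='time'` (this is not the implementation from the linked gist):
```python
import glob

import dask
import xarray as xr

paths = sorted(glob.glob('data/*.nc'))  # placeholder file list

@dask.delayed
def _open(path):
    return xr.open_dataset(path)

@dask.delayed
def _concat(a, b):
    return xr.concat([a, b], dim='time')

def tree_combine(items):
    # pairwise reduction: log2(n) combine depth instead of shipping every
    # dataset back to the client for one big combine
    while len(items) > 1:
        items = [
            _concat(items[i], items[i + 1]) if i + 1 < len(items) else items[i]
            for i in range(0, len(items), 2)
        ]
    return items[0]

combined = tree_combine([_open(p) for p in paths]).compute()
```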
1. The downside is the dask graph is ""worse"" but perhaps that shouldn't stop us.
2. I think this is only feasible for `combine=""nested""`
cc @TomNicholas
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8523/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1223031600,I_kwDOAMm_X85I5fsw,6561,Excessive memory consumption by to_dataframe(),8419421,closed,0,,,4,2022-05-02T15:33:33Z,2023-12-15T20:47:32Z,2023-12-15T20:47:32Z,NONE,,,,"### What happened?
This is a reincarnation of #2534 with a reproducible example.
A 51 MB netCDF file leads to to_dataframe() requesting 23 GB.
### What did you expect to happen?
I expect to_dataframe() to require much less than 23 GB of memory for this operation.
### Minimal Complete Verifiable Example
```Python
import urllib.request
import xarray as xr
url = 'http://people.envsci.rutgers.edu/decker/Surface_METAR_20220501_0000.nc'
fname = 'metar.nc'
urllib.request.urlretrieve(url, filename=fname)
ncdata = xr.open_dataset(fname)
df = ncdata.to_dataframe()
```
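As an editorial sketch of where the blow-up comes from: `to_dataframe` broadcasts every variable against the full cartesian product of all dimensions, which is what produces the (5021, 127626) string array in the traceback below. Converting variables that share the same dimensions separately avoids that broadcast (illustration only):
```python
from collections import defaultdict

groups = defaultdict(list)
for name, var in ncdata.data_vars.items():
    groups[var.dims].append(name)

# one (much smaller) DataFrame per distinct set of dimensions
frames = {dims: ncdata[names].to_dataframe() for dims, names in groups.items()}
```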
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
Traceback (most recent call last):
File ""/chariton/decker/test/bug/xarraymem.py"", line 8, in
df = ncdata.to_dataframe()
File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5399, in to_dataframe
return self._to_dataframe(ordered_dims=ordered_dims)
File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5363, in _to_dataframe
data = [
File ""/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py"", line 5364, in
self._variables[k].set_dims(ordered_dims).values.reshape(-1)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 23.3 GiB for an array with shape (5021, 127626) and data type |S39
```
### Anything else we need to know?
_No response_
### Environment
/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn(""Setuptools is replacing distutils."")
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.62.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.3
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6561/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
384002323,MDU6SXNzdWUzODQwMDIzMjM=,2570,np.clip() executes eagerly,1200058,closed,0,,,4,2018-11-24T16:25:03Z,2023-12-03T05:29:17Z,2023-12-03T05:29:17Z,NONE,,,,"#### Example:
```python
x = xr.DataArray(np.random.uniform(size=[100, 100])).chunk(10)
x
```
>
> dask.array
> Dimensions without coordinates: dim_0, dim_1
>
```python
np.clip(x, 0, 0.5)
```
>
> array([[0.264276, 0.32227 , 0.336396, ..., 0.110182, 0.28255 , 0.399041],
> [0.5 , 0.030289, 0.5 , ..., 0.428923, 0.262249, 0.5 ],
> [0.5 , 0.5 , 0.280971, ..., 0.427334, 0.026649, 0.5 ],
> ...,
> [0.5 , 0.5 , 0.294943, ..., 0.053143, 0.5 , 0.488239],
> [0.5 , 0.341485, 0.5 , ..., 0.5 , 0.250441, 0.5 ],
> [0.5 , 0.156285, 0.179123, ..., 0.5 , 0.076242, 0.319699]])
> Dimensions without coordinates: dim_0, dim_1
```python
x.clip(0, 0.5)
```
>
> dask.array
> Dimensions without coordinates: dim_0, dim_1
#### Problem description
Calling np.clip() on a dask-backed DataArray eagerly computes the result, while xr.DataArray.clip() keeps it lazy.
1902108672,I_kwDOAMm_X85xX-AA,8207,Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset`,50383939,open,0,,,4,2023-09-19T02:44:02Z,2023-12-01T22:29:49Z,,NONE,,,,"### What is your issue?
I am simply reading 366 small (~15 MB each) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow:
```python-console
In [1]: import os; import dask
In [2]: import xarray as xr
In [3]: from dask.distributed import Client, LocalCluster
In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker
In [5]: client = Client(cluster)
In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'
In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True)
In [8]: ds.to_netcdf('./out2.nc')
```
And below, is the error I am getting:
Error message
```python-console
In [8]: ds.to_netcdf('./out2.nc')
/home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
warnings.warn(
2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed
Key: ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0)
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None)))
kwargs: {}
Exception: ""RuntimeError('NetCDF: HDF error')""
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds.to_netcdf('./out2.nc')
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
2249 encoding = {}
2250 from xarray.backends.api import to_netcdf
-> 2252 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
2253 self,
2254 path,
2255 mode=mode,
2256 format=format,
2257 group=group,
2258 engine=engine,
2259 encoding=encoding,
2260 unlimited_dims=unlimited_dims,
2261 compute=compute,
2262 multifile=False,
2263 invalid_netcdf=invalid_netcdf,
2264 )
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1252 if multifile:
1253 return writer, store
-> 1255 writes = writer.sync(compute=compute)
1257 if isinstance(target, BytesIO):
1258 store.sync()
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs)
253 if chunkmanager_store_kwargs is None:
254 chunkmanager_store_kwargs = {}
--> 256 delayed_store = chunkmanager.store(
257 self.sources,
258 self.targets,
259 lock=self.lock,
260 compute=compute,
261 flush=True,
262 regions=self.regions,
263 **chunkmanager_store_kwargs,
264 )
265 self.sources = []
266 self.targets = []
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs)
203 def store(
204 self,
205 sources: DaskArray | Sequence[DaskArray],
206 targets: Any,
207 **kwargs,
208 ):
209 from dask.array import store
--> 211 return store(
212 sources=sources,
213 targets=targets,
214 **kwargs,
215 )
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***)
1234 elif compute:
1235 store_dsk = HighLevelGraph(layers, dependencies)
-> 1236 compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
1237 return None
1239 else:
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs)
367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get)
368 dsk2 = optimization_function(cls)(dsk, keys, **kwargs)
--> 369 return schedule(dsk2, keys, **kwargs)
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
3265 should_rejoin = False
3266 try:
-> 3267 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
3268 finally:
3269 for f in futures.values():
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous)
2390 local_worker = None
2392 with shorten_traceback():
-> 2393 return self.sync(
2394 self._gather,
2395 futures,
2396 errors=errors,
2397 direct=direct,
2398 local_worker=local_worker,
2399 asynchronous=asynchronous,
2400 )
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__()
483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 484 return np.asarray(self.get_duck_array(), dtype=dtype)
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array()
486 def get_duck_array(self):
--> 487 return self.array.get_duck_array()
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array()
663 def get_duck_array(self):
--> 664 return self.array.get_duck_array()
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array()
552 # self.array[self.key] is now a numpy array when
553 # self.array is a BackendArray subclass
554 # and self.key is BasicIndexer((slice(None, None, None),))
555 # so we need the explicit check for ExplicitlyIndexed
556 if isinstance(array, ExplicitlyIndexed):
--> 557 array = array.get_duck_array()
558 return _wrap_numpy_scalars(array)
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array()
73 def get_duck_array(self):
---> 74 return self.func(self.array.get_duck_array())
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array()
550 def get_duck_array(self):
--> 551 array = self.array[self.key]
552 # self.array[self.key] is now a numpy array when
553 # self.array is a BackendArray subclass
554 # and self.key is BasicIndexer((slice(None, None, None),))
555 # so we need the explicit check for ExplicitlyIndexed
556 if isinstance(array, ExplicitlyIndexed):
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in __getitem__()
99 def __getitem__(self, key):
--> 100 return indexing.explicit_indexing_adapter(
101 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
102 )
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter()
836 """"""Support explicit indexing by delegating to a raw indexing method.
837
838 Outer and/or vectorized indexers are supported by indexing a second time
(...)
855 Indexing result, in the form of a duck numpy-array.
856 """"""
857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 858 result = raw_indexing_method(raw_key.tuple)
859 if numpy_indices.tuple:
860 # index the loaded np.ndarray
861 result = NumpyIndexingAdapter(result)[numpy_indices]
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem()
110 try:
111 with self.datastore.lock:
--> 112 original_array = self.get_array(needs_lock=False)
113 array = getitem(original_array, key)
114 except IndexError:
115 # Catch IndexError in netCDF4 and return a more informative
116 # error message. This is most often called when an unsorted
117 # indexer is used before the data is loaded from disk.
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array()
90 def get_array(self, needs_lock=True):
---> 91 ds = self.datastore._acquire(needs_lock)
92 variable = ds.variables[self.variable_name]
93 variable.set_auto_maskandscale(False)
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire()
402 def _acquire(self, needs_lock=True):
--> 403 with self._manager.acquire_context(needs_lock) as root:
404 ds = _nc4_require_group(root, self._group, self._mode)
405 return ds
File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__()
133 del self.args, self.kwds, self.func
134 try:
--> 135 return next(self.gen)
136 except StopIteration:
137 raise RuntimeError(""generator didn't yield"") from None
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context()
196 @contextlib.contextmanager
197 def acquire_context(self, needs_lock=True):
198 """"""Context manager for acquiring a file.""""""
--> 199 file, cached = self._acquire_with_cache_info(needs_lock)
200 try:
201 yield file
File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info()
215 kwargs = kwargs.copy()
216 kwargs[""mode""] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
218 if self._mode == ""w"":
219 # ensure file doesn't get overridden when opened again
220 self._mode = ""a""
File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__()
File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars()
File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: HDF error
```
The header of one of the individual NetCDF files is shown below:
Individual NetCDF header
```console
$ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc
netcdf ab_models_remapped_1980-04-20-13-00-00 {
dimensions:
COMID = 14980 ;
time = UNLIMITED ; // (24 currently)
variables:
int time(time) ;
time:long_name = ""time"" ;
time:units = ""hours since 1980-04-20 12:00:00"" ;
time:calendar = ""gregorian"" ;
time:standard_name = ""time"" ;
time:axis = ""T"" ;
double latitude(COMID) ;
latitude:long_name = ""latitude"" ;
latitude:units = ""degrees_north"" ;
latitude:standard_name = ""latitude"" ;
double longitude(COMID) ;
longitude:long_name = ""longitude"" ;
longitude:units = ""degrees_east"" ;
longitude:standard_name = ""longitude"" ;
double COMID(COMID) ;
COMID:long_name = ""shape ID"" ;
COMID:units = ""1"" ;
double RDRS_v2.1_P_P0_SFC(time, COMID) ;
RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ;
RDRS_v2.1_P_P0_SFC:long_name = ""Forecast: Surface pressure"" ;
RDRS_v2.1_P_P0_SFC:units = ""mb"" ;
double RDRS_v2.1_P_HU_1.5m(time, COMID) ;
RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ;
RDRS_v2.1_P_HU_1.5m:long_name = ""Forecast: Specific humidity"" ;
RDRS_v2.1_P_HU_1.5m:units = ""kg kg**-1"" ;
double RDRS_v2.1_P_TT_1.5m(time, COMID) ;
RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ;
RDRS_v2.1_P_TT_1.5m:long_name = ""Forecast: Air temperature"" ;
RDRS_v2.1_P_TT_1.5m:units = ""deg_C"" ;
double RDRS_v2.1_P_UVC_10m(time, COMID) ;
RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ;
RDRS_v2.1_P_UVC_10m:long_name = ""Forecast: Wind Modulus (derived using UU and VV)"" ;
RDRS_v2.1_P_UVC_10m:units = ""kts"" ;
double RDRS_v2.1_A_PR0_SFC(time, COMID) ;
RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ;
RDRS_v2.1_A_PR0_SFC:long_name = ""Analysis: Quantity of precipitation"" ;
RDRS_v2.1_A_PR0_SFC:units = ""m"" ;
double RDRS_v2.1_P_FB_SFC(time, COMID) ;
RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ;
RDRS_v2.1_P_FB_SFC:long_name = ""Forecast: Downward solar flux"" ;
RDRS_v2.1_P_FB_SFC:units = ""W m**-2"" ;
double RDRS_v2.1_P_FI_SFC(time, COMID) ;
RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ;
RDRS_v2.1_P_FI_SFC:long_name = ""Forecast: Surface incoming infrared flux"" ;
RDRS_v2.1_P_FI_SFC:units = ""W m**-2"" ;
```
I am running `xarray` and `Dask` on an HPC, so the ""modules"" I have loaded are the following:
```console
module list
Currently Loaded Modules:
1) CCconfig 6) ucx/1.8.0 11) netcdf-mpi/4.9.0 (io) 16) freexl/1.0.5 (t) 21) scipy-stack/2023a (math)
2) gentoo/2020 (S) 7) libfabric/1.10.1 12) hdf5-mpi/1.12.1 (io) 17) geos/3.10.2 (geo) 22) libspatialindex/1.8.5 (phys)
3) gcccore/.9.3.0 (H) 8) openmpi/4.0.3 (m) 13) libffi/3.3 18) librttopo-proj9/1.1.0 23) ipykernel/2023a
4) imkl/2020.1.217 (math) 9) StdEnv/2020 (S) 14) python/3.10.2 (t) 19) proj/9.0.1 (geo) 24) sqlite/3.38.5
5) intel/2020.1.217 (t) 10) mii/1.1.2 15) mpi4py/3.1.3 (t) 20) libspatialite-proj901/5.0.1
```
Any suggestion is greatly appreciated!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8207/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2019789753,I_kwDOAMm_X854Y4u5,8499,'drop_duplicates' behaves differently when using 1 vs many coordinates for an index,6654709,open,0,,,4,2023-12-01T00:36:42Z,2023-12-01T09:55:39Z,,NONE,,,,"### What happened?
I am trying to `drop_duplicates` from a DataArray based on the values of some of the coordinates,
starting from a DataArray with coordinates, but no indexes.
To accomplish this, I call 'DataArray.set_xindex' with the appropriate coordinate names,
and then call 'drop_duplicates' on the resulting DataArray, like so:
```python
from xarray import DataArray
import numpy as np
test_array = DataArray(
np.random.rand(5),
coords=dict(x=(""sample"", [1, 2, 1, 2, 1]), y=(""sample"", [-1] * 5)),
dims=""sample"",
)
# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex([""x"", ""y""]).drop_duplicates(""sample"")
assert len(good) == 2
```
The above functions as expected; 'good' has had its duplicates dropped,
and we are left with a DataArray of length 2.
However, the following does _not_ function as I would expect:
```python
# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
# even if we don't include the 'y' values in the index.
bad = test_array.set_xindex(""x"").drop_duplicates(""sample"")
# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array)
```
### What did you expect to happen?
I expected `drop_duplicates` to drop the duplicates when I was using only a single coordinate for the index.
### Minimal Complete Verifiable Example
```Python
from xarray import DataArray
import numpy as np
test_array = DataArray(
range(5),
coords=dict(x=(""sample"", [1, 2, 1, 2, 1]), y=(""sample"", [-1] * 5)),
dims=""sample"",
)
# output DataArray's 'sample' dimension has length 2, as expected
good = test_array.set_xindex([""x"", ""y""]).drop_duplicates(""sample"")
# And indeed there are only 2 elements left after dropping duplicates.
assert len(good) == 2
# All the 'y's are '-1', so we expect the same duplicates as before to be dropped,
bad = test_array.drop_vars(""y"").set_xindex(""x"").drop_duplicates(""sample"")
# But this assert fails! 'drop_duplicates' does not drop anything
assert not bad.equals(test_array.drop_vars(""y""))
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.11.0
pandas: 2.1.0
numpy: 1.24.4
scipy: 1.11.2
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.2.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.9.1
distributed: 2023.9.1
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.2
mypy: None
IPython: 8.15.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8499/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1983891070,I_kwDOAMm_X852P8Z-,8427,Ambiguous behavior with coordinates when appending to Zarr store with append_dim,1197350,closed,0,,,4,2023-11-08T15:40:19Z,2023-12-01T03:58:56Z,2023-12-01T03:58:55Z,MEMBER,,,,"### What happened?
There are two quite different scenarios covered by ""append"" with Zarr
- Adding new variables to a dataset
- Extending arrays along a dimension (via `append_dim`)
This issue is about what should happen when using `append_dim` with variables that _do not contain `append_dim`_.
Here's the current behavior.
```python
import numpy as np
import xarray as xr
import zarr
ds1 = xr.DataArray(
np.array([1, 2, 3]).reshape(3, 1, 1),
dims=('time', 'y', 'x'),
coords={'x': [1], 'y': [2]},
name=""foo""
).to_dataset()
ds2 = xr.DataArray(
np.array([4, 5]).reshape(2, 1, 1),
dims=('time', 'y', 'x'),
coords={'x':[-1], 'y': [-2]},
name=""foo""
).to_dataset()
# how concat works: data are aligned
ds_concat = xr.concat([ds1, ds2], dim=""time"")
assert ds_concat.dims == {""time"": 5, ""y"": 2, ""x"": 2}
# now do a Zarr append
store = zarr.storage.MemoryStore()
ds1.to_zarr(store, consolidated=False)
# we do not check that the coordinates are aligned--just that they have the same shape and dtype
ds2.to_zarr(store, append_dim=""time"", consolidated=False)
ds_append = xr.open_zarr(store, consolidated=False)
# coordinates data have been overwritten
assert ds_append.dims == {""time"": 5, ""y"": 1, ""x"": 1}
# ...with the latest values
assert ds_append.x.data[0] == -1
```
Currently, we _always write all data variables in this scenario_. That includes overwriting the coordinates every time we append. That makes appending more expensive than it needs to be. I don't think that is the behavior most users want or expect.
### What did you expect to happen?
There are a couple of different options we could consider for how to handle this ""extending"" situation (with `append_dim`)
1. [current behavior] Do not attempt to align coordinates
a. [current behavior] Overwrite coordinates with new data
b. Keep original coordinates
c. Force the user to explicitly drop the coordinates, as we do for `region` operations.
2. Attempt to align coordinates
a. Fail if coordinates don't match
b. Extend the arrays to replicate the behavior of `concat`
We currently do 1a. **I propose to switch to 1b**. I think it is closer to what users want, and it requires less I/O.
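For illustration, a rough sketch of how a user could approximate option 1b today (reusing `ds1`/`ds2` from above, and assuming variables absent from the appended dataset are simply left untouched in the store; not verified):
```python
import xarray as xr
import zarr

store = zarr.storage.MemoryStore()
ds1.to_zarr(store, consolidated=False)
# drop the coordinates that do not contain append_dim so the originals are kept
ds2.drop_vars(['x', 'y']).to_zarr(store, append_dim='time', consolidated=False)
ds_append = xr.open_zarr(store, consolidated=False)
# under option 1b we would expect the original coordinate values to survive
assert ds_append.x.data[0] == 1
```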
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.176-157.645.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.1
pandas: 2.1.2
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: 0.13.0
numbagg: 0.6.0
fsspec: 2023.10.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.16.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8427/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1044693438,I_kwDOAMm_X84-RMG-,5937,DataArray.dt.seconds returns incorrect value for negative `timedelta64[ns]`,2405019,closed,0,,,4,2021-11-04T12:05:24Z,2023-11-10T00:39:17Z,2023-11-10T00:39:17Z,CONTRIBUTOR,,,,"**What happened**:
For a negative `timedelta64[ns]` of 42 nanoseconds, `DataArray.dt.seconds` returned a non-zero value (the returned value was `86399`). When I pass in a positive 42 nanosecond `timedelta64[ns]`, the TimedeltaAccessor correctly returns zero. I would have expected both assertions in the example below to pass, but the second fails. This seems to be a general issue with negative `timedelta64[ns]` values.
```bash
array([0])
Dimensions without coordinates: dim_0
array([86399])
Dimensions without coordinates: dim_0
Traceback (most recent call last):
File ""bug_dt_seconds.py"", line 15, in
assert da.dt.seconds == 0
AssertionError
```
**What you expected to happen**:
```bash
array([0])
Dimensions without coordinates: dim_0
array([0])
Dimensions without coordinates: dim_0
```
**Minimal Complete Verifiable Example**:
```python
# coding: utf-8
import xarray as xr
import numpy as np
# number of nanoseconds
value = 42
da = xr.DataArray([np.timedelta64(value, ""ns"")])
print(da.dt.seconds)
assert da.dt.seconds == 0
da = xr.DataArray([np.timedelta64(-value, ""ns"")])
print(da.dt.seconds)
assert da.dt.seconds == 0
```
**Anything else we need to know?**:
I've narrowed this down to the call to `pd.Series(values.ravel())` in `xarray.core.accessor_dt._access_through_series`:
```python
ipdb> pd.Series(values.ravel())
0 -1 days +23:59:59.999999958
dtype: timedelta64[ns]
```
I think the issue arises because pandas turns the numpy timedelta64 into a ""minus one day plus a time"". This actually does have a number of ""seconds"" in it, but the ""total_seconds"" has the expected value:
```python
ipdb> pd.Series(values.ravel()).dt.total_seconds()
0 -4.200000e-08
dtype: float64
```
Which would correctly round to zero.
I don't think the issue is in pandas, although the output from pandas is counter-intuitive:
```python
ipdb> pd.Series(values.ravel()).dt.seconds
0 86399
dtype: int64
```
Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)?
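For concreteness, a rough sketch of that idea (a hypothetical helper, not the actual accessor code):
```python
import numpy as np
import pandas as pd

def seconds_with_sign(values: np.ndarray) -> np.ndarray:
    # extract the field from the magnitude, then re-apply the original sign
    flat = values.ravel()
    sign = np.where(flat < np.timedelta64(0, 'ns'), -1, 1)
    seconds = pd.Series(np.abs(flat)).dt.seconds.values
    return (sign * seconds).reshape(values.shape)

assert seconds_with_sign(np.array([np.timedelta64(-42, 'ns')]))[0] == 0
```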
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, May 6 2020, 04:59:01)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.18.2
pandas: 1.3.4
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.4.2
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.09.1
distributed: 2021.09.1
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
fsspec: 2021.06.1
cupy: None
pint: 0.18
sparse: None
setuptools: 46.4.0.post20200518
pip: 21.1.2
conda: None
pytest: 6.0.1
IPython: 7.16.1
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5937/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1981799811,I_kwDOAMm_X852H92D,8423,Support remote string paths for `h5netcdf` engine,11656932,open,0,,,4,2023-11-07T16:52:18Z,2023-11-09T07:24:45Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem?
Currently the `h5netcdf` engine supports opening remote files, but only already open file-like objects (e.g. `s3fs.open(...)`), not string paths like `s3://...`. There are situations where I'd like to use string paths instead of open file-like objects:
- Opening files can sometimes be slow (xref https://github.com/fsspec/s3fs/issues/816)
- When using `parallel=True` for opening lots of files, serializing open file-like objects back and forth from a remote cluster can be slow
- Some systems (e.g. NASA Earthdata) only hand out credentials that are valid when run in the same region as the data. Being able to use `parallel=True` + `storage_options` would be convenient/performant in that case.
### Describe the solution you'd like
It would be nice if I could do something like the following:
```python
ds = xr.open_mfdataset(
files, # A bunch of files like `s3://bucket/file`
engine=""h5netcdf"",
...
parallel=True,
storage_options={...}, # fsspec-compatible options
)
```
and have my files opened prior to handing off to `h5netcdf`. `storage_options` is already supported for Zarr, so hopefully extending to `h5netcdf` feels natural.
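For reference, the workaround today looks roughly like this (a sketch assuming `fsspec`/`s3fs` are installed and `files` is the same list of `s3://` paths as above):
```python
import fsspec
import xarray as xr

fs = fsspec.filesystem('s3', anon=False)  # fsspec-compatible options go here
ds = xr.open_mfdataset(
    [fs.open(path, mode='rb') for path in files],  # file-like objects, not string paths
    engine='h5netcdf',
)
```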
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8423/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1975845455,I_kwDOAMm_X851xQJP,8410,Segmentation fault 139 (SIGSEGV),39524075,closed,0,,,4,2023-11-03T10:14:03Z,2023-11-06T20:34:46Z,2023-11-06T20:34:45Z,NONE,,,,"### What happened?
While opening a set of netCDF files in a for loop, using xr.open_dataset().load(), I get a segmentation fault (error 139). Please see the code example below:
```
for region in region_list:
[some code to read data associated to each region...]
region_pred = xr.open_dataset(io.BytesIO(data)).load()
[other code working on region_pred...]
```
The error is shown in Linux/Mac after running my Python code, whereas Windows seems to be masking it. I was able to catch that on Windows by launching my code as:
```
python3 my_code.py && echo ok || echo KO
```
In this way, KO gets printed and the segmentation fault is now noticeable.
I managed to fix the issue by using a second variable (called reg_pred) in addition to region_pred:
```
for region in region_list:
[some code to read data associated to each region...]
region_pred = xr.open_dataset(io.BytesIO(data))
reg_pred = region_pred.load()
[other code working on reg_pred...]
```
### What did you expect to happen?
I don't know if the issue I described is something that the developers made on purpose. Personally, I think it is an issue and that's why I am reporting it. If it is not an issue, I would like to get a clarification in order to understand what I am missing.
Thank you in advance.
### Minimal Complete Verifiable Example
```Python
for region in region_list:
with storage_client.open(region, ""rb"") as f:
data = f.read()
region_pred = xr.open_dataset(io.BytesIO(data)).load()
# some code working on region_pred to compute weather indices...
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: ('Italian_Italy', '1252')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2023.8.0
pandas: 2.1.0
numpy: 1.26.0
scipy: 1.11.2
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.15.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8410/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1977485456,I_kwDOAMm_X8513giQ,8413,Add a perception of a __xarray__ magic method ,6273919,open,0,,,4,2023-11-04T19:55:14Z,2023-11-05T18:50:14Z,,NONE,,,,"### Is your feature request related to a problem?
I am often moving data from external objects (of all sorts!) into xarray. This is a common use case.
Much of this code would be greatly simplified if there were a way for non-xarray classes to declare to xarray how they can be marshaled into xarray objects.
### Describe the solution you'd like
So here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best.
# Magic Methods
It would be great to see these magic method signatures become integrated throughout the library:
```
__xarray__ -> xr.Dataset | xr.DataArray
__xarray_dataarray__ -> xr.DataArray
__xarray_dataset__ -> xr.Dataset
__xarray_datatree__ -> xr.DataTree # when DataTree is finally integrated into xarray
```
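As a toy illustration of how a third-party class might opt in (purely a sketch; these hooks do not exist yet):
```python
import numpy as np
import xarray as xr

class Slide:
    '''A hypothetical third-party container that opts in to xarray conversion.'''

    def __init__(self, pixels: np.ndarray):
        self.pixels = pixels

    def __xarray_dataarray__(self) -> xr.DataArray:
        # declare how this object should be marshaled into xarray
        return xr.DataArray(self.pixels, dims=('y', 'x'), name='slide')
```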
# Conversion Registry
And these extension functions to register converters:
```
def register_xarray_converter(cls: type, name: str, func: Callable[..., xr.Dataset | xr.DataArray] | None) -> None:
    ...
def register_dataarray_converter(cls: type, name: str, func: Callable[..., xr.DataArray] | None) -> None:
    ...
def register_dataset_converter(cls: type, name: str, func: Callable[..., xr.Dataset] | None) -> None:
    ...
def register_datatree_converter(cls: type, name: str, func: Callable[..., DataTree] | None) -> None:  # when DataTree is finally integrated into xarray
    ...
```
Registering a converter may clash if cls already implements a corresponding __xarray_*__ method or another converter is already registered for cls. Perhaps add an argument that specifies whether the converter should or should not be added if there is a clash. Perhaps these functions could return the replaced converter so it can be added back in if needed?
Ideally, ""deregister"" versions (e.g. deregister_xarray_converter) would also be available, so context managers that change marshaling behavior could easily be constructed.
# User API
Along with the following new user API functions:
```
def as_xarray(x, *args, **kwargs) -> xr.Dataset | xr.DataArray:
    ...
def as_dataarray(x, *args, **kwargs) -> xr.DataArray:
    ...
def as_dataset(x, *args, **kwargs) -> xr.Dataset:
    ...
def as_datatree(x, *args, **kwargs) -> DataTree:  # when DataTree is finally integrated into xarray
    ...
```
""as_xarray"" returns (in order of precedence:
- x unaltered if it is already an xarray object
- registered_xarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception
- registered_dataset_converter(x, *args, **kwargs) if it is callable and does not throw an exception
- registered_dataarray_converter(x, *args, **kwargs) if it is callable and does not throw an exception
- x.__xarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- x.__xarray_dataset__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- x.__xarray_dataarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- well known aliases of __xarray_dataarray__, such as x.to_xarray(*args, **kwargs) (see pandas)
- [DESIGN DECISION] convert a tuple of (dims, data[, attrs, encoding]) to a DataArray and return it?
- [DESIGN DECISION] convert a tuple encoding of a Dataset and return it?
- [DESIGN DECISION] wrap a duck-typed array in a DataArray and return it?
The rationale for putting the registered functions first is that this would enable third-party code to override how a class is converted without modifying that class.
""as_dataarrray"" would be slimilar, but it would only call x.__xarray_dataarray__ and well known aliases.
""as_dataset"" would be slimilar, but it would only call x.__xarray_dataset__, well known aliases, and perhaps falling back to calling x.__xarray_dataarray__ and converting the return a dataset if it has a name attribute.
""as_datatree"" would be slimilar, but it would only call x.__xarray_datatree__, and perhaps falling back to calling x.__xarray_dataarray__ and wrapping it in a single node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray)
The design decisions are flexible from my point of view, and might be decided in a way that makes the code base simplest or most usable. There is also a question of whether or not this method should default the backup methods. These decisions also can be deferred entirely by delegating to the converter registry.
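A very rough sketch of the dispatch order described above (illustration only; the hooks and registry are hypothetical and do not exist in xarray today):
```python
from typing import Callable

import xarray as xr

_DATAARRAY_CONVERTERS: dict[type, Callable[..., xr.DataArray]] = {}

def register_dataarray_converter(cls: type, func: Callable[..., xr.DataArray]) -> None:
    _DATAARRAY_CONVERTERS[cls] = func

def as_dataarray(x, *args, **kwargs) -> xr.DataArray:
    if isinstance(x, xr.DataArray):
        return x
    # registered converters take precedence over the object's own hooks
    for cls, func in _DATAARRAY_CONVERTERS.items():
        if isinstance(x, cls):
            return func(x, *args, **kwargs)
    hook = getattr(x, '__xarray_dataarray__', None) or getattr(x, 'to_xarray', None)
    if callable(hook):
        return hook(*args, **kwargs)
    raise TypeError(f'cannot convert {type(x).__name__} to xr.DataArray')
```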
# Across the Xarray Library
Finally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library.
Doing this would give another pathway for third-party libraries to integrate with xarray, one far easier than the converter registry or explicit calls to as_* functions.
### Describe alternatives you've considered
This can be done with a private library, but it seems like a lot of code that would be useful to other use cases.
Most of this (but not all) can be accomplished in a 3rd party library, but it wouldn't allow the seamless sort of integration seen with (for example) xarray's use of _repr_html_ to integrate with pandas.
The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seamless for reading files, but cannot marshal objects. For example, this is possible:
```
x = xr.open_dataset(""slide.tiff"")
```
But this doesn't work.
```
t = tiffslide.TiffSlide(""slide.tiff"")
x = xr.open_dataset(t) # won't work
x = xr.DataArray(t) # won't work either
```
This is an important use case because there are cases where we want to create an xarray like this from objects that are never stored on the filesystem.
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8413/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
887711474,MDU6SXNzdWU4ODc3MTE0NzQ=,5290,Inconclusive error messages using to_zarr with regions,5802846,closed,0,,,4,2021-05-11T15:54:39Z,2023-11-05T06:28:39Z,2023-11-05T06:28:39Z,CONTRIBUTOR,,,,"
**What happened**:
The idea is to use an xarray dataset (stored as a dummy zarr file), which is subsequently filled using the `region` argument, as explained in the documentation. Ideally, almost nothing is stored to disk upfront.
It seems the current implementation is only designed to either store coordinates for the whole dataset and write them to disk or to write without coordinates. I failed to understand this from the documentation and tried to create a dataset without coordinates and fill it with a dataset subset with coordinates. It gave some inconclusive errors depending on the actual code example (see below).
`ValueError: parameter 'value': expected array with shape (0,), got (10,)` or `ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'`
It might also be a bug and it should in fact be possible to add a dataset with coordinates to a dummy dataset without coordinates. Then there seems to be an issue regarding the handling of the variables during storing the region.
... or I might just have done it wrong... and I'm looking forward to suggestions.
**What you expected to happen**:
Either an error message telling me that I should use coordinates during creation of the dummy dataset. Alternatively, if this is a bug and it should in fact be possible, then it should just work.
**Minimal Complete Verifiable Example**:
```python
import dask.array
import xarray as xr
import numpy as np
error = 1 # choose between 0 (no error), 1, 2, 3
dummies = dask.array.zeros(30, chunks=10)
# chunks in coords are not taken into account while saving!?
coord_x = dask.array.zeros(30, chunks=10) # or coord_x = np.zeros((30,))
if error == 0:
ds = xr.Dataset({""foo"": (""x"", dummies)}, coords={""x"":coord_x})
else:
ds = xr.Dataset({""foo"": (""x"", dummies)})
print(ds)
path = ""./tmp/test.zarr""
ds.to_zarr(path, mode='w', compute=False, consolidated=True)
# create a new dataset to be input into a region
ds = xr.Dataset({""foo"": ('x', np.arange(10))},coords={""x"":np.arange(10)})
if error == 1:
ds.to_zarr(path, region={""x"": slice(10, 20)})
# ValueError: parameter 'value': expected array with shape (0,), got (10,)
elif error == 2:
ds.to_zarr(path, region={""x"": slice(0, 10)})
ds.to_zarr(path, region={""x"": slice(10, 20)})
# ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo'
elif error == 3:
ds.to_zarr(path, region={""x"": slice(0, 10)})
ds = xr.Dataset({""foo"": ('x', np.arange(10))},coords={""x"":np.arange(10)})
ds.to_zarr(path, region={""x"": slice(10, 20)})
# ValueError: parameter 'value': expected array with shape (0,), got (10,)
else:
ds.to_zarr(path, region={""x"": slice(10, 20)})
ds = xr.open_zarr(path)
print('reopen',ds['x'])
```
**Anything else we need to know?**:
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-16-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.18.0
pandas: 1.2.3
numpy: 1.19.2
scipy: 1.6.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: None
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
377356113,MDU6SXNzdWUzNzczNTYxMTM=,2542,"full_like, ones_like, zeros_like should retain subclasses",500246,closed,0,,,4,2018-11-05T11:22:49Z,2023-11-05T06:27:31Z,2023-11-05T06:27:31Z,CONTRIBUTOR,,,,"#### Code Sample,
```python
# Your code here
import numpy
import xarray
class MyDataArray(xarray.DataArray):
pass
da = MyDataArray(numpy.arange(5))
da2 = xarray.zeros_like(da)
print(type(da), type(da2))
```
#### Problem description
I would expect that `type(da2) is type(da)`, but this is not the case. The type of `da2` is always `xarray.core.dataarray.DataArray`. Rather, the output of this script is:
```
<class '__main__.MyDataArray'> <class 'xarray.core.dataarray.DataArray'>
```
#### Expected Output
I would hope for the output:
```
<class '__main__.MyDataArray'> <class '__main__.MyDataArray'>
```
In principle changing this could break people's code, so if a change is implemented it should probably be through an optional keyword argument to the `full_like`/`ones_like`/`zeros_like` family.
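In the meantime, a user-side workaround sketch is to simply re-wrap the result in the subclass:
```python
import numpy
import xarray

class MyDataArray(xarray.DataArray):
    pass

da = MyDataArray(numpy.arange(5))
# re-wrap the plain DataArray to recover the subclass
da2 = MyDataArray(xarray.zeros_like(da))
assert type(da2) is MyDataArray
```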
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.10.7
pandas: 0.23.2
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 3.0.0
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 39.2.0
pip: 18.0
conda: None
pytest: 3.2.2
IPython: 6.4.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2542/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1966675016,I_kwDOAMm_X851ORRI,8388,Type annotation compatibility with numpy ufuncs,1828519,closed,0,,,4,2023-10-28T17:25:11Z,2023-11-02T12:44:50Z,2023-11-02T12:44:50Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem?
I'd like mypy to understand that xarray DataArrays passed to numpy ufuncs have a return type of xarray DataArray.
```python
import xarray as xr
import numpy as np
def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray:
abs_diff = np.absolute(sun_azi - sat_azi)
ssadiff = np.minimum(abs_diff, 360 - abs_diff)
return ssadiff
```
```bash
$ mypy ./xarray_mypy.py
xarray_mypy.py:7: error: Incompatible return value type (got ""ndarray[Any, dtype[Any]]"", expected ""DataArray"") [return-value]
Found 1 error in 1 file (checked 1 source file)
```
### Describe the solution you'd like
I'm not sure if this is possible, if it is something xarray can fix, or something numpy needs to ""fix"". I'd like the above situation to ""just work"" without anything more than maybe some extra type-stub package.
### Describe alternatives you've considered
Cast types, use other type coercion, or tell mypy to ignore the type issues for these numpy calls.
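The cast route looks roughly like this (a sketch of the workaround, not a fix):
```python
from typing import cast

import numpy as np
import xarray as xr

def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray:
    # the ufuncs still return DataArrays at runtime; the casts only inform mypy
    abs_diff = cast(xr.DataArray, np.absolute(sun_azi - sat_azi))
    return cast(xr.DataArray, np.minimum(abs_diff, 360 - abs_diff))
```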
### Additional context
https://stackoverflow.com/questions/77369042/typing-when-passing-xarray-dataarray-objects-to-numpy-ufuncs","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8388/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1445905299,I_kwDOAMm_X85WLsOT,7282,groupby and mean on a MultiIndex level raises ValueError,25231875,closed,0,,,4,2022-11-11T19:15:58Z,2023-10-30T09:18:54Z,2023-08-31T03:50:33Z,NONE,,,,"### What happened?
After using `set_index` to create a `MultiIndex`, calling `groupby` on a `MultiIndex` level and then `mean` raises an error.
### What did you expect to happen?
Apply mean to groups, no error.
### Minimal Complete Verifiable Example
```Python
from xarray import DataArray

d = DataArray(
data=[
[0, 1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20]
],
coords={
""greek"": (""a"", ['alpha', 'beta', 'gamma']),
""colors"": (""a"", ['red', 'green', 'blue']),
""compass"": (""b"", ['north', 'south', 'east', 'west', 'northeast', 'southeast', 'southwest']),
""integer"": (""b"", [0, 1, 2, 3, 4, 5, 6]),
},
dims=(""a"", ""b"")
)
d = d.set_index(a=['greek', 'colors'], b=['compass', 'integer'])
g = d.groupby('greek')
m = g.mean(...)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
Traceback (most recent call last):
File """", line 1, in
File ""/usr/local/lib/python3.10/site-packages/xarray/core/_aggregations.py"", line 5698, in mean
return self.reduce(
File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1201, in reduce
return self.map(reduce_array, shortcut=shortcut)
File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1104, in map
return self._combine(applied, shortcut=shortcut)
File ""/usr/local/lib/python3.10/site-packages/xarray/core/groupby.py"", line 1136, in _combine
index, index_vars = create_default_index_implicit(coord)
File ""/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py"", line 1045, in create_default_index_implicit
index = PandasMultiIndex(array, name)
File ""/usr/local/lib/python3.10/site-packages/xarray/core/indexes.py"", line 615, in __init__
raise ValueError(
ValueError: conflicting multi-index level name 'greek' with dimension 'greek'
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 5.15.49-linuxkit
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.11.0
pandas: 1.5.1
numpy: 1.23.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.2.2
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7282/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1953059418,I_kwDOAMm_X850aVJa,8345,`.stack` produces large chunks,40218891,closed,0,,,4,2023-10-19T21:09:56Z,2023-10-26T21:20:05Z,2023-10-26T21:20:05Z,NONE,,,,"### What happened?
Xarray ``stack`` does not chunk along the last coordinate, producing huge chunks, as described in #5754. Dask, seeing code like this:
```
da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
```
produces warning and suggestion to use context manager:
```
with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
```
This fails with message ``IndexError: tuple index out of range``.
### What did you expect to happen?
I expect this to work. #5754 is closed.
### Minimal Complete Verifiable Example
```Python
import dask.array
import numpy as np
import xarray as xr
var = xr.Variable(
(""t"", ""z"", ""u"", ""x"", ""y""),
dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)
def sum(ds):
return ds.sum(dim=""u"")
with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
da2
```
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[21], line 5
2 return ds.sum(dim=""u"")
4 with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
----> 5 da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
6 da2
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse)
2795 def unstack(
2796 self,
2797 dim: Dims = None,
2798 fill_value: Any = dtypes.NA,
2799 sparse: bool = False,
2800 ) -> Self:
2801 """"""
2802 Unstack existing dimensions corresponding to MultiIndexes into
2803 multiple new dimensions.
(...)
2853 DataArray.stack
2854 """"""
-> 2855 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2856 return self._from_temp_dataset(ds)
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse)
5498 for d in dims:
5499 if needs_full_reindex:
-> 5500 result = result._unstack_full_reindex(
5501 d, stacked_indexes[d], fill_value, sparse
5502 )
5503 else:
5504 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5393 if name not in index_vars:
5394 if dim in var.dims:
-> 5395 variables[name] = var.unstack({dim: new_dim_sizes})
5396 else:
5397 variables[name] = var
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs)
1928 result = self
1929 for old_dim, dims in dimensions.items():
-> 1930 result = result._unstack_once_full(dims, old_dim)
1931 return result
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim)
1817 reordered = self.transpose(*dim_order)
1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes
-> 1820 new_data = reordered.data.reshape(new_shape)
1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names
1823 return type(self)(
1824 new_dims, new_data, self._attrs, self._encoding, fastpath=True
1825 )
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape)
2217 if len(shape) == 1 and not isinstance(shape[0], Number):
2218 shape = shape[0]
-> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit)
283 else:
284 chunk_plan.append(""auto"")
--> 285 outchunks = normalize_chunks(
286 chunk_plan,
287 shape=shape,
288 limit=limit,
289 dtype=x.dtype,
290 previous_chunks=inchunks,
291 )
293 x2 = x.rechunk(inchunks)
295 # Construct graph
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
3092 chunks = tuple(""auto"" if isinstance(c, str) and c != ""auto"" else c for c in chunks)
3094 if any(c == ""auto"" for c in chunks):
-> 3095 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
3097 if shape is not None:
3098 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape))
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks)
3212 largest_block = math.prod(
3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto""
3214 )
3216 if previous_chunks:
3217 # Base ideal ratio on the median chunk size of the previous chunks
-> 3218 result = {a: np.median(previous_chunks[a]) for a in autos}
3220 ideal_shape = []
3221 for i, s in enumerate(shape):
File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in (.0)
3212 largest_block = math.prod(
3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto""
3214 )
3216 if previous_chunks:
3217 # Base ideal ratio on the median chunk size of the previous chunks
-> 3218 result = {a: np.median(previous_chunks[a]) for a in autos}
3220 ideal_shape = []
3221 for i, s in enumerate(shape):
IndexError: tuple index out of range
```
### Anything else we need to know?
The most recent traceback entries point to an issue in dask code.
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8345/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1923431725,I_kwDOAMm_X85ypT0t,8264,Improve error messages,5635139,open,0,,,4,2023-10-03T06:42:57Z,2023-10-24T18:40:04Z,,MEMBER,,,,"### Is your feature request related to a problem?
Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline.
Some of the error messages could be _much_ more helpful. Take one example:
```
xarray.core.merge.MergeError: conflicting values for variable 'date' on objects to be combined.
You can skip this check by specifying compat='override'.
```
The second sentence is nice. But the first could give us much more information:
- Which variables conflict? I'm merging four objects, so it would be very helpful to know which are causing the issue.
- What is the conflict? Is one a superset and I can `join=...`? Are they off by 1 or are they completely different types?
- Our `testing.assert_equal` produces pretty nice errors, as a comparison
Having good error messages is really useful: it lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library.
### Describe the solution you'd like
I'm not sure of the best way to surface these issues: error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues; there could be scores of marginal issues which would often be out of date.
One thing we do in PRQL is have a file that snapshots error messages [`test_bad_error_messages.rs`](https://github.com/PRQL/prql/blob/587aa6ec0e2da0181103bc5045cc5dfa43708827/crates/prql-compiler/src/tests/test_bad_error_messages.rs), which then makes changing those messages from bad to good a nice, well-scoped contribution. I'm not sure whether that would work here (python doesn't seem to have a great snapshotter; `pytest-regtest` is the best I've found, and I wrote `pytest-accept` but it requires doctests).
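As a sketch of the shape such a file could take here (hypothetical test file and match string):
```python
# test_bad_error_messages.py -- messages we would like to improve, pinned down so
# changing them from bad to good is an easy, visible contribution
import pytest
import xarray as xr

def test_merge_conflict_message():
    ds1 = xr.Dataset({'date': ('x', [1, 2])})
    ds2 = xr.Dataset({'date': ('x', [1, 3])})
    with pytest.raises(xr.MergeError, match='conflicting values for variable'):
        xr.merge([ds1, ds2])
```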
Any other ideas?
### Describe alternatives you've considered
_No response_
### Additional context
A couple of specific error-message issues:
- https://github.com/pydata/xarray/issues/2078
- https://github.com/pydata/xarray/issues/5290","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8264/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
529644880,MDU6SXNzdWU1Mjk2NDQ4ODA=,3580,xr.DataArray.values fails with latest versions of netcdf4,16332933,closed,0,,,4,2019-11-28T01:26:07Z,2023-10-18T17:01:17Z,2023-10-18T17:01:17Z,NONE,,,,"#### MCVE Code Sample
```python
import xarray as xr
xr.show_versions()
url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/NCEP-CFSv2/.HINDCAST/.MONTHLY/.sst/dods'
fullda = xr.open_dataset(url, decode_times=False,chunks={'S': 'auto', 'L': 'auto', 'M':'auto','X':'auto','Y':'auto'})
print(fullda)
print(fullda['sst'][:10,0,0,0,0].values)
```
#### Expected Output
```python
Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181)
Coordinates:
* X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
* L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
* S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0
* M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0
* Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
sst (S, L, M, Y, X) float32 dask.array
Attributes:
Conventions: IRIDL
[-25.652588 -35.577393 -48.702896 -51.3853 -50.687195 -50.341995
-50.407593 -54.955994 -52.052994 -47.31279 ]
```
#### Problem Description
This should return the array’s data as a numpy.ndarray according to the documentation and as shown above. I tested this with various versions of netcdf4 and I get the error below for netcdf4 versions 1.5.1, 1.5.1.2, 1.5.3 (latest version). If I use netcdf4 version 1.5.1, I get the expected output as above.
``` python
Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181)
Coordinates:
* X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
* L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
* S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0
* M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0
* Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
sst (S, L, M, Y, X) float32 dask.array
Attributes:
Conventions: IRIDL
Traceback (most recent call last):
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 84, in _getitem
array = getitem(original_array, key)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/common.py"", line 54, in robust_getitem
return array[key]
File ""netCDF4/_netCDF4.pyx"", line 4408, in netCDF4._netCDF4.Variable.__getitem__
File ""netCDF4/_netCDF4.pyx"", line 5350, in netCDF4._netCDF4.Variable._get
IndexError: index exceeds dimension bounds
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ""testpython.py"", line 7, in
print(fullda['sst'][:10,0,0,0,0].values)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/dataarray.py"", line 567, in values
return self.variable.values
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py"", line 448, in values
return _as_array_or_item(self._data)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py"", line 254, in _as_array_or_item
data = np.asarray(data)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py"", line 1314, in __array__
x = self.compute()
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py"", line 165, in compute
(result,) = compute(self, traverse=False, **kwargs)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py"", line 436, in compute
results = schedule(dsk, keys, **kwargs)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/threaded.py"", line 81, in get
**kwargs
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 486, in get_async
raise_exception(exc, tb)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 316, in reraise
raise exc
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py"", line 222, in execute_task
result = _execute_task(task, data)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/core.py"", line 119, in _execute_task
return func(*args2)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py"", line 106, in getter
c = np.asarray(c)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 481, in __array__
return np.asarray(self.array, dtype=dtype)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 643, in __array__
return np.asarray(self.array, dtype=dtype)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 547, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 72, in __getitem__
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py"", line 827, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File ""/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 94, in _getitem
raise IndexError(msg)
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
```
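For reference, the stopgap suggested by the error message itself would look roughly like this (a sketch using the variables from the traceback above; I have not confirmed it avoids the underlying problem):
```python
# Pull the slice into memory first, then access .values on the loaded data.
sst_slice = fullda['sst'][:10, 0, 0, 0, 0].load()
print(sst_slice.values)
```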
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.4.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.1
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.8.1
distributed: 2.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 42.0.1.post20191125
pip: 19.3.1
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3580/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1924497392,I_kwDOAMm_X85ytX_w,8269,open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0',6819509,closed,0,,,4,2023-10-03T16:19:54Z,2023-10-18T16:50:20Z,2023-10-18T16:50:20Z,NONE,,,,"### What is your issue?
When moving from xarray version '2023.8.0' to '2023.9.0' the behavior of opening a zarr changed for me (code to create the example zarr is at the end of this post). When importing a variable with units ""days accumulated"", the values are scaled differently between the two versions. The latest version seems to automatically treat this as a time-like array (I think the -9.223372e+18 values seen are NaT-like?).
Open the zarr:
```python
import xarray as xr
ds = xr.open_dataset('debug.zarr', engine='zarr', chunks={})
```
Print as a pandas-like table for each version of xarray for readability:
```python
ds.to_dataframe()
```
Version '2023.8.0':
|time|dapr (dtype=float32)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|NaN|NaN|
|2000-01-02|NaN|NaN|
|2000-01-03|2.0|1.5|
Version '2023.9.0':
|time|dapr (dtype=float64)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|-9.223372e+18|NaN|
|2000-01-02|-9.223372e+18|NaN|
|2000-01-03|2.000000e+00|1.5|
I can manually disable this by using ""use_cf=False"" / ""mask_and_scale=False"" and then scaling this variable myself, though that is not ideal. The ""decode_timedelta"" option doesn't seem to have an effect on this data, either.
I understand the ""days"" keyword is in my units; however, the full unit is ""days accumulated"". Has xarray's behavior changed so that keywords such as ""days"" are matched anywhere in the units (e.g. as a substring)? Do you have any other suggestions? Thank you for the help.
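For completeness, the manual workaround mentioned above would look roughly like this (a sketch; I'm assuming the scale and fill metadata remain in `.attrs` when decoding is disabled):
```python
import xarray as xr

ds_raw = xr.open_dataset('debug.zarr', engine='zarr', chunks={}, mask_and_scale=False)

# Undo the int16 packing for 'dapr' by hand.
dapr = ds_raw['dapr'].astype('float64')
dapr = dapr.where(dapr != ds_raw['dapr'].attrs.get('_FillValue', -32768))
dapr = dapr * ds_raw['dapr'].attrs.get('scale_factor', 1.0)
```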
### Code to create the debug.zarr for the tables above:
```python
import numpy as np
import pandas as pd
import xarray as xr
import zarr
# Create some multiday precipitation data (similar to https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
# mdpr is the amount of a multiday total (inches)
# dapr is the number of days each multiday total occurred over (days accumulated).
# In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03
# I use float32 to represent these, but pack these as int16 values in the zarr.
mdpr = np.array([np.NaN, np.NaN, 1.50], dtype=np.float32)
dapr = np.array([np.NaN, np.NaN , 2.0], dtype=np.float32)
time = pd.date_range('2000-01-01', periods=3)
# Create a dataset from these values
ds = xr.Dataset(
data_vars=dict(
mdpr=(['time'], mdpr),
dapr=(['time'], dapr),
),
coords=dict(
time=time,
),
attrs=dict(description='multiday precipitation data'),
)
# Specify encoding to pack these float32 values as int16
encoding = {
'mdpr' : {
'chunks' : (3,),
'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
'filters': None,
'missing_value': -32768,
'_FillValue': -32768,
'scale_factor': 0.01,
'add_offset': 0.0,
'dtype': np.int16,
},
'dapr' : {
'chunks' : (3,),
'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
'filters': None,
'missing_value': -32768,
'_FillValue': -32768,
'scale_factor': 1.0,
'add_offset': 0.0,
'dtype': np.int16,
},
}
# Create attributes. The ""units"" for the dapr variable seems to be the issue ""days"" in the
# ""days accumulated""
ds.mdpr.attrs['units'] = 'inches'
ds.mdpr.attrs['description'] = 'multiday precip amount'
ds.dapr.attrs['units'] = 'days accumulated'
ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation'
# Save to zarr
ds.to_zarr('debug.zarr', mode='w', encoding=encoding)
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8269/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1384226112,I_kwDOAMm_X85SgZ1A,7075,Convert xarray dataset to pandas dataframe is much slower in newest xarray version,20794996,closed,0,,,4,2022-09-23T19:36:28Z,2023-10-14T20:37:40Z,2023-10-14T20:37:40Z,NONE,,,,"### What is your issue?
Converting an xarray dataset to pandas dataframe has become much slower in the newest xarray version.
I want to read in very large netcdf files, extract a slice, and convert the slice to a pandas dataframe. For an input size of 2GB, the xarray version 0.21.0 takes 3 seconds versus the xarray version 2022.6.0 takes 44 seconds. See table below for more tests with increasing size of xarray dataset.
| Number of NetCDF Input Files in Xarray Dataset (~1 GB per file) | 2 | 5 | 10 | 15 | 20 | 30 | 40 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Older Xarray Version 0.21.0 (min:sec) | 0:03 | 0:02 | 0:04 | 0:06 | 0:09 | 0:13 | 0:17 |
| Newer Xarray Version 2022.6.0 (min:sec) | 0:44 | 1:30 | 2:46 | 4:01 | 5:23 | 7:56 | 10:29 |
Here is my code:
```python
# Read in a list of netcdf files and combine into a single dataset.
with xr.open_mfdataset(infile_list, combine='by_coords') as ds:
# Extract the data for a single location (the nearest grid point) using the provided coordinates (lat/lon).
ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest')
# Convert xarray dataset to a pandas dataframe.
# This is now the slow part since the xarray library was updated.
df = ds_slice.to_dataframe()
```
The netcdf files I am reading in are about 1 GB each, containing daily weather data for the entire CONUS. There is 1 file per year, so if I read in 2 files, the dimensions are (lon: 1386, lat: 585, day: 731, crs: 1) with coordinates of lon, lat, day, and crs. They include 8 float data variables.
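One diagnostic worth trying (a sketch, not a confirmed fix) is to load the small selected slice into memory before converting, to separate the selection cost from the `to_dataframe` cost:
```python
# Load the single-grid-point slice eagerly, then convert to pandas.
ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest').load()
df = ds_slice.to_dataframe()
```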
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7075/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1943355490,I_kwDOAMm_X85z1UBi,8308,Different plotting results compared to matplotlib,30388627,closed,0,,,4,2023-10-14T15:54:32Z,2023-10-14T20:02:16Z,2023-10-14T20:02:16Z,NONE,,,,"### What happened?
I got different results when I tried to plot 2D data [test.npy.zip](https://github.com/pydata/xarray/files/12906635/test.npy.zip) using matplotlib and xarray.
### matplotlib

### xarray

### What did you expect to happen?
Same plot.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
test = np.load('test.npy')
plt.imshow(test, vmin=0, vmax=200)
plt.colorbar()
xr.DataArray(test).plot.imshow(vmin=0, vmax=200)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.26.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8308/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1821467933,I_kwDOAMm_X85skWUd,8021,Specify chunks in bytes,306380,open,0,,,4,2023-07-26T02:29:43Z,2023-10-06T10:09:33Z,,MEMBER,,,,"### Is your feature request related to a problem?
I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an `open_zarr` call and then providing the right `chunks=` argument. I'll admit, though, that I wouldn't mind giving Xarray a value like `""1 GiB""` and having it use that when determining `""auto""` chunk sizes.
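For concreteness, the back-of-the-envelope sizing I do today looks roughly like this (a sketch; the store name, dimension, and 1 GiB target are illustrative, not a proposed API):
```python
import xarray as xr

target_bytes = 1 * 1024**3          # 1 GiB budget (example value)
itemsize = 8                        # assuming float64 data
elements_per_chunk = target_bytes // itemsize

# Pass explicit chunk sizes instead of a byte-based 'auto' target.
ds = xr.open_zarr('store.zarr', chunks={'time': elements_per_chunk})
```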
Dask array does this in two ways. We can provide a value in chunks as like the following:
```python
x = da.random.random(..., chunks=""1 GiB"")
```
We also refer to a value in Dask config
```python
In [1]: import dask
In [2]: dask.config.get(""array.chunk-size"")
Out[2]: '128MiB'
```
This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8021/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1169750048,I_kwDOAMm_X85FuPgg,6360,Multidimensional `interpolate_na()`,5797727,open,0,,,4,2022-03-15T14:27:46Z,2023-09-28T11:51:20Z,,NONE,,,,"### Is your feature request related to a problem?
I think that having a way to run a multidimensional interpolation for filling missing values would be awesome.
The code snippet below creates some data and shows the problem I am having now. If the data has some orientation, we can't simply interpolate each dimension separately.
```python
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.subplots below
n = 30
x = xr.DataArray(np.linspace(0,2*np.pi,n),dims=['x'])
y = xr.DataArray(np.linspace(0,2*np.pi,n),dims=['y'])
z = (np.sin(x)*xr.ones_like(y))
mask = xr.DataArray(np.random.randint(0,1+1,(n,n)).astype('bool'),dims=['x','y'])
kw = dict(add_colorbar=False)
fig,ax = plt.subplots(1,3,figsize=(11,3))
z.plot(ax=ax[0],**kw)
z.where(mask).plot(ax=ax[1],**kw)
z.where(mask).interpolate_na('x').plot(ax=ax[2],**kw)
```

I tried to use advanced interpolation for that, but it doesn't look like the best solution.
```python
zs = z.where(mask).stack(k=['x','y'])
zs = zs.where(np.isnan(zs),drop=True)
xi,yi = zs.k.x.drop('k'),zs.k.y.drop('k')
zi = z.interp(x=xi,y=yi)
fig,ax = plt.subplots()
z.where(mask).plot(ax=ax,**kw)
ax.scatter(xi,yi,c=zi,**kw,linewidth=1,edgecolor='k')
```
returns

### Describe the solution you'd like
Simply `z.interpolate_na(['x','y'])`
### Describe alternatives you've considered
I could extract the data to `numpy` and interpolate using `scipy.interpolate.griddata`, but this is not the way `xarray` should work.
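For reference, a rough sketch of that fallback, reusing `z`, `mask`, `x`, and `y` from the example above (a sketch only, not a proposed solution):
```python
import numpy as np
from scipy.interpolate import griddata

masked = z.where(mask)
xx, yy = np.meshgrid(x.values, y.values, indexing='ij')
valid = ~np.isnan(masked.values)

filled = griddata(
    (xx[valid], yy[valid]),      # locations of the known values
    masked.values[valid],        # the known values themselves
    (xx, yy),                    # interpolate back onto the full grid
    method='linear',
)
z_filled = masked.copy(data=filled)
```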
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6360/reactions"", ""total_count"": 11, ""+1"": 9, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,,13221727,issue
1905824568,I_kwDOAMm_X85xmJM4,8221,Frequent doc build timeout / OOM,5635139,open,0,,,4,2023-09-20T23:02:37Z,2023-09-21T03:50:07Z,,MEMBER,,,,"### What is your issue?
I'm frequently seeing `Command killed due to timeout or excessive memory consumption` in the doc build.
It happens after 1552 seconds, so since that's not a round number, it might be the memory rather than a timeout?
It follows `writing output... [ 90%] generated/xarray.core.rolling.DatasetRolling.max`, which I wouldn't have thought was a particularly memory-intensive part of the build?
Here's an example: https://readthedocs.org/projects/xray/builds/21983708/
Any thoughts for what might be going on? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8221/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1326238990,I_kwDOAMm_X85PDM0O,6870,`rolling_exp` loses coords,5635139,closed,0,,,4,2022-08-02T18:27:44Z,2023-09-19T01:13:23Z,2023-09-19T01:13:23Z,MEMBER,,,,"### What happened?
We lose the time coord here — `Dimensions without coordinates: time`:
```python
ds = xr.tutorial.load_dataset(""air_temperature"")
ds.rolling_exp(time=5).mean()
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Dimensions without coordinates: time
Data variables:
air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.4 296.1 295.7
```
(I realize I wrote this, I didn't think this used to happen, but either it always did or I didn't write good enough tests... mea culpa)
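(An interim stopgap, sketched below, is to re-attach the coordinate afterwards, though the real fix belongs in `rolling_exp` itself.)
```python
# Hypothetical stopgap: copy the time coordinate back onto the result.
result = ds.rolling_exp(time=5).mean().assign_coords(time=ds.time)
```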
### What did you expect to happen?
We keep the time coords, like we do for normal `rolling`:
```python
In [2]: ds.rolling(time=5).mean()
Out[2]:
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
```
### Minimal Complete Verifiable Example
```Python
(as above)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (main, May 24 2022, 21:13:51)
[Clang 13.1.6 (clang-1316.0.21.2)]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.12.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.3.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6870/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
598991028,MDU6SXNzdWU1OTg5OTEwMjg=,3967,Support static type analysis ,6130352,closed,0,,,4,2020-04-13T16:34:43Z,2023-09-17T19:43:32Z,2023-09-17T19:43:31Z,NONE,,,,"As a related discussion to https://github.com/pydata/xarray/issues/3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis.
In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able enforce that names and dtypes associated with both data variables and coordinates meet certain constraints.
@keewis mentioned an example of this in https://github.com/pydata/xarray/issues/3959#issuecomment-612076605 where it might be possible to use something like a ```TypedDict``` to constrain variable/coord names and array dtypes, but this won't work with TypedDict as it's currently implemented. Another possibility could be generics, and I took a stab at that in https://github.com/pydata/xarray/issues/3959#issuecomment-612513722 (though this would certainly be more intrusive).
An example of where this would be useful is in adding extensions through accessors:
```python
@xr.register_dataset_accessor('ext')
class ExtAccessor:
    def __init__(self, ds):
        self.ds = ds

    def is_zero(self):
        return self.ds['data'] == 0
ds = xr.Dataset(dict(DATA=xr.DataArray([0.0])))
# I'd like to catch that ""data"" was misspelled as ""DATA"" and that
# this particular method shouldn't be run against floats prior to runtime
ds.ext.is_zero()
```
I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too.
There is a related conversation on doing something like this for Pandas DataFrames at https://github.com/python/typing/issues/28#issuecomment-351284520, so that might be helpful context for possibilities with ```TypeDict```.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3967/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
561921094,MDU6SXNzdWU1NjE5MjEwOTQ=,3762,xarray groupby/map fails to parallelize,6491058,closed,1,,,4,2020-02-07T23:20:59Z,2023-09-15T15:52:42Z,2023-09-15T15:52:41Z,NONE,,,,"#### MCVE Code Sample
```python
import sys
import math
import logging
import dask
import dask.distributed  # needed for dask.distributed.Client() below
import xarray
import numpy
logger = logging.getLogger('main')
if __name__ == '__main__':
logging.basicConfig(
stream=sys.stdout,
format='%(asctime)s %(levelname)-8s %(message)s',
level=logging.INFO,
datefmt='%Y-%m-%d %H:%M:%S')
logger.info('Starting dask client')
client = dask.distributed.Client()
SIZE = 100000
SONAR_BINS = 2000
time = range(0, SIZE)
upper_limit = numpy.random.randint(0, 10, (SIZE))
lower_limit = numpy.random.randint(20, 30, (SIZE))
sonar_data = numpy.random.randint(0, 255, (SIZE, SONAR_BINS))
channel = xarray.Dataset({
'upper_limit': (['time'], upper_limit, {'units': 'depth meters'}),
'lower_limit': (['time'], lower_limit, {'units': 'depth meters'}),
'data': (['time', 'depth_bin'], sonar_data, {'units': 'amplitude'}),
},
coords={
'depth_bin': (['depth_bin'], range(0,SONAR_BINS)),
'time': (['time'], time)
})
logger.info('get overall min/max radar range we want to normalize to called the adjusted range')
adjusted_min, adjusted_max = channel.upper_limit.min().values.item(), channel.lower_limit.max().values.item()
adjusted_min = math.floor(adjusted_min)
adjusted_max = math.ceil(adjusted_max)
logger.info('adjusted_min: %s, adjusted_max: %s', adjusted_min, adjusted_max)
bin_count = len(channel.depth_bin)
logger.info('bin_count: %s', bin_count)
adjusted_depth_per_bin = (adjusted_max - adjusted_min) / bin_count
logger.info('adjusted_depth_per_bin: %s', adjusted_depth_per_bin)
adjusted_bin_depths = [adjusted_min + (j * adjusted_depth_per_bin) for j in range(0, bin_count)]
logger.info('adjusted_bin_depths[0]: %s ... [-1]: %s', adjusted_bin_depths[0], adjusted_bin_depths[-1])
def Interp(ds):
# Ideally instead of using interp we will use some kind of downsampling and shift
# this doesnt exist in xarray though and interp is good enough for the moment
# I just added this to debug
t = ds.time.values.item()
if (t % 100) == 0:
total = len(channel.time)
perc = 100.0 * t / total
logger.info('%s : %s of %s', perc, t, total)
unadjusted_depth_amplitudes = ds.data
unadjusted_min = ds.upper_limit.values.item()
unadjusted_max = ds.lower_limit.values.item()
unadjusted_depth_per_bin = (unadjusted_max - unadjusted_min) / bin_count
index_mapping = [((adjusted_min + (bin * adjusted_depth_per_bin)) - unadjusted_min) / unadjusted_depth_per_bin for bin in range(0, bin_count)]
adjusted_depth_amplitudes = unadjusted_depth_amplitudes.interp(coords={'depth_bin':index_mapping}, method='linear', assume_sorted=True)
adjusted_depth_amplitudes = adjusted_depth_amplitudes.rename({'depth_bin':'depth'}).assign_coords({'depth':adjusted_bin_depths})
#logger.info('%s, \n\tunadjusted_depth_amplitudes.values:%s\n\tunadjusted_min:%s\n\tunadjusted_max:%s\n\tunadjusted_depth_per_bin:%s\n\tindex_mapping:%s\n\tadjusted_depth_amplitudes:%s\n\tadjusted_depth_amplitudes.values:%s\n\n', ds, unadjusted_depth_amplitudes.values, unadjusted_min, unadjusted_max, unadjusted_depth_per_bin, index_mapping, adjusted_depth_amplitudes, adjusted_depth_amplitudes.values)
return adjusted_depth_amplitudes
# Lets split into chunks so could be performed in parallel
# This doesnt work to parallelize and only slows it down a lot
#logger.info('chunk')
#channel = channel.chunk({'time':100})
logger.info('groupby')
g = channel.groupby('time')
logger.info('do interp')
normalized_depth_data = g.map(Interp)
logger.info('done')
```
#### Expected Output
I am fairly new to xarray, but I feel this example could be executed better than xarray currently does. Each map call of the custom function above should be parallelizable, as far as I can tell. I imagined that, in the backend, xarray would chunk the data and run the calls in parallel on dask. However, I find it is VERY slow even in the single-threaded case, and it also doesn't seem to parallelize.
It takes roughly 5 ms per map call on my hardware when I don't include the chunk call, and 70 ms per call with the chunk call you can find in the code.
#### Problem Description
The single-threaded performance is very slow, and it also fails to parallelize the computations across the cores on my machine.
If you want more background on what I am trying to do, I also asked an SO question about how to reorganize the code to improve performance. I feel the current behavior is a performance bug, though (assuming I didn't do something completely wrong in the code).
https://stackoverflow.com/questions/60103317/can-the-performance-of-using-xarray-groupby-map-be-improved
#### Output of ``xr.show_versions()``
# Paste the output here xr.show_versions() here
xarray.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200102
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3762/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1473152374,I_kwDOAMm_X85XzoV2,7348,Using entry_points to register dataset and dataarray accessors?,1386642,open,0,,,4,2022-12-02T16:48:42Z,2023-09-14T19:53:46Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem?
External libraries often use the dataset/dataarray accessor pattern (e.g. [metpy](https://github.com/Unidata/MetPy/blob/f568aca6325cb23cfccc1006c4965ef7f7b5ad29/src/metpy/xarray.py#L105)). These accessors are not available until importing the external package where the registration occurs. This means scripts using these accessors must include an often-unused import that linters will complain about e.g.
```
import metpy # linter complains here
# some data
ds: xr.Dataset = ...
ds.metpy....
```
### Describe the solution you'd like
Use importlib entry points so that accessor registration is handled automatically, without requiring the user to import the external package. This is currently enabled for the array backend, but not for accessors (e.g. [metpy's setup.cfg](https://github.com/Unidata/MetPy/blob/f568aca6325cb23cfccc1006c4965ef7f7b5ad29/src/metpy/xarray.py#L105)).
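Roughly, the idea is something like the sketch below, where xarray itself discovers accessors at import time (the entry-point group name and the discovery hook are hypothetical, not an existing API):
```python
from importlib.metadata import entry_points

import xarray as xr

def _register_accessor_entrypoints():
    # Hypothetical: scan a dedicated entry-point group declared by plugins.
    for ep in entry_points(group='xarray.dataset_accessors'):
        accessor_cls = ep.load()
        # Reuse the existing decorator to attach the accessor under ep.name.
        xr.register_dataset_accessor(ep.name)(accessor_cls)
```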
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7348/reactions"", ""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue
1098241812,I_kwDOAMm_X85BddcU,6149,[Bug]: `numpy` `DeprecationWarning` with `DType` and `xr.testing.assert_all_close()` + Dask,25624127,closed,0,,,4,2022-01-10T18:34:27Z,2023-09-13T20:06:59Z,2023-09-13T20:06:58Z,CONTRIBUTOR,,,,"### What happened?
A `numpy` `DeprecationWarning` regarding `DType` is emitted when using `xr.testing.assert_allclose()` to compare two chunked Datasets. The warning does not appear with two non-chunked datasets.
### What did you expect to happen?
The warning should not appear.
### Minimal Complete Verifiable Example
```python
class TestTemporalAvg:
class TestTimeseries:
@pytest.fixture(autouse=True)
def setup(self):
self.ds: xr.Dataset = generate_dataset(cf_compliant=True, has_bounds=True)
# No warning with this test
def test_weighted_annual_avg(self):
ds = self.ds.copy()
result = ds.temporal.temporal_avg(""timeseries"", ""year"", data_var=""ts"")
expected = ds.copy()
expected[""ts""] = xr.DataArray(
name=""ts"",
data=np.ones((2, 4, 4)),
coords={
""lat"": self.ds.lat,
""lon"": self.ds.lon,
""year"": pd.MultiIndex.from_tuples(
[(2000,), (2001,)],
),
},
dims=[""year"", ""lat"", ""lon""],
attrs={
""operation"": ""temporal_avg"",
""mode"": ""timeseries"",
""freq"": ""year"",
""groupby"": ""year"",
""weighted"": ""True"",
""centered_time"": ""True"",
},
)
# For some reason, there is a floating point difference between both
# for ts so we have to use floating point comparison
xr.testing.assert_allclose(result, expected)
assert result.ts.attrs == expected.ts.attrs
# Warning with this test
@requires_dask
def test_weighted_annual_avg_with_chunking(self):
ds = self.ds.copy().chunk({""time"": 2})
result = ds.temporal.temporal_avg(""timeseries"", ""year"", data_var=""ts"")
expected = ds.copy()
expected[""ts""] = xr.DataArray(
name=""ts"",
data=np.ones((2, 4, 4)),
coords={
""lat"": ds.lat,
""lon"": ds.lon,
""year"": pd.MultiIndex.from_tuples(
[(2000,), (2001,)],
),
},
dims=[""year"", ""lat"", ""lon""],
attrs={
""operation"": ""temporal_avg"",
""mode"": ""timeseries"",
""freq"": ""year"",
""groupby"": ""year"",
""weighted"": ""True"",
""centered_time"": ""True"",
},
)
# For some reason, there is a floating point difference between both
# for ts so we have to use floating point comparison
xr.testing.assert_allclose(result, expected)
assert result.ts.attrs == expected.ts.attrs
```
### Relevant log output
```python
DeprecationWarning: The `dtype` and `signature` arguments to ufuncs only select the general DType and not details such as the byte order or time unit (with rare exceptions see release notes). To avoid this warning please use the scalar types `np.float64`, or string notation.
In rare cases where the time unit was preserved, either cast the inputs or provide an output array. In the future NumPy may transition to allow providing `dtype=` to denote the outputs `dtype` as well. (Deprecated NumPy 1.21)
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.45.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
setuptools: 59.6.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.30.1
sphinx: 4.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6149/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
1075765204,I_kwDOAMm_X85AHt_U,6055,Unexpected type conversion in variables with _FillValue,24235303,closed,0,,,4,2021-12-09T16:26:54Z,2023-09-13T12:40:14Z,2023-09-13T12:40:13Z,CONTRIBUTOR,,,,"**What happened**:
When opening a dataset with an int16 variable with the `_FillValue` attribute, the variable is converted from type int16 to float32. This was originally reported to the TileDB-CF-Py Git repo that contains a TileDB backend for xarray. See [TileDB-CF-Py issue #117](https://github.com/TileDB-Inc/TileDB-CF-Py/issues/117).
**What you expected to happen**:
I would expect the type to remain the same when applying the _FillValue.
**Minimal Complete Verifiable Example**:
Original example from [TileDB-CF-Py issue #117](https://github.com/TileDB-Inc/TileDB-CF-Py/issues/117) using the TileDB backend.
```python
import tiledb
import xarray as xr
import numpy as np
index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)
with tiledb.open('dense_array0', 'w') as A:
A[:] = np.array([5, 6, 7, 8], dtype=np.int16)
ds = xr.open_dataset('dense_array0', engine='tiledb')
ds['var'].dtype
```
NetCDF example with the same behavior:
```python
import netCDF4
import xarray as xr
import numpy as np
filename = 'temp_file.nc'
with netCDF4.Dataset(filename, mode=""w"") as group:
group.createDimension(""index"", 4)
var = group.createVariable(""var"", np.int16, (""index"",), fill_value=-1)
var[:] = np.array([5, 6, 7, 8], dtype=np.int16)
dataset = xr.open_dataset(filename)
dataset[""var""].dtype
```
**Anything else we need to know?**:
* I was able to verify the type conversion from int16 to float32 occurs in the `conventions.decode_cf_variables` call in the `open_dataset` method of `StoreBackendEntrypoint`.
* I was able to verify the conversion does not happen if `mask_and_scale=False` (see the sketch after this list).
* Note that TileDB is automatically setting a fill value for all dense numerical arrays, and so we are always setting the `_FillValue` attribute for variables from the TileDB backend.
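A sketch of that check, using the netCDF example above (the dtype stays int16, but the fill value is then left unmasked):
```python
# Re-open the example file with mask/scale decoding disabled.
ds_raw = xr.open_dataset(filename, mask_and_scale=False)
ds_raw['var'].dtype   # remains int16; the -1 fill values are not converted to NaN
```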
**Environment**:
I was able to reproduce this with both xarray 0.19.0 and 0.20.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6055/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,completed,13221727,issue
514672231,MDU6SXNzdWU1MTQ2NzIyMzE=,3466,RuntimeError: NetCDF: DAP failure,47066389,closed,1,,,4,2019-10-30T13:32:34Z,2023-09-12T16:00:57Z,2023-09-12T16:00:57Z,NONE,,,,"Hi all,
I am interested in extracting information for specific points and variables from the GEOS-CF product, accessible via OPeNDAP.
Loading the data seems to work fine, and I can do some processing for my specific needs.
Ideally I would like to convert this selection to a dataframe, or if needed store it as an intermediate file that I can read again.
Yet when doing so, I get the following error: RuntimeError: NetCDF: DAP failure
I am not sure what is causing this. Perhaps I chunk the data in the wrong (inefficient) way? Or is there an error with the GEOS netcdf files? Or ...
Below is a working code snippet.
``` python
import xarray as xr
idir_geos = 'https://opendap.nccs.nasa.gov/dods/gmao/geos-cf/assim/chm_tavg_1hr_g1440x721_v1'
def preprocess(ds):
''' Rename variables and select the relevant ones. Remove lev'''
ds = ds.rename({'pm25_rh35_gcc': 'PM2.5','no': 'NO','no2': 'NO2','o3': 'O3','so2': 'SO2','co': 'CO'})
ds = ds[['PM2.5','NO','NO2','O3','SO2','CO']]
ds = ds.squeeze('lev')
return ds
ds = xr.open_mfdataset([idir_geos],preprocess=preprocess,combine='by_coords')
lat = 51.25
lon = 4.25
pol = 'O3'
ds_sel = ds.sel(lat=lat,lon=lon,method='nearest')[pol]
df_sel = ds_sel.to_dataframe().drop(['lat','lon'],axis=1)
#ds_sel.to_netcdf('test.nc') # Runtime error
```
Traceback error:
> Traceback (most recent call last):
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py"", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File """", line 57, in
df_sel = ds_sel.to_dataframe().drop(['lat','lon'],axis=1)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4285, in to_dataframe
return self._to_dataframe(self.dims)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4273, in _to_dataframe
for k in columns
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/dataset.py"", line 4273, in
for k in columns
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py"", line 437, in values
return _as_array_or_item(self._data)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/variable.py"", line 250, in _as_array_or_item
data = np.asarray(data)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/usr/lib/python3/dist-packages/dask/array/core.py"", line 1138, in __array__
x = self.compute()
File ""/usr/lib/python3/dist-packages/dask/base.py"", line 135, in compute
(result,) = compute(self, traverse=False, **kwargs)
File ""/usr/lib/python3/dist-packages/dask/base.py"", line 333, in compute
results = get(dsk, keys, **kwargs)
File ""/usr/lib/python3/dist-packages/dask/threaded.py"", line 75, in get
pack_exception=pack_exception, **kwargs)
File ""/usr/lib/python3/dist-packages/dask/local.py"", line 521, in get_async
raise_exception(exc, tb)
File ""/usr/lib/python3/dist-packages/dask/compatibility.py"", line 60, in reraise
raise exc
File ""/usr/lib/python3/dist-packages/dask/local.py"", line 290, in execute_task
result = _execute_task(task, data)
File ""/usr/lib/python3/dist-packages/dask/local.py"", line 271, in _execute_task
return func(*args2)
File ""/usr/lib/python3/dist-packages/dask/array/core.py"", line 72, in getter
c = np.asarray(c)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 490, in __array__
return np.asarray(self.array, dtype=dtype)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 652, in __array__
return np.asarray(self.array, dtype=dtype)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 556, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py"", line 73, in __array__
return self.func(self.array)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/coding/variables.py"", line 142, in _apply_mask
data = np.asarray(data, dtype=dtype)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/numpy/core/_asarray.py"", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 556, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 72, in __getitem__
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/core/indexing.py"", line 836, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/netCDF4_.py"", line 84, in _getitem
array = getitem(original_array, key)
File ""/home/demuzmp4/.local/lib/python3.6/site-packages/xarray/backends/common.py"", line 54, in robust_getitem
return array[key]
File ""netCDF4/_netCDF4.pyx"", line 4408, in netCDF4._netCDF4.Variable.__getitem__
File ""netCDF4/_netCDF4.pyx"", line 5352, in netCDF4._netCDF4.Variable._get
File ""netCDF4/_netCDF4.pyx"", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP failure
More info on my xarray installation:
------------------
commit: None
python: 3.6.9 (default, Jul 3 2019, 07:38:46)
[GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_GB.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.3
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.28
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 0.16.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 9.0.1
conda: None
pytest: 5.2.1
IPython: 7.3.0
sphinx: 1.8.4","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3466/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1339921253,I_kwDOAMm_X85P3ZNl,6919,Parallel read with MPI,8100801,closed,0,,,4,2022-08-16T07:19:14Z,2023-09-12T15:16:32Z,2023-09-12T15:16:31Z,NONE,,,,"### Is your feature request related to a problem?
Is it possible to somehow extend xarray to use MPI I/O?
### Describe the solution you'd like
We would need to know the offset from where the actual data starts within the file.
Is there a way of retrieving that?
Disclaimer: I am not an expert of NetCDF format - so, apologies if the question is trivial!
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6919/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1861335844,I_kwDOAMm_X85u8bsk,8096,Errors when saving PyObject coordinates,38408316,closed,0,,,4,2023-08-22T12:14:53Z,2023-09-06T11:44:41Z,2023-09-06T11:44:41Z,CONTRIBUTOR,,,,"### What happened?
Hi, I'm trying to create a `DataArray` with coordinates that are tuples and potentially even higher-dimensional objects. The way I did it is to create an empty `numpy` array with `dtype=object` and then insert my tuples into it. This doesn't throw an error when creating a `DataArray` (as opposed to using a 2D ndarray or a list of lists). However, when trying to save it to `zarr` or `netcdf`, I get an error saying `ValueError: setting an array element with a sequence`.
### What did you expect to happen?
I want to be able to save and load such coordinates without errors. Maybe there is a cleaner way to do it than the object dtype ndarray?
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr

n = 5
x = np.empty(n, dtype=object)
for i in range(n):
x[i] = (i, i)
xr.DataArray(np.arange(n), dims=(""x""), coords={""x"": x}).to_zarr(""test"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
File c:\Users\Wiktor\AppData\Local\pypoetry\Cache\virtualenvs\spin1-JGuolXDk-py3.11\Lib\site-packages\xarray\core\dataarray.py:4014, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
4010 else:
4011 # No problems with the name - so we're fine!
4012 dataset = self.to_dataset()
-> 4014 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
4015 dataset,
4016 path,
4017 mode=mode,
4018 format=format,
4019 group=group,
4020 engine=engine,
4021 encoding=encoding,
4022 unlimited_dims=unlimited_dims,
...
101 result = np.empty(data.shape, dtype)
--> 102 result[...] = data
103 return result
ValueError: setting an array element with a sequence.
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Polish_Poland', '1250')
libhdf5: None
libnetcdf: None
xarray: 2023.8.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8096/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1870484988,I_kwDOAMm_X85vfVX8,8120,"`open_mfdataset` exits while sending a ""Segmentation fault"" error",50383939,closed,0,,,4,2023-08-28T20:51:23Z,2023-09-01T15:43:08Z,2023-09-01T15:43:08Z,NONE,,,,"### What is your issue?
I am trying to open about 10 files, each ~5 MB, as a test case, using `xarray`'s `open_mfdataset` method with the `parallel=True` option; however, it throws a ""Segmentation fault"" error like the following:
```python
$ ipython
Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import xarray as xr
In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})
In [3]: ds
Out[3]:
Dimensions: (time: 744, rlat: 140, rlon: 105)
Coordinates:
* time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
lon (rlat, rlon) float32 dask.array
lat (rlat, rlon) float32 dask.array
* rlon (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
* rlat (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
rotated_pole (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
RDRS_v2.1_P_UVC_10m (time, rlat, rlon) float32 dask.array
RDRS_v2.1_P_FI_SFC (time, rlat, rlon) float32 dask.array
RDRS_v2.1_P_FB_SFC (time, rlat, rlon) float32 dask.array
RDRS_v2.1_A_PR0_SFC (time, rlat, rlon) float32 dask.array
RDRS_v2.1_P_P0_SFC (time, rlat, rlon) float32 dask.array
RDRS_v2.1_P_TT_1.5m (time, rlat, rlon) float32 dask.array
RDRS_v2.1_P_HU_1.5m (time, rlat, rlon) float32 dask.array
Attributes:
CDI: Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
Conventions: CF-1.6
product: RDRS_v2.1
Remarks: Variable names are following the convention _= 0`.
One instance where this causes immediate errors is when trying to print the resulting dataset. As part of the `__repr__` of a Dataset, a boolean evaluation of the DataVariable is performed (`if mapping:` in `xarray/core/formatting.py` in `_mapping_repr`), calling `__len__` to check the truth value and triggering the ValueError.
While this is undoubtedly only one of many places where the incorrect `__len__` causes issues, it is a rather pressing one as it even stops one from inspecting the Dataset in the most common way (printing it). The ValueError it produces is also very hard to trace back to the actual cause, likely completely throwing users off from fixing their code.
### What did you expect to happen?
To get a Dataset with the correct `_coord_names` property, and in no circumstance whatsoever to get a Dataset which reports a negative length
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds1 = xr.Dataset(coords={""foo"": [1, 2, 3], ""bar"": 4})
ds2 = xr.Dataset(coords={""foo"": [1, 2, 3], ""bar"": 5})
res = xr.merge([ds1, ds2], compat=""minimal"") # If the result is not captured in res, this will cause a ValueError as the interpreter attempts to print the result
res.coords
# Coordinates:
# * foo (foo) int64 1 2 3
res._coord_names
# {'foo', 'bar'}
""bar"" in res.coords # As shown in issue #7405. Note ""bar"" is not printed in res.coords, revealing an interesting disconnect in behaviors of different functions targeting a dataset's coordinates
# True
res
# ValueError: __len__() should return >= 0
```
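For what it's worth, the negative length appears to come from how the data-variables length is derived (a sketch of the arithmetic; the exact internals are an assumption on my part):
```python
# Assumption: the data_vars length is roughly len(_variables) - len(_coord_names).
len(res._variables)     # 1 -> only 'foo' survived the merge as a variable
len(res._coord_names)   # 2 -> {'foo', 'bar'} was kept as coordinate names
# 1 - 2 == -1, which is the negative value __len__ ends up reporting
```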
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
>>> import xarray as xr
>>> ds1 = xr.Dataset(coords={""foo"": [1, 2, 3], ""bar"": 4})
>>> ds2 = xr.Dataset(coords={""foo"": [1, 2, 3], ""bar"": 5})
>>> res = xr.merge([ds1, ds2], compat=""minimal"")
>>> res.coords
Coordinates:
* foo (foo) int64 1 2 3
>>> res._coord_names
{'bar', 'foo'}
>>> ""bar"" in res.coords
True
>>> res
Traceback (most recent call last):
File """", line 1, in
File ""/home/redacted/.venv/lib/python3.10/site-packages/xarray/core/dataset.py"", line 2116, in __repr__
return formatting.dataset_repr(self)
File ""/usr/lib/python3.10/reprlib.py"", line 21, in wrapper
result = user_function(self)
File ""/home/redacted/.venv/lib/python3.10/site-packages/xarray/core/formatting.py"", line 673, in dataset_repr
summary.append(data_vars_repr(ds.data_vars, col_width=col_width, max_rows=max_rows))
File ""/home/redacted/.lvenv/lib/python3.10/site-packages/xarray/core/formatting.py"", line 357, in _mapping_repr
if mapping:
ValueError: __len__() should return >= 0
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.16.3-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.2.0
pandas: 1.5.1
numpy: 1.24.2
scipy: 1.10.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.6
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.6
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: 1.0.1
IPython: 7.34.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7588/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1858062203,I_kwDOAMm_X85uv8d7,8090,DataArrayResampleAggregations break with _flox_reduce where source DataArray has a discontinuous time dimension,56110893,open,0,,,4,2023-08-20T09:48:42Z,2023-08-24T04:20:32Z,,NONE,,,,"### What happened?
When resampling a DataArray with a discontinuity in the time dimension, the resample object contains placeholder groups for the missing times in between the present times.
This seems to cause flox reductions (`any`, `count` and `all`) to break, as flox complains about a `fill_value` of `None`. See the example provided below.
### What did you expect to happen?
The result should be computed successfully in the same way that it is without using flox.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
dates = ((""1980-12-01"", ""1990-11-30""), (""2000-12-01"", ""2010-11-30""))
times = [xr.cftime_range(*d, freq=""D"", calendar=""360_day"") for d in dates]
da = xr.concat(
[xr.DataArray(np.random.rand(len(t)), coords={""time"": t}, dims=""time"") for t in times],
dim=""time""
)
da = da.chunk(time=360)
with xr.set_options(use_flox=True): # FAILS - discontinuous time dimension before resample
(da > 0.5).resample(time=""AS-DEC"").any(dim=""time"")
# with xr.set_options(use_flox=True): # SUCCEEDS - continuous time dimension before resample
# (da.sel(time=slice(*dates[0])) > 0.5).resample(time=""AS-DEC"").any(dim=""time"")
# with xr.set_options(use_flox=True): # SUCCEEDS - compute chunks before resample
# (da > 0.5).compute().resample(time=""AS-DEC"").any(dim=""time"")
# with xr.set_options(use_flox=False): # SUCCEEDS - don't use flox
# (da > 0.5).resample(time=""AS-DEC"").any(dim=""time"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[60], line 1
----> 1 (da > 0.5).resample(time=""AS-DEC"").any(dim=""time"")
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/_aggregations.py:7029, in DataArrayResampleAggregations.any(self, dim, keep_attrs, **kwargs)
6960 """"""
6961 Reduce this DataArray's data by applying ``any`` along some dimension(s).
6962
(...)
7022 * time (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
7023 """"""
7024 if (
7025 flox_available
7026 and OPTIONS[""use_flox""]
7027 and contains_only_chunked_or_numpy(self._obj)
7028 ):
-> 7029 return self._flox_reduce(
7030 func=""any"",
7031 dim=dim,
7032 # fill_value=fill_value,
7033 keep_attrs=keep_attrs,
7034 **kwargs,
7035 )
7036 else:
7037 return self.reduce(
7038 duck_array_ops.array_any,
7039 dim=dim,
7040 keep_attrs=keep_attrs,
7041 **kwargs,
7042 )
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/resample.py:57, in Resample._flox_reduce(self, dim, keep_attrs, **kwargs)
51 def _flox_reduce(
52 self,
53 dim: Dims,
54 keep_attrs: bool | None = None,
55 **kwargs,
56 ) -> T_Xarray:
---> 57 result = super()._flox_reduce(dim=dim, keep_attrs=keep_attrs, **kwargs)
58 result = result.rename({RESAMPLE_DIM: self._group_dim})
59 return result
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/groupby.py:1018, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
1015 kwargs.setdefault(""min_count"", 1)
1017 output_index = grouper.full_index
-> 1018 result = xarray_reduce(
1019 obj.drop_vars(non_numeric.keys()),
1020 self._codes,
1021 dim=parsed_dim,
1022 # pass RangeIndex as a hint to flox that `by` is already factorized
1023 expected_groups=(pd.RangeIndex(len(output_index)),),
1024 isbin=False,
1025 keep_attrs=keep_attrs,
1026 **kwargs,
1027 )
1029 # we did end up reducing over dimension(s) that are
1030 # in the grouped variable
1031 group_dims = grouper.group.dims
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:408, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, fill_value, dtype, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
406 output_core_dims = [d for d in input_core_dims[0] if d not in dim_tuple]
407 output_core_dims.extend(group_names)
--> 408 actual = xr.apply_ufunc(
409 wrapper,
410 ds_broad.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
411 *by_da,
412 input_core_dims=input_core_dims,
413 # for xarray's test_groupby_duplicate_coordinate_labels
414 exclude_dims=set(dim_tuple),
415 output_core_dims=[output_core_dims],
416 dask=""allowed"",
417 dask_gufunc_kwargs=dict(
418 output_sizes=group_sizes, output_dtypes=[dtype] if dtype is not None else None
419 ),
420 keep_attrs=keep_attrs,
421 kwargs={
422 ""func"": func,
423 ""axis"": axis,
424 ""sort"": sort,
425 ""fill_value"": fill_value,
426 ""method"": method,
427 ""min_count"": min_count,
428 ""skipna"": skipna,
429 ""engine"": engine,
430 ""reindex"": reindex,
431 ""expected_groups"": tuple(expected_groups),
432 ""isbin"": isbins,
433 ""finalize_kwargs"": finalize_kwargs,
434 ""dtype"": dtype,
435 ""core_dims"": input_core_dims,
436 },
437 )
439 # restore non-dim coord variables without the core dimension
440 # TODO: shouldn't apply_ufunc handle this?
441 for var in set(ds_broad._coord_names) - set(ds_broad._indexes) - set(ds_broad.dims):
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:1185, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
1183 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
1184 elif any(is_dict_like(a) for a in args):
-> 1185 return apply_dataset_vfunc(
1186 variables_vfunc,
1187 *args,
1188 signature=signature,
1189 join=join,
1190 exclude_dims=exclude_dims,
1191 dataset_join=dataset_join,
1192 fill_value=dataset_fill_value,
1193 keep_attrs=keep_attrs,
1194 )
1195 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
1196 elif any(isinstance(a, DataArray) for a in args):
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:469, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
464 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
465 args, signature, exclude_dims, combine_attrs=keep_attrs
466 )
467 args = tuple(getattr(arg, ""data_vars"", arg) for arg in args)
--> 469 result_vars = apply_dict_of_variables_vfunc(
470 func, *args, signature=signature, join=dataset_join, fill_value=fill_value
471 )
473 out: Dataset | tuple[Dataset, ...]
474 if signature.num_outputs > 1:
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:411, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, *args)
409 result_vars = {}
410 for name, variable_args in zip(names, grouped_by_name):
--> 411 result_vars[name] = func(*variable_args)
413 if signature.num_outputs > 1:
414 return _unpack_dict_tuples(result_vars, signature.num_outputs)
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:761, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
756 if vectorize:
757 func = _vectorize(
758 func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
759 )
--> 761 result_data = func(*input_data)
763 if signature.num_outputs == 1:
764 result_data = (result_data,)
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:379, in xarray_reduce..wrapper(array, func, skipna, core_dims, *by, **kwargs)
376 offset = min(array)
377 array = datetime_to_numeric(array, offset, datetime_unit=""us"")
--> 379 result, *groups = groupby_reduce(array, *by, func=func, **kwargs)
381 # Output of count has an int dtype.
382 if requires_numeric and func != ""count"":
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:2011, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, dtype, min_count, method, engine, reindex, finalize_kwargs, *by)
2005 groups = (groups[0][sorted_idx],)
2007 if factorize_early:
2008 # nan group labels are factorized to -1, and preserved
2009 # now we get rid of them by reindexing
2010 # This also handles bins with no data
-> 2011 result = reindex_(
2012 result, from_=groups[0], to=expected_groups, fill_value=fill_value
2013 ).reshape(result.shape[:-1] + grp_shape)
2014 groups = final_groups
2016 if is_bool_array and (_is_minmax_reduction(func) or _is_first_last_reduction(func)):
File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:428, in reindex_(array, from_, to, fill_value, axis, promote)
426 if any(idx == -1):
427 if fill_value is None:
--> 428 raise ValueError(""Filling is required. fill_value cannot be None."")
429 indexer[axis] = idx == -1
430 # This allows us to match xarray's type promotion rules
ValueError: Filling is required. fill_value cannot be None.
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: 3.6.1
bottleneck: 1.3.7
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: 3.7.2
cartopy: 0.22.0
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8090/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1325665237,I_kwDOAMm_X85PBAvV,6866,Confusing terminologies and some errors in the official documentation,49091585,closed,0,,,4,2022-08-02T10:48:07Z,2023-08-23T14:20:23Z,2023-08-23T14:20:23Z,NONE,,,,"### What happened?
To note, I'm using the stable version(2022.6.0).
First, I'm confused that both `dimension coordinate`/`non-dimension coordinate` and `index coordinate`/`non-index coordinate` appear in the documentation (search to see), but they seem to be the same thing.
Second, I found that there are some errors in the documentation:
- It says that ""[The index associated with dimension name x can be retrieved by arr.indexes[x]. By construction, `len(arr.dims) == len(arr.indexes)`](https://docs.xarray.dev/en/stable/user-guide/terminology.html#:~:text=The%20index%20associated%20with%20dimension%20name%20x%20can%20be%20retrieved%20by%20arr.indexes%5Bx%5D.%20By%20construction%2C%20len(arr.dims)%20%3D%3D%20len(arr.indexes))"", which is inconsistent with actual behavior. See example code below:
```python
In [0]: import xarray as xr, numpy as np
In [1]: arr = xr.DataArray(np.zeros((2, 3)), dims=['x', 'y'], coords={'x': ['a', 'b']})
In [2]: assert len(arr.dims) == len(arr.indexes), f""{len(arr.dims)=}, {len(arr.indexes)=}""
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in
----> 1 assert len(arr.dims) == len(arr.indexes), f""{len(arr.dims)=}, {len(arr.indexes)=}""
AssertionError: len(arr.dims)=2, len(arr.indexes)=1
In [3]: arr.indexes
Out[3]:
Indexes:
x: Index(['a', 'b'], dtype='object', name='x')
```
It seems that `arr.indexes` only returns indexes of dimensions that have coordinates. However, it's possible to get the index of
dimension `y` through `get_index()`:
```python
In [4]: arr.get_index('y')
Out[4]: RangeIndex(start=0, stop=3, step=1, name='y')
```
- It says that: (see [link](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#indexes:~:text=For%20convenience%20multi%2Dindex%20levels%20are%20directly%20accessible%20as%20%E2%80%9Cvirtual%E2%80%9D%20or%20%E2%80%9Cderived%E2%80%9D%20coordinates%20(marked%20by%20%2D%20when%20printing%20a%20dataset%20or%20data%20array)%3A))
> For convenience multi-index levels are directly accessible as “virtual” or “derived” coordinates (marked by - when printing a dataset or data array):
> ```python
> In [77]: mda[""band""]
> Out[77]:
>
> array(['R', 'R', 'V', 'V'], dtype=object)
> Coordinates:
> * spec (spec) object MultiIndex
> * band (spec) object 'R' 'R' 'V' 'V'
> * wn (spec) float64 0.1 0.2 0.7 0.9
>
> In [78]: mda.wn
> Out[78]:
>
> array([0.1, 0.2, 0.7, 0.9])
> Coordinates:
> * spec (spec) object MultiIndex
> * band (spec) object 'R' 'R' 'V' 'V'
> * wn (spec) float64 0.1 0.2 0.7 0.9
> ```
As you can see, even in the given example code offered by the official documentation, all the ""virtual"" coordinates are marked with `*` instead of `-`, which is a little bit confusing when handling multi-index coordinates in my experience.
Have I missed something? Thanks in advance for the reply.
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
_No response_
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.3.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 45.2.0
pip: 22.2.1
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6866/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
979316661,MDU6SXNzdWU5NzkzMTY2NjE=,5738,Flexible indexes: how to handle possible dimension vs. coordinate name conflicts?,4160723,closed,0,,,4,2021-08-25T15:31:39Z,2023-08-23T13:28:41Z,2023-08-23T13:28:40Z,MEMBER,,,,"Another thing that I've noticed while working on #5692.
Currently it is not possible to have a Dataset with a same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with `sel` or `unstack`). See #2299.
I'm wondering how we should handle this in the context of flexible / custom indexes:
A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in `sel` or `stack`?
B. Introduce some tag in `xarray.Index` so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming)
C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?
D. Eventually revert #2353 and let users take care of potential conflicts.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5738/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
448082431,MDU6SXNzdWU0NDgwODI0MzE=,2986,How to add a custom indexer.,397386,closed,0,,,4,2019-05-24T09:56:25Z,2023-08-23T12:24:21Z,2023-08-23T12:24:20Z,CONTRIBUTOR,,,,"Hello,
I have written a set of indexers for 1D, 2D and 3D geodetic and Cartesian data (up to 5 dimensions for Cartesian data).
I used the Boost/C++ library to write the multidimensional data search algorithm. This tree (R*Tree) is impressive for its performance: it can be built from several million points in a few seconds, and queries over several million points also take only a few seconds.
```python
import numpy as np
# Install it with conda, if you want, only for python3.7: conda install pyindex -c fbriol
import pyindex.core as core
lon = np.random.uniform(-180.0, 180.0, 2048*4096)
lat = np.random.uniform(-90.0, 90.0, 2048*4096)
# You do not have to set an altitude if it is not necessary.
alt = np.random.uniform(-10000, 100000, 2048*4096)
# WGS system used
system = core.geodetic.System()
# RTree
tree = core.geodetic.RTree(system)
%timeit tree.packing(np.asarray((lon, lat, alt)).T)
# 3.84 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
coordinates = np.asarray((
np.random.uniform(-180.0, 180.0, 10000),
np.random.uniform(-90.0, 90.0, 10000),
np.random.uniform(-10000, 100000, 10000))).T
%timeit tree.query(coordinates)
# 18 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
I'm trying to use these indexes with xarray, but I don't quite understand how to interface them with it.
Is there anyone who could explain to me how to write my own indexer to test these indexers with xarray? Thank you in advance.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2986/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1603957501,I_kwDOAMm_X85fmnL9,7573,Add optional min versions to conda-forge recipe (`run_constrained`),2448579,closed,0,,,4,2023-02-28T23:12:15Z,2023-08-21T16:12:34Z,2023-08-21T16:12:21Z,MEMBER,,,,"### Is your feature request related to a problem?
I opened this PR to add minimum versions for our optional dependencies: https://github.com/conda-forge/xarray-feedstock/pull/84/files to prevent issues like #7467
I think we'd need a policy to choose which ones to list. Here's the current list:
```
run_constrained:
- bottleneck >=1.3
- cartopy >=0.20
- cftime >=1.5
- dask-core >=2022.1
- distributed >=2022.1
- flox >=0.5
- h5netcdf >=0.13
- h5py >=3.6
- hdf5 >=1.12
- iris >=3.1
- matplotlib-base >=3.5
- nc-time-axis >=1.4
- netcdf4 >=1.5.7
- numba >=0.55
- pint >=0.18
- scipy >=1.7
- seaborn >=0.11
- sparse >=0.13
- toolz >=0.11
- zarr >=2.10
```
Some examples to think about:
1. `iris` seems like a bad one to force. It seems like people might use Iris and Xarray independently and Xarray shouldn't force a minimum version.
2. For backends, I arbitrarily kept `netcdf4`, `h5netcdf` and `zarr`.
3. It seems like we should keep array types: so `dask`, `sparse`, `pint`.
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7573/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1845132891,I_kwDOAMm_X85t-n5b,8062,"Dataset.chunk() does not overwrite encoding[""chunks""] ",2466330,open,0,,,4,2023-08-10T12:54:12Z,2023-08-14T18:23:36Z,,CONTRIBUTOR,,,,"### What happened?
When using the `chunk` function to change the chunk sizes of a Dataset (or DataArray, which uses the Dataset implementation of `chunk`), the chunk sizes of the Dask arrays are changed, but the ""chunks"" entry of the `encoding` attributes are not changed accordingly. This causes the raising of a NotImplementedError when attempting to write the Dataset to a zarr (and presumably other formats as well).
Looking at the implementation of `chunk`, every variable is rechunked using the `_maybe_chunk` function, which actually has the parameter `overwrite_encoded_chunks` to control just this behavior. However, it is an optional parameter which defaults to False, and the call in `chunk` neither provides a value for this parameter nor allows the caller to influence it (by having an `overwrite_encoded_chunks` parameter itself, for example).
I do not know why this default value was chosen as False, or what could break if it was changed to True, but judging from the documentation, the current behavior seems to be the opposite of the intended effect. From the documentation of `to_zarr`:
> Zarr chunks are determined in the following way:
> From the chunks attribute in each variable’s encoding (can be set via Dataset.chunk).
Which is exactly what it does not do.
### What did you expect to happen?
I would expect the ""chunks"" entry of the `encoding` attribute to be changed to reflect the new chunking scheme.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
# Create a test Dataset with dimension x and y, each of size 100, and a chunksize of 50
ds_original = xr.Dataset({""my_var"": ([""x"", ""y""], np.random.randn(100, 100))})
# Since 'chunk' does not work, manually set encoding
ds_original.my_var.encoding[""chunks""] = (50, 50)
# To best showcase the real-life example, write it to file and read it back again.
# The same could be achieved by just calling .chunk() with chunksizes of 25, but this feels more 'complete'
filepath = ""~/chunk_test.zarr""
ds_original.to_zarr(filepath)
ds = xr.open_zarr(filepath)
# Check the chunksizes and ""chunks"" encoding
print(ds.my_var.chunks)
# >>> ((50, 50), (50, 50))
print(ds.my_var.encoding[""chunks""])
# >>> (50, 50)
# Rechunk the Dataset
ds = ds.chunk({""x"": 25, ""y"": 25})
# The chunksizes have changed
print(ds.my_var.chunks)
# >>> ((25, 25, 25, 25), (25, 25, 25, 25))
# But the encoding value remains the same
print(ds.my_var.encoding[""chunks""])
# >>> (50, 50)
# Attempting to write this back to zarr raises an error
ds.to_zarr(""~/chunk_test_rechunked.zarr"")
# NotImplementedError: Specified zarr chunks encoding['chunks']=(50, 50) for variable named 'my_var' would overlap multiple dask chunks ((25, 25, 25, 25), (25, 25, 25, 25)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.
```
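For what it's worth, the last suggestion in that error message (deleting the stale `encoding[""chunks""]` entry) works around the symptom — a minimal sketch continuing from the example above, not a fix for `chunk()` itself:
```python
# Workaround sketch: drop the stale encoding entry so to_zarr determines the
# zarr chunks from the current dask chunks instead (mode=""w"" only so the
# example can be rerun over an existing store).
del ds.my_var.encoding[""chunks""]
ds.to_zarr(""~/chunk_test_rechunked.zarr"", mode=""w"")
```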
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.16.3-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.7
libnetcdf: 4.8.1
xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.12.0
h5py: 3.6.0
Nio: None
zarr: 2.14.1
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.6
dask: 2022.01.0+dfsg
distributed: 2022.01.0+ds.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.2.1
conda: None
pytest: 7.2.2
mypy: 1.1.1
IPython: 7.31.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8062/reactions"", ""total_count"": 2, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue
1845508562,I_kwDOAMm_X85uADnS,8065,.mfdataset fail to open a kerchunked zarr file from an object-store bucket ,22492773,closed,0,,,4,2023-08-10T16:22:05Z,2023-08-14T14:18:17Z,2023-08-14T14:13:58Z,NONE,,,,"### What happened?
When trying to open a kerchunk .json through `open_mfdataset`, a ValueError is raised.
### What did you expect to happen?
It should open a Dataset as described below:
```
Dimensions: (lat: 15680, lon: 40320, time: 36)
Coordinates:
* lat (lat) float64 80.0 79.99 79.98 79.97 ... -59.97 -59.98 -59.99
* lon (lon) float64 -180.0 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
* time (time) float64 nan 1.0 2.0 3.0 4.0 5.0 ... 31.0 32.0 33.0 34.0 35.0
Data variables:
crs object ...
max (time, lat, lon) float32 dask.array
mean (time, lat, lon) float32 dask.array
median (time, lat, lon) float32 dask.array
min (time, lat, lon) float32 dask.array
nobs (time, lat, lon) float32 dask.array
stdev (time, lat, lon) float32 dask.array
Attributes: (12/19)
Conventions: CF-1.6
archive_facility: VITO
copyright: Copernicus Service information 2021
history: 2021-03-01 - Processing line NDVI LTS
identifier: urn:cgls:global:ndvi_stats_all:NDVI-LTS_1999-2019-0...
institution: VITO NV
... ...
references: https://land.copernicus.eu/global/products/ndvi
sensor: VEGETATION-1, VEGETATION-2, VEGETATION
source: Derived from EO satellite imagery
time_coverage_end: 2019-12-31T23:59:59Z
time_coverage_start: 1999-01-01T00:00:00Z
title: Normalized Difference Vegetation Index: Long Term S...
```
### Minimal Complete Verifiable Example
```python
import xarray as xr
catalogue=""https://object-store.cloud.muni.cz/swift/v1/foss4g-catalogue/c_gls_NDVI-LTS_1999-2019.json""
LTS = xr.open_mfdataset(
""reference://"", engine=""zarr"",
backend_kwargs={
""storage_options"": {
""fo"":catalogue
},
""consolidated"": False
}
)
```
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
`ValueError: Cannot specify both fs and storage_options`
```
### Anything else we need to know?
This seems to be related to the zarr version: with zarr <= 2.12 it works, but with the latest versions (> 2.12) it doesn't.
### Environment
xarray version 2023.7.0
zarr >2.12
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8065/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1817880272,I_kwDOAMm_X85sWqbQ,8013,np.cumproduct deprecated,25102059,closed,0,,,4,2023-07-24T08:11:01Z,2023-07-31T16:46:00Z,2023-07-31T16:46:00Z,CONTRIBUTOR,,,,"### What is your issue?
Since numpy version 1.25.0 `np.cumproduct` is deprecated in favor of `np.cumprod`.
The coordinates to_index() method still uses it
https://github.com/pydata/xarray/blob/971be103d6376d6572d1f12d32526f12f07ae2c7/xarray/core/coordinates.py#L144
which results in an unnecessary DeprecationWarning.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8013/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1789989152,I_kwDOAMm_X85qsREg,7962,Better chunk manager error,2448579,closed,0,,,4,2023-07-05T17:27:25Z,2023-07-24T22:26:14Z,2023-07-24T22:26:13Z,MEMBER,,,,"### What happened?
I just ran in to this error in an environment without dask.
```
TypeError: Could not find a Chunk Manager which recognises type
```
I think we could easily recommend that the user install a package that provides `dask` by looking at `type(array).__name__`. This would make the message a lot friendlier.
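Something along these lines (hypothetical wording and helper name, not actual xarray internals) would already help:
```python
def missing_chunkmanager_error(array) -> TypeError:
    # Hypothetical sketch: include the offending type in the message and, for the
    # common case of a dask array (whose class __name__ is ""Array""), hint at the
    # missing package.
    name = type(array).__name__
    hint = "" Try installing 'dask'."" if name == ""Array"" else """"
    return TypeError(
        f""Could not find a Chunk Manager which recognises type {type(array)}.{hint}""
    )
```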
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7962/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1752520008,I_kwDOAMm_X85odVVI,7907,"`plot.scatter(hue_style=""discrete"")` does nothing",20118130,closed,0,,,4,2023-06-12T11:21:33Z,2023-07-13T23:17:49Z,2023-07-13T23:17:49Z,CONTRIBUTOR,,,,"### What happened?
I was trying to do a scatterplot of my data with one dimension determining the color. The dimension has only a few values so I used `hue_style=""discrete""` to have a different color for each value. However, the resulting scatterplot has a continuous colorbar, which is the same as when I pass `hue_style=""continuous""`:

### What did you expect to happen?
The colorbar should have discrete colors. I was also expecting the colors to come from the default matplotlib color palette (C0, C1, etc.) when there are fewer than 10 items, like this:

Although the [examples in the documentation](https://docs.xarray.dev/en/stable/user-guide/plotting.html#scatter) show the discrete case also using viridis.
What I was *really* expecting is a plot like one would get by passing `add_colorbar=False, add_legend=True`:

But that may be a bit too automagical.
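For reference, that last plot is just the call from the MVCE below with those two keyword arguments added — a minimal sketch:
```python
ds.plot.scatter(
    x=""x"", y=""y"", hue=""color"", hue_style=""discrete"",
    add_colorbar=False, add_legend=True,
    ax=plt.figure().gca(),
)
```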
### Minimal Complete Verifiable Example
```Python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
x = xr.DataArray(
np.random.default_rng().random((10, 3)),
coords=[
(""idx"", np.linspace(0, 1, 10)),
(""color"", [1, 2, 3]),
]
)
y = x + np.random.default_rng().random(x.shape)
ds = xr.Dataset({
""x"": x,
""y"": y,
})
# the output is the same regardless of hue_style=""discrete"" or ""continuous"" or just leaving it out
ds.plot.scatter(x=""x"", y=""y"", hue=""color"", hue_style=""discrete"", ax=plt.figure().gca())
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
This is the code for the ""expected"" plot:
```python
from matplotlib.colors import ListedColormap
ds.plot.scatter(
x=""x"",
y=""y"",
hue=""color"",
hue_style=""discrete"",
ax=plt.figure().gca(),
# these lines added in addition to the MVCE
cmap=ListedColormap([""C0"", ""C1"", ""C2""]),
vmin=0.5, vmax=3.5,
cbar_kwargs=dict(ticks=ds.color.data),
)
```
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.0-1059-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.1.0
pandas: 1.4.3
numpy: 1.23.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 44.0.0
pip: 20.0.2
conda: None
pytest: None
mypy: None
IPython: 8.12.2
sphinx: None
I also tried this on main at 3459e6fa, the behavior is the same.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7907/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1775657305,I_kwDOAMm_X85p1mFZ,7945,engine='cfgrib' no longer an option in xr.open_dataset() but works anyway,74011857,closed,0,,,4,2023-06-26T21:32:01Z,2023-06-27T00:06:27Z,2023-06-26T21:37:05Z,NONE,,,,"### What is your issue?
Looking at the documentation for [xr.open_dataset()](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html), the ""engine"" argument to that function is listed as accepting one of 7 different engines (or None), but the ""cfgrib"" engine is not among them. Looking at older versions of the documentation, I see that ""cfgrib"" was delisted starting with v2023.04.0 (it's still present in [v2023.03.0](https://docs.xarray.dev/en/v2023.03.0/generated/xarray.open_dataset.html)).
In what I think is a related issue, [this tutorial](https://docs.xarray.dev/en/stable/examples/ERA5-GRIB-example.html) on reading in ERA5 GRIB files with the ""engine='cfgrib'"" option on xr.load_dataset() gives a ValueError in documentation versions starting with [v2023.04.0](https://docs.xarray.dev/en/v2023.04.0/examples/ERA5-GRIB-example.html) and going through [v2023.05.0](https://docs.xarray.dev/en/v2023.05.0/examples/ERA5-GRIB-example.html) and ['stable'](https://docs.xarray.dev/en/stable/examples/ERA5-GRIB-example.html) due to the unrecognized engine 'cfgrib', although it seems to have been fixed for [v2023.06.0](https://docs.xarray.dev/en/v2023.06.0/examples/ERA5-GRIB-example.html) and ['latest'](https://docs.xarray.dev/en/latest/examples/ERA5-GRIB-example.html).
Given both of the above, I was surprised to find that using xr.open_dataset() on a GRIB file with engine='cfgrib' does work for me using xarray v2023.05.0. To me it seems that the documentation for xr.open_dataset() should be edited to include the 'cfgrib' option again, but I'd like to get an opinion from someone more familiar with xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7945/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1718143526,I_kwDOAMm_X85maMom,7854, Freezing Issue When Accessing Precipitation Values with xarray,118670091,closed,0,,,4,2023-05-20T11:30:54Z,2023-06-26T15:33:19Z,2023-06-26T15:33:19Z,NONE,,,,"### What is your issue?
I am encountering a freezing issue in my project that utilizes xarray when trying to access precipitation values for a specific longitude-latitude position over a time period. This issue occurs on the slurm system but is not reproduced on my Jupyter Notebook setup. As a result, whenever I attempt to run the project, the job freezes. I would greatly appreciate your assistance in determining the cause of this problem.
Below is a figure showing the result from Jupyter Notebook (this works):
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7854/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1691902604,I_kwDOAMm_X85k2GKM,7805,[FR] add support for rss and rss button to xarray blog,7980381,closed,0,,,4,2023-05-02T07:15:12Z,2023-06-21T21:10:32Z,2023-06-21T21:10:32Z,NONE,,,,"### Is your feature request related to a problem?
An easy way to subscribe to news from the xarray blog.
### Describe the solution you'd like
Support for publishing news and a button to subscribe to RSS from the blog (alongside the Twitter icon, etc.).
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7805/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1760733017,I_kwDOAMm_X85o8qdZ,7924,"Migrate from nbsphinx to myst, myst-nb",2448579,open,0,,,4,2023-06-16T14:17:41Z,2023-06-20T22:07:42Z,,MEMBER,,,,"### Is your feature request related to a problem?
I think we should switch to [MyST markdown](https://mystmd.org/) for our docs. I've been using MyST markdown and [MyST-NB](https://myst-nb.readthedocs.io/en/latest/index.html) in docs in other projects and it works quite well.
Advantages:
1. We get HTML reprs in the docs ([example](https://cf-xarray.readthedocs.io/en/latest/selecting.html)) which is a big improvement. (#6620)
2. I think many find markdown a lot easier to write than RST
There's a tool to migrate RST to MyST ([RTD's migration guide](https://docs.readthedocs.io/en/stable/guides/migrate-rest-myst.html)).
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7924/reactions"", ""total_count"": 5, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1722614979,I_kwDOAMm_X85mrQTD,7870,Name collision with Pulsar Timing package 'PINT' ,3092444,closed,0,,,4,2023-05-23T18:54:18Z,2023-05-26T16:19:37Z,2023-05-26T16:19:37Z,CONTRIBUTOR,,,,"### What is your issue?
In the astrophysics community of [pulsar timers](https://en.wikipedia.org/wiki/Pulsar_timing_array), there is an analysis package called `PINT`. PINT is widely used in that community. As you can see on their [github](https://github.com/nanograv/PINT), they have been aware of the name collision and on pip/conda the package is available as `pint-pulsar`. This has not been a problem so far, because most if not all astrophysicists use the great [astropy](https://www.astropy.org/) to keep track of units where necessary.
However, Bayesian modeling through PyMC is becoming more and more popular, meaning that arviz and xarray are now getting installed alongside pint-pulsar, giving obvious issues.
A very simple workaround would be to change line 37 in https://github.com/pydata/xarray/blob/main/xarray/core/pycompat.py to something like:
`except (ImportError, AttributeError):`
This means that `pint-pulsar` would still get imported (through `mod`), the `AttributeError` gets caught, and all should be well. It fits the design of duck typing, since the package doesn't quack like pint should. Would xarray be willing to accommodate the pulsar timing community this way? As you are all aware, changing the name of a package that is integral to projects with many dependencies is kind of painful.
EDIT: fixed typo","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7870/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1160309381,I_kwDOAMm_X85FKOqF,6335,ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. ,35556811,closed,0,,,4,2022-03-05T10:26:49Z,2023-05-12T14:09:52Z,2022-03-05T10:28:29Z,NONE,,,,"### What is your issue?
ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
but I installed netCDF4 using `pip install netCDF4`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6335/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1517575123,I_kwDOAMm_X85adFvT,7409,Implement `DataArray.to_dask_dataframe()`,44147817,closed,0,,,4,2023-01-03T15:44:11Z,2023-04-28T15:09:31Z,2023-04-28T15:09:31Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem?
It'd be nice to pass from a chunked DataArray to a dask object directly
### Describe the solution you'd like
I think something along these lines should work (although a less convoluted way might exist):
```python
from typing import Union

import dask.array as dka
import dask.dataframe as dkd
import xarray as xr


def to_dask(da: xr.DataArray) -> Union[dkd.Series, dkd.DataFrame]:
    if da.data.ndim > 2:
        raise ValueError(f""Can only convert 1D and 2D DataArrays, found {da.data.ndim} dimensions"")
    # one pandas index per dimension of the (chunked) DataArray
    indexes = [da.get_index(dim) for dim in da.dims]
    darr_index = dka.from_array(indexes[0], chunks=da.data.chunks[0])
    columns = [da.name] if da.data.ndim == 1 else indexes[1]
    ddf = dkd.from_dask_array(da.data, columns=columns)
    ddf[indexes[0].name] = darr_index
    return ddf.set_index(indexes[0].name).squeeze()
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7409/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1652227927,I_kwDOAMm_X85iev9X,7713,`Variable/IndexVariable` do not accept a tuple for data.,44142765,closed,0,,,4,2023-04-03T14:50:58Z,2023-04-28T14:26:37Z,2023-04-28T14:26:37Z,NONE,,,,"### What happened?
It appears that `Variable` and `IndexVariable` do not accept a tuple for the `data` parameter even though the docstring suggests it should be able to accept `array_like` objects (tuple falls under this type of object, right?).
### What did you expect to happen?
Successful instantiation of a `Variable/IndexVariable` object, but instead a `ValueError` exception is raised.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
xr.Variable(data=(2, 3, 45), dims=""day"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
ValueError: dimensions ('day',) must have the same length as the number of data dimensions, ndim=0
```
### Anything else we need to know?
This error seems to be triggered by the `self._parse_dimensions(dims)` call inside the `Variable` class. This problem does not happen if I use a list. But I find it strange that the `array_like` data specifically needs to be a certain type of object for the call to work. Maybe if it _has_ to be a list then the docstring should reflect that.
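For reference, a minimal side-by-side of the two cases described above (the list variant behaves the way I would expect the tuple to behave as well):
```python
import xarray as xr

# Works: a list is accepted as array_like data
xr.Variable(data=[2, 3, 45], dims=""day"")

# Raises the ValueError shown in the log output above
xr.Variable(data=(2, 3, 45), dims=""day"")
```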
### Environment
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55)
[GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.21-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.2
distributed: 2023.3.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: 1.1.1
IPython: 8.12.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7713/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
575939446,MDU6SXNzdWU1NzU5Mzk0NDY=,3830,"Documentation request: add examples for carrying out ""ncecat"" in xarray",19657652,open,0,,,4,2020-03-05T01:58:17Z,2023-04-13T20:06:20Z,,NONE,,,,"
In climate science, a very common task involves concatenating NetCDF files with identical variables, dimensions, and coordinates along a brand new ""ensemble member"" or ""record"" dimension. With the NetCDF Operators, this is accomplished using [`ncecat`](http://nco.sourceforge.net/nco.html#ncecat-netCDF-Ensemble-Concatenator).
#### MCVE Code Sample
Currently, it seems the correct way to do this in xarray is with [`xarray.combine_nested`](http://xarray.pydata.org/en/stable/generated/xarray.combine_nested.html) as follows:
```python
import xarray as xr
files = ['member1.nc', 'member2.nc', ...]
ds = xr.open_mfdataset(
files,
combine='nested',
concat_dim='record',
)
```
#### Problem Description
While this works, there does not seem to be any mention of this use case in the [`combine_nested`](http://xarray.pydata.org/en/stable/generated/xarray.combine_nested.html) or [`open_mfdataset`](http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html) docs... and using `combine='nested'` to concatenate along a brand new dimension feels quite unintuitive to me.
It would be nice to have examples in `combine_nested` and/or `open_mfdataset` with this special usage or mention the possibility of creating *brand new* dimensions with `concat_dim`. For example:
```python
In [1]: import xarray as xr
...: datasets = [
...: xr.Dataset({'temp': (('x', 'y'), np.random.rand(10, 20))})
...: for i in range(3)
...: ]
...: xr.combine_nested(datasets, concat_dim='record')
Out[1]:
Dimensions: (record: 3, x: 10, y: 20)
Dimensions without coordinates: record, x, y
Data variables:
temp (record, x, y) float64 0.32 0.4897 0.2659 ... 0.3485 0.0251 0.399
```
#### Output of ``xr.show_versions()``
n/a","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3830/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1659786592,I_kwDOAMm_X85i7lVg,7742,About save char into netcdf ,61818189,closed,0,,,4,2023-04-09T07:49:50Z,2023-04-11T06:36:27Z,2023-04-11T06:36:27Z,NONE,,,,"### What is your issue?
When I save a char variable into netCDF, it produces a new dimension. However, when I read this netCDF file with xarray, I can't find anything associated with this dimension.


","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7742/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1419825696,I_kwDOAMm_X85UoNIg,7199,Deprecate cfgrib backend,43316012,closed,0,,,4,2022-10-23T15:09:14Z,2023-03-29T15:19:53Z,2023-03-29T15:19:53Z,COLLABORATOR,,,,"### What is your issue?
Since cfgrib 0.9.9 (04/2021) it comes with its own xarray backend plugin (looks mainly like a copy of our internal version).
We should deprecate our internal plugin.
The deprecation is complicated since we usually bind the minimum version to a minor step, but cfgrib seems to have been on 0.9 for 4 years already. Maybe an exception like for netCDF4?
Anyway, if we decide to leave it as it is for now, this ticket is just a reminder to remove it someday :)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7199/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1620573171,I_kwDOAMm_X85gl_vz,7617,The documentation contains some non-descriptive link texts.,51911758,closed,0,,,4,2023-03-13T00:34:09Z,2023-03-27T21:37:21Z,2023-03-27T21:37:20Z,CONTRIBUTOR,,,,"### What is your issue?
I've been going through the docs and noticed some links could be more descriptive.
Here are a few examples with options on how we could rewrite them:
- See the [user guide](https://docs.xarray.dev/en/stable/indexing.html) for more. -> Check out the [indexing section in the user guide](https://docs.xarray.dev/en/stable/indexing.html) for a detailed explanation.
- For more, see [the Xarray documentation](https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment). -> See the [documentation on automatic alignment](https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment) to learn more.
- [This tutorial notebook](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html) also covers alignment and broadcasting (highly recommended)-> You can also check out this [tutorial notebook on alignment and broadcasting](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html) (highly recommended).
- For more see the [user guide](https://docs.xarray.dev/en/stable/plotting.html), the [gallery](https://docs.xarray.dev/en/stable/examples/visualization_gallery.html), and [the tutorial material](https://tutorial.xarray.dev/fundamentals/04.0_plotting.html). -> For more information, check out the following resources:
* The [plotting documentation](https://docs.xarray.dev/en/stable/user-guide/plotting.html) in the user guide.
* The [visualization gallery](https://docs.xarray.dev/en/stable/examples/visualization_gallery.html).
* The [plotting and visualization tutorial materials](https://tutorial.xarray.dev/fundamentals/04.0_plotting.html).
With more specific link texts, you get a clearer idea of what to expect when you click on the link which improves the reading experience. It also makes the links more accessible.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7617/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
928381010,MDU6SXNzdWU5MjgzODEwMTA=,5515,NetCDF: Attempting netcdf-4 operation on netcdf-3 file,20254164,open,0,,,4,2021-06-23T15:23:55Z,2023-03-27T21:07:32Z,,CONTRIBUTOR,,,,"I'm trying to open MODIS .hdf files, but I get the error: `NetCDF: Attempting netcdf-4 operation on netcdf-3 file`. Does anyone know how to open these files? (https://nsidc.org/data/MOD10C1)
```python
import xarray as xr
xr.open_dataset('MOD10C1.A2000055.061.2020037182124.hdf')
RuntimeError: NetCDF: Attempting netcdf-4 operation on netcdf-3 file
```
I already opened hdf files from another product without any issue... (https://nsidc.org/data/MOD10CM)
Here are two examples, with one that works and the other one that causes the issue: [MODIS.zip](https://github.com/pydata/xarray/files/6703060/MODIS.zip)
Thanks in advance for your help!
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:25:15)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-16-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.1.0
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.2.0.post20200712
pip: 20.2
conda: None
pytest: 6.0.0
IPython: 7.16.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5515/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1338173609,I_kwDOAMm_X85Pwuip,6914,plt.imshow() vs xarray_dataset.plot.imshow() not rendering correctly | Potential Bug,32569566,closed,0,,,4,2022-08-14T08:40:56Z,2023-03-22T20:46:23Z,2023-03-22T20:46:23Z,NONE,,,,"### What is your issue?
I have 2D data which I want to visualise. The visuals look completely different if I use plt.imshow() vs xarray_dataset.plot.imshow().
There are two main issues:
- First, the array is flipped. (I think this is manageable but inconsistent)
- Secondly, the plots don't look correct. This can be best illustrated by the figures themselves.
For example this is the xarray code I am using.
```
day_data.plot.imshow(cmap= ""Blues"", vmin =1, vmax = 100)
plt.show()
```
And this is the image that I get.

Secondly, when I use the matplotlib to plot the values.
```
plt.imshow(day_data.values, vmin = 1, vmax = 100, cmap = 'Blues')
plt.show()
```
I get this plot.

Since this is discharge data, I would expect to see the second plot. Can someone tell me what the issue is here?
P.S.
This is what day_data looks like:
```
xarray.DataArray 'dis06' (y: 950, x: 1000)
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    time        ()      datetime64[ns]   2019-10-24T06:00:00
    step        ()      timedelta64[ns]  06:00:00
    surface     ()      float64          0.0
    latitude    (y, x)  float64          ...
    longitude   (y, x)  float64          ...
    valid_time  ()      datetime64[ns]   2019-10-24T12:00:00
Attributes:
    GRIB_paramId:                    240023
    GRIB_dataType:                   sfo
    GRIB_numberOfPoints:             950000
    GRIB_typeOfLevel:                surface
    GRIB_stepUnits:                  1
    GRIB_stepType:                   avg
    GRIB_gridType:                   lambert_azimuthal_equal_area
    GRIB_NV:                         0
    GRIB_cfName:                     unknown
    GRIB_cfVarName:                  dis06
    GRIB_gridDefinitionDescription:  Lambert azimuthal equal area projection
    GRIB_missingValue:               9999
    GRIB_name:                       Mean discharge in the last 6 hours
    GRIB_shortName:                  dis06
    GRIB_units:                      m**3 s**-1
    long_name:                       Mean discharge in the last 6 hours
    units:                           m**3 s**-1
    standard_name:                   unknown
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6914/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1499473190,I_kwDOAMm_X85ZYCUm,7385,Unexpected NaNs in broadcast,221526,open,0,,,4,2022-12-16T02:42:44Z,2023-03-14T20:43:00Z,,CONTRIBUTOR,,,,"### What happened?
When running the `broadcast` in the sample code, I end up with `nan` in the output even though there are none in the original source arrays. While I know the construction is really odd (this came from user-submitted code), I'm shocked that it resulted in `nan`s in the broadcast data, and I honestly assumed MetPy's code was doing something dumb for quite a while. I would have expected (regardless of the nature of the coordinates) that the result for `broad_a` be `[[1, 2], [1, 2]]`.
### What did you expect to happen?
_No response_
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr

levs = np.array([100000, 85000])
a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array()
b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array()
broad_a, broad_b = xr.broadcast(a, b)
print(broad_a)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
array([[ 1., 2.],
[nan, nan]])
Coordinates:
* lev (lev) int64 100000 85000
* variable (variable) object 'a' 'b'
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.12.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.5
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
mypy: 0.991
IPython: 8.7.0
sphinx: 5.3.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7385/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
706507153,MDU6SXNzdWU3MDY1MDcxNTM=,4449,Did copy(deep=True) break with 0.16.1?,6249613,closed,0,,,4,2020-09-22T15:59:41Z,2023-03-12T21:08:42Z,2023-03-12T21:08:42Z,NONE,,,,"
**What happened**: I have a script that downloads a file, reads and copies it to memory with `ds.copy(deep=True)`, and then removes the downloaded file from disk. In 0.16.1, I get an error ""No such file or directory"" when I try to read the data from the deep-copied Dataset as if the Dataset was not actually copied into memory.
**What you expected to happen**: In 0.16.0 and earlier, the variable data is available (`ds.varName.data`) after it is copied into memory even after the original file was removed. But this doesn't work anymore in 0.16.1.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
import os
import urllib.request
# Get sample NetCDF file
url = 'https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc'
FILE = 'tos_O1_2001-2002.nc'
urllib.request.urlretrieve(url, FILE)
# Open the NetCDF file
ds1 = xr.open_dataset(FILE)
# Make a copy of the Dataset
ds2 = ds1.copy(deep=True)
# and close the original
ds1.close()
# remove the NetCDF file
os.remove(FILE)
# Read the copied dataset
ds2
```
**Anything else we need to know?**:
Output for xarray v0.16.0

Output for xarray v0.16.1
```FileNotFoundError: [Errno 2] No such file or directory: ...tos_O1_2001-2002.nc'```
**Environment**:
Output of xr.show_versions() for xarray 0.16.0
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
Output of xr.show_versions() for xarray 0.16.1
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4449/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1598266728,I_kwDOAMm_X85fQ51o,7556,broken documentation link,76110149,closed,0,,,4,2023-02-24T09:37:57Z,2023-03-12T18:02:59Z,2023-03-12T18:02:59Z,CONTRIBUTOR,,,,"### What is your issue?
Hi,
I found [this broken link](https://docs.xarray.dev/en/stable/user-guide/datetime_component_indexing) at the bottom of the [Datetime Indexing](https://docs.xarray.dev/en/stable/user-guide/time-series.html#datetime-indexing) subsection in the User Guide.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7556/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1468838643,I_kwDOAMm_X85XjLLz,7336,Instability when calculating standard deviation,26401994,closed,0,,,4,2022-11-29T23:33:55Z,2023-03-10T20:32:51Z,2023-03-10T20:32:50Z,NONE,,,,"### What happened?
I noticed that for some large values (not really that large) and lots of samples, ```data.std()``` yields different values than ```np.std(data)```. This seems to be related to the magnitude of the data. See the attached code:
```python
nino34_tas_picontrol_detrend = nino34_tas_picontrol-298
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)
std_dev = nino34_tas_picontrol.std()
print(std_dev.data)
nino34_tas_picontrol_detrend = nino34_tas_picontrol-10
std_dev = nino34_tas_picontrol_detrend.std()
print(std_dev.data)
```
and the results are:
```
1.4448999166488647
24.911161422729492
20.054718017578125
```

So I guess this is related to the magnitude, but I am not sure. Has anyone run into a similar issue?
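For illustration, a minimal sketch with synthetic data (not the reporter's array) of how float32 precision can make the result depend on the offset, and how casting to float64 avoids it:
```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# float32 samples with a large mean (~298) relative to their spread (~1)
da = xr.DataArray((298 + rng.standard_normal(1_000_000)).astype(""float32""), dims=""sample"")

print(da.std().item())                    # float32 result; can differ, especially with bottleneck installed
print((da - 298).std().item())            # removing the offset restores precision
print(da.astype(""float64"").std().item())  # ~1.0; computing in float64 avoids the problem
```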
### What did you expect to happen?
Adding or subtracting a constant should not change the standard deviation.
A screenshot of the data is attached to the original issue.
### Minimal Complete Verifiable Example
_No response_
### MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.71.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.22.3
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.6.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7336/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1588461863,I_kwDOAMm_X85ergEn,7539,Concat doesn't concatenate dimension coordinates along new dims,35968931,open,0,,,4,2023-02-16T22:32:33Z,2023-02-21T19:07:48Z,,MEMBER,,,,"### What is your issue?
`xr.concat` doesn't concatenate dimension coordinates along new dimensions, which leads to pretty unintuitive behavior.
Take this example (motivated by https://github.com/pydata/xarray/discussions/7532#discussioncomment-4988792)
```python
import numpy as np
import xarray as xr

segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4, 2),
        dims=[""time"", ""cols""],
        coords=dict(time=('time', time), cols=[""col1"", ""col2""]),
    )
    segments.append(da)
```
```python
In [86]: segments
Out[86]:
[<xarray.DataArray (time: 4, cols: 2)>
 array([[-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]])
 Coordinates:
   * time     (time) float64 0.1048 0.168 0.869 0.9432
   * cols     (cols) <U4 'col1' 'col2',
 <xarray.DataArray (time: 4, cols: 2)>
 array([[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728]])
 Coordinates:
   * time     (time) float64 0.03627 0.09754 0.2434 0.592
   * cols     (cols) <U4 'col1' 'col2']
array([[[ nan, nan],
[ nan, nan],
[-0.61199576, -0.9012078 ],
[-0.54187577, 1.30509994],
[ nan, nan],
[ nan, nan],
[-3.53720471, 0.97607797],
[ 0.2593455 , 0.95920031]],
[[ 0.90266408, -0.54294821],
[-1.09087103, -0.17484417],
[ nan, nan],
[ nan, nan],
[-0.21679558, -0.57377412],
[ 0.07570151, 0.27433728],
[ nan, nan],
[ nan, nan]]])
Coordinates:
* time (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432
  * cols     (cols) <U4 'col1' 'col2'
```
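For reference, one possible workaround (a sketch continuing from the snippet above; the ""segment"" dimension name is arbitrary): drop ""time"" before concatenating so no index alignment takes place, then re-attach it as a 2-D coordinate.
```python
import numpy as np
import xarray as xr

# Concatenate without aligning the two time indexes, then attach time
# as a (segment, time) coordinate built from the original segments.
combined = xr.concat(
    [seg.drop_vars(""time"") for seg in segments], dim=""segment""
).assign_coords(
    time=((""segment"", ""time""), np.stack([seg[""time""].values for seg in segments]))
)
```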
```
46 assert np.isclose(r, 1.0), r
AssertionError: 0.2664911388214005
```
### Anything else we need to know?
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
Xarray version is '2022.9.0'
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-193.28.1.el8_2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.11.0
distributed: None
matplotlib: 3.6.2
cartopy: None
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7340/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue