issues
416 rows where comments = 4 and type = "issue" sorted by updated_at descending
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2276352251 | I_kwDOAMm_X86HrmD7 | 8994 | Improving performance of open_datatree | TomNicholas 35968931 | open | 0 | 4 | 2024-05-02T19:43:17Z | 2024-05-03T15:25:33Z | MEMBER | What is your issue? The implementation of  We discussed this in the datatree meeting, and my understanding is that concretely we need to:
It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on? |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8994/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2163608564 | I_kwDOAMm_X86A9gv0 | 8802 | Error when using `apply_ufunc` with `datetime64` as output dtype | gcaria 44147817 | open | 0 | 4 | 2024-03-01T15:09:57Z | 2024-05-03T12:19:14Z | CONTRIBUTOR | What happened? When using  What did you expect to happen? No response Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np

def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]

def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = "time_cp"
    # ...

da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(
        x=range(5),
        y=range(5),
        time=np.array(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
                      dtype='datetime64[ns]'),
    ),
).chunk(dict(x=2, y=2))

fn(da_fake.compute()).compute()  # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
fn(da_fake).compute()  # same errors as above
```
MVCE confirmation
Relevant log output
```Python
ValueError                                Traceback (most recent call last)
Cell In[211], line 1
----> 1 fn(da_fake).compute()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs)
   1144 """Manually trigger loading of this array's data from disk or a
   1145 remote source into memory and return a new array. The original is
   1146 left unaltered.
   (...)
   1160 dask.compute
   1161 """
   1162 new = self.copy(deep=False)
-> 1163 return new.load(**kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs)
   1119 def load(self, **kwargs) -> Self:
   1120     """Manually trigger loading of this array's data from disk or a
   1121     remote source into memory and return this array.
   1122     (...)
   1135     dask.compute
   1136     """
-> 1137 ds = self._to_temp_dataset().load(**kwargs)
   1138 new = self._from_temp_dataset(ds)
   1139 self._variable = new._variable

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs)
    850 chunkmanager = get_chunked_array_type(*lazy_data.values())
    852 # evaluate all the chunked arrays simultaneously
--> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
    855 for k, data in zip(lazy_data, evaluated_data):
    856     self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
     67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
     68     from dask.array import compute
---> 70     return compute(*data, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(*args, traverse, optimize_graph, scheduler, get, **kwargs)
    625     postcomputes.append(x.__dask_postcompute__())
    627 with shorten_traceback():
--> 628     results = schedule(dsk, keys, **kwargs)
    630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
   2369     self._init_stage_2(*args, **kwargs)
   2370     return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
   2362 vargs = [args[_i] for _i in inds]
   2363 vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args)
   2444 """Vectorized call to

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args)
   2502 outputs = _create_arrays(broadcast_shape, dim_sizes,
   2503                          output_core_dims, otypes, results)
   2505 for output, result in zip(outputs, results):
-> 2506     output[index] = result
   2508 if outputs is None:
   2509     # did not call the function even once
   2510     if otypes is None:

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
```
Anything else we need to know? No response Environment |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8802/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
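The failure in the row above comes from np.vectorize being unable to fill a generic datetime64 output array. A sketch of one possible workaround, not taken from the issue thread (the helper names `_fn_ns`/`fn_int64` are mine): return integer nanoseconds from the vectorized function and reinterpret at the end.

```python
# Workaround sketch (assumption, not a confirmed fix): return int64 nanoseconds
# so np.vectorize never allocates a generic datetime64 output array, then view
# the integers as datetime64[ns] afterwards.
import numpy as np
import xarray as xr

def _fn_ns(time: np.ndarray) -> np.ndarray:
    # same selection as the report's _fn, but as nanoseconds since the epoch
    return time[:10].astype("datetime64[ns]").astype(np.int64)

def fn_int64(da: xr.DataArray) -> xr.DataArray:
    out = xr.apply_ufunc(
        _fn_ns,
        da["time"],
        input_core_dims=[["time"]],
        output_core_dims=[["time_cp"]],
        vectorize=True,
    )
    return out.astype("datetime64[ns]")
```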
| 2270275688 | I_kwDOAMm_X86HUaho | 8985 | update `to_netcdf` docstring to list support for explicit CDF5 writes | JulioTBacmeister 9221710 | open | 0 | 4 | 2024-04-30T00:41:13Z | 2024-04-30T20:48:46Z | NONE | Is your feature request related to a problem? I cannot get to_netcdf() to write files in CDF5 format as identified by the 'ncdump -k' command. Describe the solution you'd like When I write a netcdf file using: D.to_netcdf( filename ) and then ask ncdump to tell me the kind of file I have, ncdump -k filename it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmosphere Model (CAM), as an initial condition for example. CAM will bomb when it tries to read it. After converting the file with this command: nccopy -k cdf5 filename cdf5_filename the file now works in CAM. Also, the command ncdump -k cdf5_filename returns 'cdf5'. I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command. Describe alternatives you've considered Writing netcdf-4 files from xarray and converting via nccopy -k cdf5 filename cdf5_filename Additional context No response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8985/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
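Per the issue title, an explicit CDF5 write appears to be supported but undocumented. A sketch, assuming the netcdf4 engine and an underlying netCDF-C library built with CDF5 support ("NETCDF3_64BIT_DATA" is netCDF4-python's name for the CDF5 format):

```python
# Write CDF5 directly instead of round-tripping through nccopy:
D.to_netcdf("out_cdf5.nc", format="NETCDF3_64BIT_DATA", engine="netcdf4")
# 'ncdump -k out_cdf5.nc' should then report 'cdf5'.
```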
| 1389295853 | I_kwDOAMm_X85Szvjt | 7099 | Pass arbitrary options to sel() | benbovy 4160723 | open | 0 | 4 | 2022-09-28T12:44:52Z | 2024-04-30T00:44:18Z | MEMBER | Is your feature request related to a problem? Currently  It would also be useful for custom indexes to expose their own selection options, e.g.,
From #3223, it would be nice if we could also pass distinct option values per index. What would be a good API for that? Describe the solution you'd like Some ideas: A. Allow passing a tuple
B. Expose an
Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great. Any other ideas? Some sort of context manager? Some  Describe alternatives you've considered The API proposed in #3223 would look great if  Additional context No response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7099/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
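Restating the two API shapes proposed above as code (illustration only; neither is implemented in xarray, so these calls are not runnable):

```python
# A. labels packed into a tuple together with their options
da.sel(x=([0.5, 1.5], {"method": "nearest", "tolerance": 0.1}))

# B. a separate nested `options` mapping keyed by coordinate name
da.sel(x=[0.5, 1.5], options={"x": {"method": "nearest", "tolerance": 0.1}})
```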
| 481761508 | MDU6SXNzdWU0ODE3NjE1MDg= | 3223 | Feature request for multiple tolerance values when using nearest method and sel() | NicWayand 1117224 | open | 0 | 4 | 2019-08-16T19:53:31Z | 2024-04-29T23:21:04Z | NONE |
```python
import xarray as xr
import numpy as np
import pandas as pd

# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120, -60)
ds.coords['lat'] = np.arange(30, 50)
ds.coords['time'] = pd.date_range('2018-01-01', '2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size, ds.lon.size, ds.time.size)), dims=['lat', 'lon', 'time'])

target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]

# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')

# Adding tolerance for lat long, but also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)

# Ideally tolerance could accept a dictionary but currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest',
       tolerance={'lat': 0.5, 'lon': 0.5, 'time': np.timedelta64(1, 'D')})
```
Expected Output A dataset with nearest values to tolerances on each dim. Problem Description I would like to add the ability of tolerance to accept a dictionary for multiple tolerance values for different dimensions. Before I try implementing it, I wanted to 1) check it doesn't already exist or someone isn't working on it, and 2) get suggestions for how to proceed (a workaround sketch follows after this row). Output of
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3223/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
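A workaround available today, reusing the names from the MVCE above (my sketch, not from the thread): chain sel calls so each dimension gets its own tolerance.

```python
import numpy as np

# One tolerance for the spatial dims, a different one for time.
nearest = ds.sel(lat=target_lat, lon=target_lon, method='nearest', tolerance=0.5)
nearest = nearest.sel(time=target_time, method='nearest', tolerance=np.timedelta64(1, 'D'))
```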
| 2259316341 | I_kwDOAMm_X86Gqm51 | 8965 | Support concurrent loading of variables | dcherian 2448579 | open | 0 | 4 | 2024-04-23T16:41:24Z | 2024-04-29T22:21:51Z | MEMBER | Is your feature request related to a problem? Today, if users want to load multiple variables in a DataArray or Dataset concurrently, they have to use dask. It struck me that it'd be pretty easy for |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8965/reactions",
"total_count": 3,
"+1": 3,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
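A minimal sketch of the idea in the row above (my illustration, not the proposed design): load each lazily-backed variable on its own thread so the I/O waits overlap.

```python
from concurrent.futures import ThreadPoolExecutor
import xarray as xr

def load_concurrently(ds: xr.Dataset, max_workers: int = 4) -> xr.Dataset:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Variable.load() pulls the backend array into memory in place.
        list(pool.map(lambda name: ds.variables[name].load(), ds.variables))
    return ds
```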
| 1250939008 | I_kwDOAMm_X85Kj9CA | 6646 | `dim` vs `dims` | max-sixty 5635139 | closed | 0 | 4 | 2022-05-27T16:15:02Z | 2024-04-29T18:24:56Z | 2024-04-29T18:24:56Z | MEMBER | What is your issue? I've recently been hit with this when experimenting with  Should we standardize on one of these? |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6646/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1024011835 | I_kwDOAMm_X849CS47 | 5857 | Incorrect results when using xarray.ufuncs.angle(..., deg=True) | cvr 1119116 | closed | 0 | 4 | 2021-10-12T16:24:11Z | 2024-04-28T20:58:55Z | 2024-04-28T20:58:54Z | NONE | What happened: The  What you expected to happen: To have the result of  Minimal Complete Verifiable Example:
```python
# Put your MCVE code here
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})
Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd))

D = xr.ufuncs.angle(Z, deg=True)  # YIELDS INCORRECT RESULTS
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D.values%360} instead of {ds.wd.values}"
          + f"\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!")

D = xr.ufuncs.degrees(xr.ufuncs.angle(Z))  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!")

D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True})  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!")
```
Anything else we need to know?: Though
```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = np.exp(1j * np.radians(ds.wd))
print(Z)
print(f"Is Z an XArray? {isinstance(Z, xr.DataArray)}")

D = np.angle(ds.wd, deg=True)
print(D)
print(f"Is D an XArray? {isinstance(D, xr.DataArray)}")
```
Environment: No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (most recent). Output of <tt>xr.show_versions()</tt>
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-18-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.utf8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 0.19.0
pandas: 1.2.3
numpy: 1.20.2
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: 4.10.3
pytest: None
IPython: None
sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5857/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2224036575 | I_kwDOAMm_X86EkBrf | 8905 | Variable doesn't have an .expand_dims method | TomNicholas 35968931 | closed | 0 | 4 | 2024-04-03T22:19:10Z | 2024-04-28T19:54:08Z | 2024-04-28T19:54:08Z | MEMBER | Is your feature request related to a problem?
Describe the solution you'd like Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes. Describe alternatives you've considered No response Additional context No response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8905/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 590630281 | MDU6SXNzdWU1OTA2MzAyODE= | 3921 | issues discovered by the all-but-dask CI | keewis 14808389 | closed | 0 | 4 | 2020-03-30T22:08:46Z | 2024-04-25T14:48:15Z | 2024-02-10T02:57:34Z | MEMBER | After adding the |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3921/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2243685081 | I_kwDOAMm_X86Fu-rZ | 8945 | netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory | brendan-m-murphy 11130776 | closed | 0 | 4 | 2024-04-15T13:26:08Z | 2024-04-23T21:49:28Z | 2024-04-23T15:33:36Z | NONE | What is your issue? Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory). Here is a minimum working example:
```
times = 100
nlat = 200
nlon = 300

fp = xr.Dataset(
    {"fp": (["time", "lat", "lon"], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
    coords={"time": pd.date_range(start="2019-01-01T02:00:00", periods=times, freq="1H"),
            "lat": np.arange(nlat),
            "lon": np.arange(nlon)})

flux = xr.Dataset(
    {"flux": (["time", "lat", "lon"], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
    coords={"time": [pd.to_datetime("2019-01-01")],
            "lat": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
            "lon": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon)})

fp.to_netcdf("combine_datasets_tests/fp.nc")
flux.to_netcdf("combine_datasets_tests/flux.nc")

fp1 = xr.open_dataset("combine_datasets_tests/fp.nc")
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc")
```
Then
Profiling the "reindex without load" cell:
```
   804936 function calls (804622 primitive calls) in 93.285 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   92.211   92.211   93.191   93.191 {built-in method _operator.getitem}
        1    0.289    0.289    0.980    0.980 utils.py:81(_StartCountStride)
        6    0.239    0.040    0.613    0.102 shape_base.py:267(apply_along_axis)
    72656    0.109    0.000    0.109    0.000 utils.py:429(<lambda>)
    72656    0.085    0.000    0.136    0.000 utils.py:430(<lambda>)
    72661    0.051    0.000    0.051    0.000 {built-in method numpy.arange}
   145318    0.048    0.000    0.115    0.000 shape_base.py:370(<genexpr>)
        2    0.045    0.023    0.046    0.023 indexing.py:1334(__getitem__)
        6    0.044    0.007    0.044    0.007 numeric.py:136(ones)
   145318    0.044    0.000    0.067    0.000 index_tricks.py:690(__next__)
       14    0.033    0.002    0.033    0.002 {built-in method numpy.empty}
145333/145325    0.023    0.000    0.023    0.000 {built-in method builtins.next}
        1    0.020    0.020   93.275   93.275 duck_array_ops.py:317(where)
       21    0.018    0.001    0.018    0.001 {method 'astype' of 'numpy.ndarray' objects}
   145330    0.013    0.000    0.013    0.000 {built-in method numpy.asanyarray}
        1    0.002    0.002    0.002    0.002 {built-in method _functools.reduce}
        1    0.002    0.002   93.279   93.279 variable.py:821(_getitem_with_mask)
       18    0.001    0.000    0.001    0.000 {built-in method numpy.zeros}
        1    0.000    0.000    0.000    0.000 file_manager.py:226(close)
```
The  In my venv, netCDF4 was installed from a wheel with the following versions:
This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3. I will try to investigate more and hopefully simplify the example. (Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.) |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8945/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
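The timings in the row above imply the fix the reporter used; a sketch of it (the exact reindex call in the report is elided, so the method/tolerance arguments here are my assumptions):

```python
# Load the netCDF-backed dataset into memory first, so reindex_like indexes
# NumPy arrays instead of going through netCDF4's slow item-wise indexing.
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc").load()
flux_on_fp_grid = flux1.reindex_like(fp, method="nearest", tolerance=0.1)
```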
| 1664193419 | I_kwDOAMm_X85jMZOL | 7748 | diff('non existing dimension') does not raise exception | LunarLanding 4441338 | open | 0 | 4 | 2023-04-12T09:29:58Z | 2024-04-21T22:31:37Z | NONE | What happened? Calling xr.DataArray.diff with a non-existing dimension does not raise an exception. What did you expect to happen? An exception to be raised. Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-21-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.6.9
numpy_groupies: 0.9.20
setuptools: 67.6.0
pip: 23.0.1
conda: 23.1.0
pytest: 7.2.2
mypy: 1.1.1
IPython: 8.11.0
sphinx: 6.1.3
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7748/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
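The MVCE cell in the row above is empty in this export; a minimal reconstruction of the reported behaviour (my sketch, not the author's code):

```python
import xarray as xr

da = xr.DataArray([1, 2, 4], dims="x")
result = da.diff("not_a_dim")  # reported: silently succeeds instead of raising
```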
| 2237228079 | I_kwDOAMm_X86FWWQv | 8927 | Use a neutral format to have lossless interface with JSON, scipp, Astropy, pandas | loco-philippe 92333742 | open | 0 | 4 | 2024-04-11T08:50:34Z | 2024-04-12T14:25:35Z | NONE | Is your feature request related to a problem? Each tool has a specific structure for processing multidimensional data, with the following consequences:
Describe the solution you'd like The proposed format (see jupyter notebook, github repository, PyPI package) is based on the following principles:
Describe alternatives you've considered No response Additional context https://github.com/numpy/numpy/issues/12481#issuecomment-2049179803 https://github.com/astropy/astropy/issues/16286 https://github.com/scipp/scipp/issues/3422 |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8927/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1959816045 | I_kwDOAMm_X8500Gtt | 8368 | to_netcdf: Unexpected drop of "units" attribute of attached "bounds" | leonfoks 15173535 | open | 0 | 4 | 2023-10-24T18:15:05Z | 2024-04-09T11:11:20Z | NONE | What happened? When writing a Dataset to netcdf, any DataArrays that are linked as bounds through another variable's attrs['bounds'] entry have their (specifically) 'units' attribute dropped inside the written netcdf file. See example. What did you expect to happen? Units attribute to be written to the netcdf file. Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr

# Create a new Dataset
ds = xr.Dataset()

# Add the x variable. Specify 'x_bnds' as bounds, defined later.
ds['x'] = xr.DataArray(np.arange(10), dims='x', attrs={'units': 'm', 'bounds': 'x_bnds'})

# Bounds require an extra dimension equal to number of vertices.
ds['nv'] = xr.DataArray(np.r_[0, 1], dims='nv')

# Add the actual bounding values for variable x.
ds['x_bnds'] = xr.DataArray(np.squeeze(np.dstack([np.arange(10) - 0.5, np.arange(10) + 0.5])),

print('Units is attached to the bounds in the dataset before writing', 'units' in ds['x_bnds'].attrs)

# Write to netcdf file
ds.to_netcdf('tmp.nc', format='netcdf4', engine='netcdf4')

# Open the dataset and check x_bnds attrs. units is dropped.
new = xr.open_dataset('tmp.nc')
print(new['x_bnds'].attrs)

# Confirm that units were never written to the file.
!h5dump -d /x_bnds tmp.nc
```
MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.3
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: 7.2.6
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8368/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2230680765 | I_kwDOAMm_X86E9Xy9 | 8919 | Using the xarray.Dataset.where() function takes up a lot of memory | isLiYang 69391863 | closed | 0 | 4 | 2024-04-08T09:15:49Z | 2024-04-09T02:45:09Z | 2024-04-09T02:45:08Z | NONE | What is your issue? My python script was killed because it took up too much memory. After checking, I found that the problem is the ds.where() function. The original netcdf file opened from the hard disk takes up about 10 Mb of storage, but when I mask the data that doesn't match according to the latitude and longitude location, the variable ds takes up a dozen GB of memory. When I deleted this variable using del ds, the memory occupied by the script immediately returned to normal. (A lower-memory alternative is sketched after this row.)
```
# Open this netcdf file.
ds = xr.open_dataset(track)

# If longitude range is [-180, 180], then convert to [0, 360].
if np.any(ds[var_lon] < 0):
    ds[var_lon] = ds[var_lon] % 360

# Extract data by longitude and latitude.
ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1])
              & (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3]))

# Select data by range and value of some variables.
for key, value in range_select.items():
    ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1]))
for key, value in value_select.items():
    ds = ds.where(ds[key].isin(value))
```
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8919/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
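A lower-memory alternative to the repeated where() masking above, sketched under the assumption that var_lon and var_lat are indexed one-dimensional coordinates (label-based selection avoids building a full-size NaN-masked float copy of every variable):

```python
import xarray as xr

ds = xr.open_dataset(track)
ds = ds.sel({var_lon: slice(region[0], region[1]),
             var_lat: slice(region[2], region[3])})
```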
| 2228373305 | I_kwDOAMm_X86E0kc5 | 8915 | Weird behavior of DataSet.where(... , drop=True) | johannespletzer 22961670 | closed | 0 | 4 | 2024-04-05T16:03:05Z | 2024-04-08T09:32:48Z | 2024-04-08T09:32:48Z | NONE | What happened? I work with an aircraft emission dataset that is freely available online: emission dataset  During my calculations I eventually convert the  Example 1: Along some dimensions data points vanished if  Example 2: For other dimensions (these?) data points appeared elsewhere if  What did you expect to happen? I expect my calculations to return the same results, regardless of whether drop=True is active or not. Minimal Complete Verifiable Example
```Python
!wget "https://zenodo.org/records/10818082/files/Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc"

import matplotlib.pyplot as plt
import xarray as xr

nc_file = xr.open_dataset('Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc')

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[0])
axs[0].set_xlim(-50, 90)
axs[0].set_title('With drop=True')
nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[1])
axs[1].set_xlim(-50, 90)
axs[1].set_title('With drop=False')
plt.tight_layout()
plt.show()

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[0])
axs[0].set_title('With drop=True')
nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[1])
axs[1].set_title('With drop=False')
plt.tight_layout()
plt.show()
```
MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'ISO8859-1')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2022.11.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: None
IPython: 8.10.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8915/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2206243581 | I_kwDOAMm_X86DgJr9 | 8876 | Possible race condition when appending to an existing zarr | rsemlal-murmuration 157591329 | closed | 0 | 4 | 2024-03-25T16:59:52Z | 2024-04-03T15:23:14Z | 2024-03-29T14:35:52Z | NONE | What happened? When appending to an existing zarr along a dimension ( What did you expect to happen? We would expect zarr append to have the same behaviour as if we concatenated the datasets in memory (using Minimal Complete Verifiable Example
```Python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile

ds1 = xr.Dataset({"a": ("x", [1., 1.])}, coords={'x': [1, 2]}).chunk({"x": 3})
ds2 = xr.Dataset({"a": ("x", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({"x": 3})

with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):
    # The issue happens only when: threads_per_worker > 1
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode="w")  # write first dataset
            ds2.to_zarr(store, mode="a", append_dim="x")  # append second dataset
``` MVCE confirmation
Relevant log output
Anything else we need to know? The example code snippet provided here reproduces the issue. Since the issue occurs randomly, we loop a few times in the example and stop when the issue occurs. In the example, when  Side note: this behaviour in itself is not problematic in this case, but the fact that the chunking is silently changed made this issue harder to spot. However, when we try to append the second dataset  Zarr chunks:
+ chunk1 :  Dask chunks for  Both dask chunks A and B are supposed to write to zarr chunk3
And depending on who writes first, we can end up with NaN on  The issue obviously happens only when dask tasks are run in parallel.
Using  We couldn't figure out from the documentation how to detect this kind of issue, and how to prevent it from happening (maybe using a synchronizer? see the sketch after this row) Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8876/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
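A sketch of the synchronizer idea raised at the end of the report (an assumption that it applies here, not a confirmed fix), reusing ds1, ds2 and store from the MVCE:

```python
import zarr

# Serialize writes to overlapping zarr chunks within a single process.
sync = zarr.ThreadSynchronizer()
ds1.to_zarr(store, mode="w", synchronizer=sync)
ds2.to_zarr(store, mode="a", append_dim="x", synchronizer=sync)
```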
| 2211106929 | I_kwDOAMm_X86DytBx | 8882 | to_zarr silently loses data when using append_dim, if chunks are different to zarr store | harryC-space-intelligence 140395181 | closed | 0 | 4 | 2024-03-27T15:27:02Z | 2024-03-29T14:35:51Z | 2024-03-29T14:35:51Z | NONE | What happened? When writing a chunked DataArray to an existing zarr store, appending along an existing dimension of the store, I have found that some data are not written if there are multiple array chunks to one zarr chunk. I appreciate it is probably bad practice to have different chunk sizes in my DataArray and zarr_store, but I think it's a realistic scenario that needs to be caught. This may be related to / the same underlying issue as #8371. Perhaps the checks mentioned in https://github.com/pydata/xarray/issues/8371#issuecomment-1814589157 are somehow getting bypassed? Using zarr's ThreadSynchronizer is the only way I have found to ensure that all the data gets written. What did you expect to happen? I expected that either
Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
from matplotlib import pyplot as plt

x_coords = np.arange(10)
y_coords = np.arange(10)
t_coords = np.array([np.datetime64('2020-01-01').astype('datetime64[ns]')])
data = np.ones((10, 10))

for i in range(4):
    plt.subplot(1, 4, i + 1)
``` MVCE confirmation
Relevant log output: No response. Anything else we need to know? Output from the plots above:  Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1041-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: 2.17.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.3.1
distributed: 2024.3.1
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: 0.15.1
flox: 0.9.5
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: 24.1.2
pytest: 8.1.1
mypy: None
IPython: 8.22.2
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8882/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
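One way to avoid several dask chunks mapping onto a single zarr chunk (my sketch; target_chunks is a hypothetical name standing for the store's existing chunk sizes):

```python
# Rechunk to the store's chunking before appending, so each dask chunk writes
# to exactly one zarr chunk and no two tasks touch the same chunk.
da_rechunked = da.chunk(target_chunks)
da_rechunked.to_zarr(zarr_store, mode="a", append_dim="time")
```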
| 935607748 | MDU6SXNzdWU5MzU2MDc3NDg= | 5563 | Decoding non-utf-8 encoded strings with the h5netcdf engine | kiksekage 11391714 | closed | 0 | 4 | 2021-07-02T09:49:58Z | 2024-03-26T15:08:41Z | 2024-03-26T15:08:41Z | NONE | What happened:
Trying to load a netCDF file-like ( What you expected to happen:
Loading the same file, albeit persisted to disk, with the  Traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 242, in load_dataset
    with open_dataset(filename_or_obj, **kwargs) as ds:
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/api.py", line 496, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 384, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/store.py", line 22, in open_dataset
    vars, attrs = store.load()
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/common.py", line 126, in load
    attributes = FrozenDict(self.get_attrs())
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 234, in get_attrs
    return FrozenDict(read_attributes(self.ds))
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 75, in read_attributes
    v = maybe_decode_bytes(v)
  File "/home/thns/.venv/cmems/lib/python3.8/site-packages/xarray/backends/h5netcdf_.py", line 63, in maybe_decode_bytes
    return txt.decode("utf-8")
Minimal Complete Verifiable Example:
```python
import xarray as xr
import netCDF4

title = b'\xc3'

f = netCDF4.Dataset('test.nc', 'w')
f.title = title
f.close()

xr.load_dataset("test.nc", engine="h5netcdf")
```
Environment: Output of <tt>xr.show_versions()</tt>
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10) [GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.18.1
pandas: 1.2.4
numpy: 1.20.3
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: 6.2.4
IPython: 7.25.0
sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5563/reactions",
"total_count": 2,
"+1": 2,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2117248281 | I_kwDOAMm_X85-MqUZ | 8704 | Currently no way to create a Coordinates object without indexes for 1D variables | TomNicholas 35968931 | closed | 0 | 4 | 2024-02-04T18:30:18Z | 2024-03-26T13:50:16Z | 2024-03-26T13:50:15Z | MEMBER | What happened?The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on What did you expect to happen?I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.
Minimal Complete Verifiable Example```Python class UnindexableArrayAPI: ... class UnindexableArray: """ Presents like an N-dimensional array but doesn't support changes of any kind, nor can it be coerced into a np.ndarray or pd.Index. """
``` ```python uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32')) xr.Variable(data=uarr, dims=['x']) # works fine xr.Coordinates({'x': ('x', uarr)}, indexes={}) # works in xarray v2023.08.0
NotImplementedError Traceback (most recent call last) Cell In[59], line 1 ----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={}) File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.init(self, coords, indexes) 299 variables = {} 300 for name, data in coords.items(): --> 301 var = as_variable(data, name=name) 302 if var.dims == (name,) and indexes is None: 303 index, index_vars = create_default_index_implicit(var, list(coords)) File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name) 152 raise TypeError( 153 f"Variable {name!r}: unable to convert object into a variable without an " 154 f"explicit list of dimensions: {obj!r}" 155 ) 157 if name is not None and name in obj.dims and obj.ndim == 1: 158 # automatically convert the Variable into an Index --> 159 obj = obj.to_index_variable() 161 return obj File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self) 570 def to_index_variable(self) -> IndexVariable: 571 """Return this variable as an xarray.IndexVariable""" --> 572 return IndexVariable( 573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True 574 ) File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.init(self, dims, data, attrs, encoding, fastpath) 2640 # Unlike in Variable, always eagerly load values into memory 2641 if not isinstance(self._data, PandasIndexingAdapter): -> 2642 self._data = PandasIndexingAdapter(self._data) File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.init(self, array, dtype) 1478 def init(self, array: pd.Index, dtype: DTypeLike = None): 1479 from xarray.core.indexes import safe_cast_to_index -> 1481 self.array = safe_cast_to_index(array) 1483 if dtype is None: 1484 self._dtype = get_valid_numpy_dtype(array) File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 " Cell In[55], line 63, in UnindexableArray.array(self) 62 def array(self) -> np.ndarray: ---> 63 raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects") NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects ``` MVCE confirmation
Relevant log outputNo response Anything else we need to know?Context is #8699 EnvironmentVersions described above |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8704/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 957918751 | MDU6SXNzdWU5NTc5MTg3NTE= | 5664 | Interpolation behaviour inconsistent with numpy? | mathisc 7017525 | open | 0 | 4 | 2021-08-02T08:56:28Z | 2024-03-12T01:15:46Z | NONE | Hey all,
When running  Here is the sample code to reproduce the issue:
```python
import numpy as np
import xarray as xr

def test_crop_times_nan():
    ds = xr.Dataset(
        data_vars={"some_variable": (['x', 'time'], np.array([[np.nan, 0, 1]]))},
        coords={"time": np.array([0, 1, 2])},
    )

    result = ds.interp(time=ds.time)
```
Is that an intended behaviour for xarray?
If so, does this mean that I first have to check if an interpolation is needed instead of doing it no matter what (and use Thanks for your help ;) Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.8.0-7642-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.19.4 scipy: 1.6.0 netCDF4: 1.5.6 pydap: None h5netcdf: 0.8.1 h5py: 3.1.0 Nio: None zarr: None cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.01.0 distributed: 2021.01.0 matplotlib: 3.4.2 cartopy: None seaborn: None numbagg: None pint: None setuptools: 57.4.0 pip: 20.2.4 conda: None pytest: None IPython: 7.19.0 sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5664/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2140090923 | I_kwDOAMm_X85_jzIr | 8759 | Passing datasets with different group hierarchy to open_mfdataset | KareemShalabi 111437410 | closed | 0 | 4 | 2024-02-17T13:31:18Z | 2024-03-03T18:43:09Z | 2024-03-03T10:53:34Z | NONE | Is your feature request related to a problem? When you want to open multiple datasets located at different nodes of the group hierarchy in an HDF file, you can't pass a list of group keys ( save_mfdataset offers a 'groups' keyword; emphasis on the s). On top of that, the 'files' keyword argument does not accept a 'datastore' as a valid input. Describe the solution you'd like No response Describe alternatives you've considered One can, of course, open_dataset each one in a loop and combine afterwards (see the sketch after this row). One possible fix is to modify the 'group' argument to accept a list the same length as the paths list. Another could be changing the "paths" keyword to accept datastore or h5py objects. Both are trivial in my opinion. Most of the code is already there in other functions (open_dataset, save_mfdataset). Additional context No response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8759/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
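A sketch of the loop-and-combine workaround the report itself mentions (the file name and group paths are illustrative):

```python
import xarray as xr

groups = ["/node/a", "/node/b"]
datasets = [xr.open_dataset("data.h5", group=g, engine="h5netcdf") for g in groups]
combined = xr.merge(datasets)
```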
| 2141899767 | I_kwDOAMm_X85_qsv3 | 8769 | Errors started appearing after release v2024.02.0 | navidcy 7112768 | closed | 0 | 4 | 2024-02-19T09:23:16Z | 2024-02-22T04:54:06Z | 2024-02-22T04:54:06Z | NONE | What happened? I started seeing errors in my CI after the latest xarray release. See, e.g., https://github.com/COSIMA/regional-mom6/actions/runs/7957078139/job/21719091616#step:7:226 After I added a version constraint for xarray that excludes the latest release, the error went away. See: https://github.com/COSIMA/regional-mom6/actions/runs/7957192738 What did you expect to happen? No response Minimal Complete Verifiable Example No response MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8769/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2142982259 | I_kwDOAMm_X85_u1Bz | 8771 | Unable to use Xarray to work on RCM Dataset with xsar and safe_rcm by umr-lops | sparshgarg23 34626942 | closed | 0 | 4 | 2024-02-19T18:58:50Z | 2024-02-20T05:29:33Z | 2024-02-20T05:29:33Z | NONE | What happened? UMR-LOPS has introduced XSAR, a library to work with RCM datasets.
When working with the following code,
14 frames /usr/local/lib/python3.10/dist-packages/xsar/utils.py in wrapper(args, kwargs) 93 startrss = process.memory_info().rss 94 starttime = time.time() ---> 95 result = f(args, **kwargs) 96 endtime = time.time() 97 if mem_monitor: /usr/local/lib/python3.10/dist-packages/xsar/rcm_meta.py in init(self, name) 32 self.dt = api.open_rcm(name.split(':')[1]) 33 else: ---> 34 self.dt = api.open_rcm(name) 35 if not name.startswith('RCM_DS:'): 36 name = 'RCM_DS:%s:' % name /usr/local/lib/python3.10/dist-packages/safe_rcm/api.py in open_rcm(url, backend_kwargs, manifest_ignores, **dataset_kwargs) 95 ) 96 ---> 97 tree = read_product(mapper, "metadata/product.xml") 98 99 calibration_root = "metadata/calibration" /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in read_product(mapper, product_path) 272 } 273 --> 274 converted = valmap( 275 lambda x: execute(**x)(decoded), 276 layout, /usr/local/lib/python3.10/dist-packages/toolz/dicttoolz.py in valmap(func, d, factory) 83 """ 84 rv = factory() ---> 85 rv.update(zip(d.keys(), map(func, d.values()))) 86 return rv 87 /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in <lambda>(x) 273 274 converted = valmap( --> 275 lambda x: execute(**x)(decoded), 276 layout, 277 ) /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 302 def call(self, args, kwargs): 303 try: --> 304 return self._partial(*args, kwargs) 305 except TypeError as exc: 306 if self._should_curry(args, kwargs, exc): /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in execute(mapping, f, path) 29 subset = query(path, mapping) 30 ---> 31 return compose_left(f, attach_path(path=path))(subset) 32 33 /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 485 486 def call(self, args, kwargs): --> 487 ret = self.first(*args, kwargs) 488 for f in self.funcs: 489 ret = f(ret) /usr/local/lib/python3.10/dist-packages/toolz/functoolz.py in call(self, args, kwargs) 487 ret = self.first(args, **kwargs) 488 for f in self.funcs: --> 489 ret = f(ret) 490 return ret 491 /usr/local/lib/python3.10/dist-packages/safe_rcm/product/reader.py in <lambda>(obj) 126 ), 127 lambda obj: obj.set_index({"stacked": ["pole", "pulse"]}), --> 128 lambda obj: obj.unstack("stacked"), 129 ), 130 }, /usr/local/lib/python3.10/dist-packages/xarray/util/deprecation_helpers.py in inner(args, kwargs) 113 return func(args[:-n_extra_args], kwargs) 114 --> 115 return func(*args, kwargs) 116 117 return inner /usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse) 5576 ) 5577 else: -> 5578 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) 5579 return result 5580 /usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py in _unstack_once(self, dim, index_and_vars, fill_value, sparse) 5395 indexes = {k: v for k, v in self._indexes.items() if k != dim} 5396 -> 5397 new_indexes, clean_index = index.unstack() 5398 indexes.update(new_indexes) 5399 /usr/local/lib/python3.10/dist-packages/xarray/core/indexes.py in unstack(self)
1019
1020 if not clean_index.is_unique:
-> 1021 raise ValueError(
1022 "Cannot unstack MultiIndex containing duplicates. Make sure entries "
1023 f"are unique, e.g., by calling ValueError: Cannot unstack MultiIndex containing duplicates. Make sure entries are unique, e.g., by calling What did you expect to happen?the error shouldn't be there,and I should be able to view the dataframe. as shown in below link https://cyclobs.ifremer.fr/static/sarwing_datarmor/xsar/examples/rcm.html Minimal Complete Verifiable Example
MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
commit: None
python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.1.58+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.25.2
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: 1.3.0
h5py: 3.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.4.4
mypy: None
IPython: 7.34.0
sphinx: 5.0.2
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8771/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1912094632 | I_kwDOAMm_X85x-D-o | 8231 | xr.concat concatenates along dimensions that it wasn't asked to | TomNicholas 35968931 | open | 0 | 4 | 2023-09-25T18:50:29Z | 2024-02-14T20:30:26Z | MEMBER | What happened? Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).
```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```
Calling  What did you expect to happen? I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e.  What I expected to happen was that (as by default
```python
import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```
Minimal Complete Verifiable Example: No response. MVCE confirmation  Relevant log output: No response. Anything else we need to know? I was trying to create an example for which you would need the automatic combined concat/merge that happens within  Environment xarray |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8231/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1390228572 | I_kwDOAMm_X85S3TRc | 7104 | Duplicate values on unstack | znichollscr 114576287 | closed | 0 | 4 | 2022-09-29T04:16:26Z | 2024-02-13T09:48:37Z | 2024-02-13T09:48:37Z | NONE | What happened? I unstacked a dataset and got values I didn't expect. It turns out that, when unstacking, my dataset had multiple values for the same index. This is clearly a case of user error, but it silently passed. What did you expect to happen? A warning or error would be raised to say, "this isn't going to work". Minimal Complete Verifiable Example
```Python
import datetime as dt

import xarray as xr

ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("lat", "time"),
    coords={"lat": [-60, 60], "time": [dt.datetime(2010, 1, d) for d in range(1, 4)]},
    name="test",
).to_dataset()

ds = (
    ds.assign_coords(
        {
            "month": ds["time"].dt.month,
            "year": ds["time"].dt.year,
        }
    )
    .set_index(time=["month", "year"])
)

ds = ds.unstack("time")

# the output only has 2 values, which isn't what I expected
ds["test"].data
```
MVCE confirmation
Relevant log output: No response. Anything else we need to know? It's not clear to me where the error is. It might just be that this particular order of operations leads to a case that isn't otherwise caught. Looking at intermediate output, I thought the error was in unstack but maybe it's more complex than that... Environment
INSTALLED VERSIONS
------------------
commit: e678a1d7884a3c24dba22d41b2eef5d7fe5258e7
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7104/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2126375172 | I_kwDOAMm_X85-vekE | 8726 | PRs requiring approval & merging main? | max-sixty 5635139 | closed | 0 | 4 | 2024-02-09T02:35:58Z | 2024-02-09T18:23:52Z | 2024-02-09T18:21:59Z | MEMBER | What is your issue?Sorry I haven't been on the calls at all recently (unfortunately the schedule is difficult for me). Maybe this was discussed there? PRs now seem to require a separate approval prior to merging. Is there an upside to this? Is there any difference between those who can approve and those who can merge? Otherwise it just seems like more clicking. PRs also now seem to require merging the latest main prior to merging? I get there's some theoretical value to this, because changes can semantically conflict with each other. But it's extremely rare that this actually happens (can we point to cases?), and it limits the immediacy & throughput of PRs. If the bad outcome does ever happen, we find out quickly when main tests fail and can revert. (fwiw I wrote a few principles around this down a while ago here; those are much stronger than what I'm suggesting in this issue though) |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8726/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2115049090 | I_kwDOAMm_X85-ERaC | 8694 | Error while saving an altered dataset to NetCDF when loaded from a file | tarik 12544636 | open | 0 | 4 | 2024-02-02T14:18:03Z | 2024-02-07T13:38:40Z | NONE | What happened? When attempting to save an altered Xarray dataset to a NetCDF file using the  What did you expect to happen? The altered Xarray dataset is saved as a NetCDF file using the  Minimal Complete Verifiable Example
```Python
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        win_1=("attempt", [True, False, True, False, False, True]),
        win_2=("attempt", [False, True, False, True, False, False]),
    ),
    coords=dict(
        attempt=[1, 2, 3, 4, 5, 6],
        player_1=("attempt", ["paper", "paper", "scissors", "scissors", "paper", "paper"]),
        player_2=("attempt", ["rock", "scissors", "paper", "rock", "paper", "rock"]),
    )
)

ds.to_netcdf("dataset.nc")
ds_from_file = xr.load_dataset("dataset.nc")

ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.to_netcdf("dataset_altered.nc")
```
MVCE confirmation
Relevant log output
Anything else we need to know?Findings: The issue is related to the encoding information of the dataset becoming invalid after filtering data with the In the provided examples, the maximum length of strings stored in "player_1" and "player_2" is originally set to 8 characters. However, after filtering with the Workaround: A workaround to resolve this issue is to call the Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.14 (main, Aug 24 2023, 14:01:46)
[GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.3.1-060301-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8694/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
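The workaround sentence in the row above is cut off mid-method-name; clearing the stale on-disk encoding before the second write is one concrete form it could take (my assumption, not confirmed from the thread):

```python
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
# drop_encoding() forgets the original file's string-length encoding; using it
# here is my assumption for the truncated workaround.
ds_altered = ds_altered.drop_encoding()
ds_altered.to_netcdf("dataset_altered.nc")
```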
| 782440858 | MDU6SXNzdWU3ODI0NDA4NTg= | 4784 | Opening a tiff with scale_factor/add_offset attrs then saving as zarr and opening causes a UFuncTypeError | ohiat 53100696 | closed | 0 | 4 | 2021-01-08T22:45:21Z | 2024-02-06T10:40:15Z | 2024-02-06T10:40:14Z | NONE | What happened:
When opening a geotiff that has `scale_factor`/`add_offset` attrs, then saving it as zarr and re-opening, the following error is raised:
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('<U32') to dtype('float32') with casting rule 'same_kind'
Minimal Complete Verifiable Example:
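A minimal sketch that plausibly reproduces the same failure mode (my reconstruction, assuming the `scale_factor` tag survives as a *string* attribute, which is what rasterio's tag API returns; filenames invented):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"band": ("x", np.zeros(4, dtype="uint8"))})
# tiff tags are exposed as strings, so the scale factor round-trips as text
ds["band"].attrs["scale_factor"] = "0.0001"

ds.to_zarr("scaled.zarr", mode="w")
# CF decoding multiplies the data by the *string* scale factor on read
xr.open_zarr("scaled.zarr")["band"].compute()  # UFuncTypeError
```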
Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-1034-azure machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.0 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: 2.6.1 cftime: 1.3.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.8 cfgrib: None iris: None bottleneck: None dask: 2020.12.0 distributed: 2020.12.0 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.3.3 conda: None pytest: 6.2.1 IPython: 7.19.0 sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/4784/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2112742578 | I_kwDOAMm_X8597eSy | 8693 | reading netcdf with engine=scipy fails with a typeerror under certain conditions | eivindjahren 32731672 | open | 0 | 4 | 2024-02-01T15:03:23Z | 2024-02-05T09:35:51Z | CONTRIBUTOR | What happened?Saving and loading from netcdf with engine=scipy produces an unexpected valueerror on read. The file seems to be corrupted. What did you expect to happen?reading works just fine. Minimal Complete Verifiable Example```Python import numpy as np import xarray as xr ds = xr.Dataset( { "values": ( ["name", "time"], np.array([[]], dtype=np.float32).T, ) }, coords={"time": [1], "name": []}, ).expand_dims({"index": [0]}) ds.to_netcdf("file.nc", engine="scipy") _ = xr.open_dataset("file.nc", engine="scipy") ``` MVCE confirmation
Relevant log output```Python KeyError Traceback (most recent call last) File .../python3.11/site-packages/xarray/backends/file_manag er.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 210 try: --> 211 file = self._cache[self._key] 212 except KeyError: File .../python3.11/site-packages/xarray/backends/lru_cache. py:56, in LRUCache.getitem(self, key) 55 with self._lock: ---> 56 value = self._cache[key] 57 self._cache.move_to_end(key) KeyError: [<function _open_scipy_netcdf at 0x7fe96afa9120>, ('/home/eivind/Projects/ert/file.nc',), 'r', (('mmap', None), ('version', 2)), '264ec6b3-78b3-4766-bb41-7656d6a51962'] During handling of the above exception, another exception occurred: TypeError Traceback (most recent call last) Cell In[1], line 18 4 ds = ( 5 xr.Dataset( 6 { (...) 15 .expand_dims({"index": [0]}) 16 ) 17 ds.to_netcdf("file.nc", engine="scipy") ---> 18 _ = xr.open_dataset("file.nc", engine="scipy") File .../python3.11/site-packages/xarray/backends/api.py:572 , in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, d ecode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked _array_type, from_array_kwargs, backend_kwargs, kwargs) 560 decoders = _resolve_decoders_kwargs( 561 decode_cf, 562 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 568 decode_coords=decode_coords, 569 ) 571 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 572 backend_ds = backend.open_dataset( 573 filename_or_obj, 574 drop_variables=drop_variables, 575 decoders, 576 kwargs, 577 ) 578 ds = _dataset_from_backend_dataset( 579 backend_ds, 580 filename_or_obj, (...) 590 kwargs, 591 ) 592 return ds File .../python3.11/site-packages/xarray/backends/scipy_.py: 315, in ScipyBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, con cat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, mode, format, group, mm ap, lock) 313 store_entrypoint = StoreBackendEntrypoint() 314 with close_on_error(store): --> 315 ds = store_entrypoint.open_dataset( 316 store, 317 mask_and_scale=mask_and_scale, 318 decode_times=decode_times, 319 concat_characters=concat_characters, 320 decode_coords=decode_coords, 321 drop_variables=drop_variables, 322 use_cftime=use_cftime, 323 decode_timedelta=decode_timedelta, 324 ) 325 return ds File .../python3.11/site-packages/xarray/backends/store.py:4 3, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, conca t_characters, decode_coords, drop_variables, use_cftime, decode_timedelta) 29 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 30 self, 31 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 39 decode_timedelta=None, 40 ) -> Dataset: 41 assert isinstance(filename_or_obj, AbstractDataStore) ---> 43 vars, attrs = filename_or_obj.load() 44 encoding = filename_or_obj.get_encoding() 46 vars, attrs, coord_names = conventions.decode_cf_variables( 47 vars, 48 attrs, (...) 55 decode_timedelta=decode_timedelta, 56 ) File .../python3.11/site-packages/xarray/backends/common.py: 210, in AbstractDataStore.load(self) 188 def load(self): 189 """ 190 This loads the variables and attributes simultaneously. 191 A centralized loading function makes it easier to create (...) 207 are requested, so care should be taken to make sure its fast. 
208 """ 209 variables = FrozenDict( --> 210 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 211 ) 212 attributes = FrozenDict(self.get_attrs()) 213 return variables, attributes File .../python3.11/site-packages/xarray/backends/scipy_.py: 181, in ScipyDataStore.get_variables(self) 179 def get_variables(self): 180 return FrozenDict( --> 181 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 182 ) File .../python3.11/site-packages/xarray/backends/scipy_.py: 170, in ScipyDataStore.ds(self) 168 @property 169 def ds(self): --> 170 return self._manager.acquire() File .../python3.11/site-packages/xarray/backends/file_manag
er.py:193, in CachingFileManager.acquire(self, needs_lock)
178 def acquire(self, needs_lock=True):
179 """Acquire a file object from the manager.
180
181 A new file is only opened if it has expired from the
(...)
191 An open file object, as returned by File .../python3.11/site-packages/xarray/backends/file_manag er.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(self._args, *kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a" File .../python3.11/site-packages/xarray/backends/scipy_.py: 109, in _open_scipy_netcdf(filename, mode, mmap, version) 106 filename = io.BytesIO(filename) 108 try: --> 109 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 110 except TypeError as e: # netcdf3 message is obscure in this case 111 errmsg = e.args[0] File .../python3.11/site-packages/scipy/io/_netcdf.py:278, i n netcdf_file.init(self, filename, mode, mmap, version, maskandscale) 275 self._attributes = {} 277 if mode in 'ra': --> 278 self._read() File .../python3.11/site-packages/scipy/io/_netcdf.py:607, i n netcdf_file._read(self) 605 self._read_dim_array() 606 self._read_gatt_array() --> 607 self._read_var_array() File .../python3.11/site-packages/scipy/io/netcdf.py:688, i n netcdf_file._read_var_array(self) 685 data = None 686 else: # not a record variable 687 # Calculate size to avoid problems with vsize (above) --> 688 a_size = reduce(mul, shape, 1) * size 689 if self.use_mmap: 690 data = self._mm_buf[begin:begin_+a_size].view(dtype=dtype_) TypeError: unsupported operand type(s) for *: 'int' and 'NoneType' ``` Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: 1.8.0
IPython: 8.17.2
sphinx: 7.2.6
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8693/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2111051033 | I_kwDOAMm_X8591BUZ | 8691 | xarray.open_dataset with chunks={} returns a single chunk and not engine (h5netcdf) preferred chunks | abarciauskas-bgse 15016780 | closed | 0 | 4 | 2024-01-31T22:04:02Z | 2024-01-31T22:56:17Z | 2024-01-31T22:56:17Z | NONE | What happened?When opening MUR SST netcdfs from S3, xarray.open_dataset(file, engine="h5netcdf", chunks={}) returns a single chunk (whereas the h5netcdf library returns a chunk shape of (1, 1023, 2047). A notebook version of the code below includes the output: https://gist.github.com/abarciauskas-bgse/9366e04d2af09b79c9de466f6c1d3b90 What did you expect to happen?I thought the chunks={} option would return the same chunks (1, 1023, 2047) exposed by the h5netcdf engine. Minimal Complete Verifiable Example```Python !/usr/bin/env pythoncoding: utf-8This notebook looks at how xarray and h5netcdf return different chunks.import pandas as pd import h5netcdf import s3fs import xarray as xr dates = [ d.to_pydatetime().strftime('%Y%m%d') for d in pd.date_range('2023-02-01', '2023-03-01', freq='D') ] SHORT_NAME = 'MUR-JPL-L4-GLOB-v4.1' s3_fs = s3fs.S3FileSystem(anon=False) var = 'analysed_sst' def make_filename(time): base_url = f's3://podaac-ops-cumulus-protected/{SHORT_NAME}/' # example file: "/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc" return f'{base_url}{time}090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc' s3_urls = [make_filename(d) for d in dates] def print_chunk_shape(s3_url): try: # Open the dataset using xarray file = s3_fs.open(s3_url) dataset = xr.open_dataset(file, engine='h5netcdf', chunks={})
[print_chunk_shape(s3_url) for s3_url in s3_urls] ``` MVCE confirmation
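For comparison, the on-disk chunking reported by the backend can be read off a variable's encoding; a sketch (assuming a local copy at `example.nc`; `preferred_chunks` and `chunksizes` are the keys the netCDF backends populate):

```python
import xarray as xr

ds = xr.open_dataset("example.nc", engine="h5netcdf", chunks={})
var = ds["analysed_sst"]

print(var.chunks)                            # dask chunks actually used
print(var.encoding.get("preferred_chunks"))  # chunking the backend reports
print(var.encoding.get("chunksizes"))        # raw HDF5 chunk shape
```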
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.198-187.748.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.6.1
distributed: 2023.6.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.0.0
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8691/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2104267494 | I_kwDOAMm_X859bJLm | 8677 | Add rolling.rank() same as pandas | Mirac-Le 39230130 | open | 0 | 4 | 2024-01-28T17:27:21Z | 2024-01-29T19:50:20Z | NONE | Is your feature request related to a problem?Dear xarray maintainers, I would like to express my heartfelt gratitude for the significant optimizations your xarray library has brought to my project. Xarray combines the speed of numpy with the highly customizable parameters of pandas. The extensive parameters of the rolling methods are a good example of that. I am wondering if it would be possible to incorporate a ranking method for rolling windows, including the ability to specify the same parameters as pandas' `rolling(...).rank()`. Once again, thank you for your contributions!
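In the meantime, a workaround sketch (my suggestion, not from the report) using `rolling(...).construct()` together with `DataArray.rank`, which needs the optional `bottleneck` dependency:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(10), dims="time")

# rank of each element within its trailing window of length 3
windowed = da.rolling(time=3).construct("window")
rank = windowed.rank("window").isel(window=-1)

# pandas-style pct=True, ignoring the NaN padding of the first windows
pct = rank / windowed.notnull().sum("window")
```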
Describe the solution you'd likeNo response Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8677/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1716228662 | I_kwDOAMm_X85mS5I2 | 7848 | Compatibility with the Array API standard | TomNicholas 35968931 | open | 0 | 4 | 2023-05-18T20:34:43Z | 2024-01-25T04:03:42Z | MEMBER | What is your issue?Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other. We've already had - #6804 - #7067 - #7847 and there will likely be many others. I suspect this might require changes to the standard as well as to xarray - in particular see this list of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ):
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7848/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2079089277 | I_kwDOAMm_X8577GJ9 | 8607 | allow computing just a small number of variables | keewis 14808389 | open | 0 | 4 | 2024-01-12T15:21:27Z | 2024-01-12T20:20:29Z | MEMBER | Is your feature request related to a problem?I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that. Describe the solution you'd likeI'd imagine something like
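A hypothetical sketch of that (the `variables` keyword and the coordinate names are invented for illustration):

```python
# compute only the named variables, keep everything else lazy
ds = ds.compute(variables=["lon", "lat"])
```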
Describe alternatives you've consideredSo far I've been using something like
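Presumably along these lines (a reconstruction; coordinate names invented):

```python
# compute a handful of coordinate variables eagerly, assign them back
ds = ds.assign_coords({name: ds[name].compute() for name in ["lon", "lat"]})
```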
Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8607/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2073024461 | I_kwDOAMm_X857j9fN | 8602 | `DataArray.mean()` and `Dataset.mean()` fail with `sparse==0.15.0` | martinkim0 46072231 | closed | 0 | 4 | 2024-01-09T19:27:47Z | 2024-01-10T14:44:57Z | 2024-01-10T14:44:57Z | NONE | What happened?The following script leads to an error: ``` import numpy as np import xarray as xr from sparse import GCXS x = np.random.negative_binomial(1, 0.5, size=(100, 100)) array = xr.DataArray(GCXS.from_numpy(x)) array.mean() ``` ```AttributeError Traceback (most recent call last) Cell In[16], line 1 ----> 1 array.mean() File ~/.../python3.11/site-packages/xarray/core/_aggregations.py:1663, in DataArrayAggregations.mean(self, dim, skipna, keep_attrs, kwargs)
1588 def mean(
1589 self,
1590 dim: Dims = None,
(...)
1594 kwargs: Any,
1595 ) -> Self:
1596 """
1597 Reduce this DataArray's data by applying File ~/.../python3.11/site-packages/xarray/core/dataarray.py:3776, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, kwargs)
3732 def reduce(
3733 self,
3734 func: Callable[..., Any],
(...)
3740 kwargs: Any,
3741 ) -> Self:
3742 """Reduce this array by applying File ~/.../python3.11/site-packages/xarray/core/variable.py:1756, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, kwargs) 1749 keep_attrs_ = ( 1750 _get_keep_attrs(default=False) if keep_attrs is None else keep_attrs 1751 ) 1753 # Noe that the call order for Variable.mean is 1754 # Variable.mean -> NamedArray.mean -> Variable.reduce 1755 # -> NamedArray.reduce -> 1756 result = super().reduce( 1757 func=func, dim=dim, axis=axis, keepdims=keepdims, kwargs 1758 ) 1760 # return Variable always to support IndexVariable 1761 return Variable( 1762 result.dims, result.data, attrs=result._attrs if keep_attrs else None 1763 ) File ~/.../python3.11/site-packages/xarray/namedarray/core.py:772, in NamedArray.reduce(self, func, dim, axis, keepdims, kwargs) 770 data = func(self.data, axis=axis, kwargs) 771 else: --> 772 data = func(self.data, **kwargs) 774 if getattr(data, "shape", ()) == self.shape: 775 dims = self.dims File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:637, in mean(array, axis, skipna, kwargs) 635 return _to_pytimedelta(mean_timedeltas, unit="us") + offset 636 else: --> 637 return _mean(array, axis=axis, skipna=skipna, kwargs) File ~/.../python3.11/site-packages/xarray/core/duck_array_ops.py:399, in _create_nan_agg_method.<locals>.f(values, axis, skipna, **kwargs) 396 kwargs.pop("min_count", None) 398 xp = get_array_namespace(values) --> 399 func = getattr(xp, name) 401 try: 402 with warnings.catch_warnings(): AttributeError: module 'sparse' has no attribute 'mean' ``` What did you expect to happen?Reproducible script runs without error with Minimal Complete Verifiable ExampleNo response MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: None
xarray: 2023.12.0
pandas: 1.5.3
numpy: 1.24.4
scipy: 1.11.4
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.12.0
distributed: 2023.12.0
matplotlib: 3.8.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.12.0
cupy: None
pint: None
sparse: 0.15.0
flox: None
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.18.1
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8602/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2041076267 | I_kwDOAMm_X855qFor | 8551 | Make _obj_repr public | BENR0 12115839 | closed | 0 | 4 | 2023-12-14T07:19:16Z | 2023-12-21T16:00:52Z | 2023-12-21T16:00:52Z | NONE | What is your issue?We are using https://github.com/pydata/xarray/blob/2971994ef1dd67f44fe59e846c62b47e1e5b240b/xarray/core/formatting_html.py#L278 in the html representation of |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8551/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 2027147099 | I_kwDOAMm_X854089b | 8523 | tree-reduce the combine for `open_mfdataset(..., parallel=True, combine="nested")` | dcherian 2448579 | open | 0 | 4 | 2023-12-05T21:24:51Z | 2023-12-18T19:32:39Z | MEMBER | Is your feature request related to a problem?When using `open_mfdataset(..., parallel=True, combine="nested")`, the combine runs as one flat reduction over all datasets, which scales poorly with the number of files. Instead we can tree-reduce the combine (example) by switching to something like `dask.bag`, sketched below.
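A rough sketch of the idea (my reconstruction, under assumptions: `paths` is the list of files, `open_one` opens a single file, and concatenation is along `time`):

```python
import dask.bag as db
import xarray as xr

datasets = db.from_sequence(paths).map(open_one)
combined = datasets.fold(
    lambda a, b: xr.concat([a, b], dim="time"),
    split_every=4,  # pairwise/tree reduction instead of one flat combine
)
result = combined.compute()
```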
cc @TomNicholas |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8523/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1223031600 | I_kwDOAMm_X85I5fsw | 6561 | Excessive memory consumption by to_dataframe() | sgdecker 8419421 | closed | 0 | 4 | 2022-05-02T15:33:33Z | 2023-12-15T20:47:32Z | 2023-12-15T20:47:32Z | NONE | What happened?This is a reincarnation of #2534 with a reproducible example. A 51 MB netCDF file leads to to_dataframe() requesting 23 GB. What did you expect to happen?I expect to_dataframe() to require much less than 23 GB of memory for this operation. Minimal Complete Verifiable Example

```Python
import urllib.request
import xarray as xr

url = 'http://people.envsci.rutgers.edu/decker/Surface_METAR_20220501_0000.nc'
fname = 'metar.nc'
urllib.request.urlretrieve(url, filename=fname)
ncdata = xr.open_dataset(fname)
df = ncdata.to_dataframe()
```

MVCE confirmation
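`Dataset.to_dataframe()` is documented to index the frame by the cartesian product of the dataset's indices, which is where the blow-up comes from; one possible mitigation (my suggestion, not from the report) is to convert per variable so each frame only spans the dimensions that variable actually uses:

```Python
# one DataFrame per variable instead of one frame over the product of all dims
frames = {name: ncdata[name].to_dataframe() for name in ncdata.data_vars}
```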
Relevant log output
Anything else we need to know?No response Environment
/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.62.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.3
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: None
IPython: None
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6561/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 384002323 | MDU6SXNzdWUzODQwMDIzMjM= | 2570 | np.clip() executes eagerly | Hoeze 1200058 | closed | 0 | 4 | 2018-11-24T16:25:03Z | 2023-12-03T05:29:17Z | 2023-12-03T05:29:17Z | NONE | Example:
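A plausible reconstruction of the missing example (a dask-backed array, so eager vs. lazy evaluation is observable):

```python
import dask.array
import numpy as np
import xarray as xr

da = xr.DataArray(dask.array.zeros(1000, chunks=100))

clipped_np = np.clip(da, 0, 1)  # reported to compute eagerly (2018-era versions)
clipped_xr = da.clip(0, 1)      # stays a lazy, dask-backed DataArray
```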
Problem descriptionUsing np.clip() directly calculates the result, while xr.DataArray.clip() does not. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2570/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1902108672 | I_kwDOAMm_X85xX-AA | 8207 | Getting `NETCDF: HDF error` while writing a NetCDF file opened using `open_mfdataset` | kasra-keshavarz 50383939 | open | 0 | 4 | 2023-09-19T02:44:02Z | 2023-12-01T22:29:49Z | NONE | What is your issue?I am simply reading 366 small (~15MBs) NetCDF files to create one big NetCDF file at the end. Below is the relevant workflow: ```python-console In [1]: import os; import dask In [2]: import xarray as xr In [3]: from dask.distributed import Client, LocalCluster In [4]: cluster = LocalCluster(n_workers=4, threads_per_worker=1) # 1 core to each worker In [5]: client = Client(cluster) In [6]: os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE' In [7]: ds = xr.open_mfdataset('./remapped/*.nc', chunks={'COMID': 1400}, parallel=True) In [8]: ds.to_netcdf('./out2.nc') ``` And below, is the error I am getting: Error message```python-console In [8]: ds.to_netcdf('./out2.nc') /home/kasra545/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3149: UserWarning: Sending large graph of size 9.97 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures. warnings.warn( 2023-09-18 22:26:14,279 - distributed.worker - WARNING - Compute Failed Key: ('open_dataset-concatenate-concatenate-be7dd534c459e2f316d9149df2d9ec95', 178, 0) Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyIndexedArray(array=_ElementwiseFunctionArray(LazilyIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x2b863b0e94c0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x2b86218d4ee0>, encoded_fill_values={-9999.0}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 24, None), slice(0, 1400, None))) kwargs: {} Exception: "RuntimeError('NetCDF: HDF error')" --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[8], line 1 ----> 1 ds.to_netcdf('./out2.nc') File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/dataset.py:2252, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2249 encoding = {} 2250 from xarray.backends.api import to_netcdf -> 2252 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2253 self, 2254 path, 2255 mode=mode, 2256 format=format, 2257 group=group, 2258 engine=engine, 2259 encoding=encoding, 2260 unlimited_dims=unlimited_dims, 2261 compute=compute, 2262 multifile=False, 2263 invalid_netcdf=invalid_netcdf, 2264 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/api.py:1255, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1252 if multifile: 1253 return writer, store -> 1255 writes = writer.sync(compute=compute) 1257 if isinstance(target, BytesIO): 1258 store.sync() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/common.py:256, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs) 253 if chunkmanager_store_kwargs is None: 254 chunkmanager_store_kwargs = {} --> 256 delayed_store = chunkmanager.store( 257 self.sources, 258 self.targets, 259 lock=self.lock, 260 compute=compute, 261 flush=True, 262 regions=self.regions, 263 **chunkmanager_store_kwargs, 264 ) 265 self.sources = [] 266 
self.targets = [] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/daskmanager.py:211, in DaskManager.store(self, sources, targets, **kwargs) 203 def store( 204 self, 205 sources: DaskArray | Sequence[DaskArray], 206 targets: Any, 207 **kwargs, 208 ): 209 from dask.array import store --> 211 return store( 212 sources=sources, 213 targets=targets, 214 **kwargs, 215 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/array/core.py:1236, in store(***failed resolving arguments***) 1234 elif compute: 1235 store_dsk = HighLevelGraph(layers, dependencies) -> 1236 compute_as_if_collection(Array, store_dsk, map_keys, **kwargs) 1237 return None 1239 else: File ~/virtual-envs/meshflow/lib/python3.10/site-packages/dask/base.py:369, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs) 367 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get) 368 dsk2 = optimization_function(cls)(dsk, keys, **kwargs) --> 369 return schedule(dsk2, keys, **kwargs) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:3267, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs) 3265 should_rejoin = False 3266 try: -> 3267 results = self.gather(packed, asynchronous=asynchronous, direct=direct) 3268 finally: 3269 for f in futures.values(): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/distributed/client.py:2393, in Client.gather(self, futures, errors, direct, asynchronous) 2390 local_worker = None 2392 with shorten_traceback(): -> 2393 return self.sync( 2394 self._gather, 2395 futures, 2396 errors=errors, 2397 direct=direct, 2398 local_worker=local_worker, 2399 asynchronous=asynchronous, 2400 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:484, in __array__() 483 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray: --> 484 return np.asarray(self.get_duck_array(), dtype=dtype) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:487, in get_duck_array() 486 def get_duck_array(self): --> 487 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:664, in get_duck_array() 663 def get_duck_array(self): --> 664 return self.array.get_duck_array() File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:557, in get_duck_array() 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): --> 557 array = array.get_duck_array() 558 return _wrap_numpy_scalars(array) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/coding/variables.py:74, in get_duck_array() 73 def get_duck_array(self): ---> 74 return self.func(self.array.get_duck_array()) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:551, in get_duck_array() 550 def get_duck_array(self): --> 551 array = self.array[self.key] 552 # self.array[self.key] is now a numpy array when 553 # self.array is a BackendArray subclass 554 # and self.key is BasicIndexer((slice(None, None, None),)) 555 # so we need the explicit check for ExplicitlyIndexed 556 if isinstance(array, ExplicitlyIndexed): File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:100, in 
__getitem__() 99 def __getitem__(self, key): --> 100 return indexing.explicit_indexing_adapter( 101 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 102 ) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/core/indexing.py:858, in explicit_indexing_adapter() 836 """Support explicit indexing by delegating to a raw indexing method. 837 838 Outer and/or vectorized indexers are supported by indexing a second time (...) 855 Indexing result, in the form of a duck numpy-array. 856 """ 857 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 858 result = raw_indexing_method(raw_key.tuple) 859 if numpy_indices.tuple: 860 # index the loaded np.ndarray 861 result = NumpyIndexingAdapter(result)[numpy_indices] File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:112, in _getitem() 110 try: 111 with self.datastore.lock: --> 112 original_array = self.get_array(needs_lock=False) 113 array = getitem(original_array, key) 114 except IndexError: 115 # Catch IndexError in netCDF4 and return a more informative 116 # error message. This is most often called when an unsorted 117 # indexer is used before the data is loaded from disk. File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:91, in get_array() 90 def get_array(self, needs_lock=True): ---> 91 ds = self.datastore._acquire(needs_lock) 92 variable = ds.variables[self.variable_name] 93 variable.set_auto_maskandscale(False) File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:403, in _acquire() 402 def _acquire(self, needs_lock=True): --> 403 with self._manager.acquire_context(needs_lock) as root: 404 ds = _nc4_require_group(root, self._group, self._mode) 405 return ds File /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/contextlib.py:135, in __enter__() 133 del self.args, self.kwds, self.func 134 try: --> 135 return next(self.gen) 136 except StopIteration: 137 raise RuntimeError("generator didn't yield") from None File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context() 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """Context manager for acquiring a file.""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file File ~/virtual-envs/meshflow/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info() 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(*self._args, **kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a" File src/netCDF4/_netCDF4.pyx:2487, in netCDF4._netCDF4.Dataset.__init__() File src/netCDF4/_netCDF4.pyx:1928, in netCDF4._netCDF4._get_vars() File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error ```The header of individual NetCDF ones are also in the following: Individual NetCDF header```console $ ncdump -h ab_models_remapped_1980-04-20-13-00-00.nc netcdf ab_models_remapped_1980-04-20-13-00-00 { dimensions: COMID = 14980 ; time = UNLIMITED ; // (24 currently) variables: int time(time) ; time:long_name = "time" ; time:units = "hours since 1980-04-20 12:00:00" ; time:calendar = "gregorian" ; time:standard_name = "time" ; time:axis = "T" ; double latitude(COMID) ; latitude:long_name = "latitude" ; latitude:units = "degrees_north" ; 
latitude:standard_name = "latitude" ; double longitude(COMID) ; longitude:long_name = "longitude" ; longitude:units = "degrees_east" ; longitude:standard_name = "longitude" ; double COMID(COMID) ; COMID:long_name = "shape ID" ; COMID:units = "1" ; double RDRS_v2.1_P_P0_SFC(time, COMID) ; RDRS_v2.1_P_P0_SFC:_FillValue = -9999. ; RDRS_v2.1_P_P0_SFC:long_name = "Forecast: Surface pressure" ; RDRS_v2.1_P_P0_SFC:units = "mb" ; double RDRS_v2.1_P_HU_1.5m(time, COMID) ; RDRS_v2.1_P_HU_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_HU_1.5m:long_name = "Forecast: Specific humidity" ; RDRS_v2.1_P_HU_1.5m:units = "kg kg**-1" ; double RDRS_v2.1_P_TT_1.5m(time, COMID) ; RDRS_v2.1_P_TT_1.5m:_FillValue = -9999. ; RDRS_v2.1_P_TT_1.5m:long_name = "Forecast: Air temperature" ; RDRS_v2.1_P_TT_1.5m:units = "deg_C" ; double RDRS_v2.1_P_UVC_10m(time, COMID) ; RDRS_v2.1_P_UVC_10m:_FillValue = -9999. ; RDRS_v2.1_P_UVC_10m:long_name = "Forecast: Wind Modulus (derived using UU and VV)" ; RDRS_v2.1_P_UVC_10m:units = "kts" ; double RDRS_v2.1_A_PR0_SFC(time, COMID) ; RDRS_v2.1_A_PR0_SFC:_FillValue = -9999. ; RDRS_v2.1_A_PR0_SFC:long_name = "Analysis: Quantity of precipitation" ; RDRS_v2.1_A_PR0_SFC:units = "m" ; double RDRS_v2.1_P_FB_SFC(time, COMID) ; RDRS_v2.1_P_FB_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FB_SFC:long_name = "Forecast: Downward solar flux" ; RDRS_v2.1_P_FB_SFC:units = "W m**-2" ; double RDRS_v2.1_P_FI_SFC(time, COMID) ; RDRS_v2.1_P_FI_SFC:_FillValue = -9999. ; RDRS_v2.1_P_FI_SFC:long_name = "Forecast: Surface incoming infrared flux" ; RDRS_v2.1_P_FI_SFC:units = "W m**-2" ; ```I am running Currently Loaded Modules: 1) CCconfig 6) ucx/1.8.0 11) netcdf-mpi/4.9.0 (io) 16) freexl/1.0.5 (t) 21) scipy-stack/2023a (math) 2) gentoo/2020 (S) 7) libfabric/1.10.1 12) hdf5-mpi/1.12.1 (io) 17) geos/3.10.2 (geo) 22) libspatialindex/1.8.5 (phys) 3) gcccore/.9.3.0 (H) 8) openmpi/4.0.3 (m) 13) libffi/3.3 18) librttopo-proj9/1.1.0 23) ipykernel/2023a 4) imkl/2020.1.217 (math) 9) StdEnv/2020 (S) 14) python/3.10.2 (t) 19) proj/9.0.1 (geo) 24) sqlite/3.38.5 5) intel/2020.1.217 (t) 10) mii/1.1.2 15) mpi4py/3.1.3 (t) 20) libspatialite-proj901/5.0.1 ``` Any suggestion is greatly appreciated! |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8207/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 2019789753 | I_kwDOAMm_X854Y4u5 | 8499 | 'drop_duplicates' behaves differently when using 1 vs many coordinates for an index | jbweston 6654709 | open | 0 | 4 | 2023-12-01T00:36:42Z | 2023-12-01T09:55:39Z | NONE | What happened?I am trying to To accomplish this, I call 'DataArray.set_xindex' with the appropriate coordinate names, and then call 'drop_duplicates' on the resulting DataArray, like so: ```python from xarray import DataArray import numpy as np test_array = DataArray( np.random.rand(5), coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)), dims="sample", ) output DataArray's 'sample' dimension has length 2, as expectedgood = test_array.set_xindex(["x", "y"]).drop_duplicates("sample") assert len(good) == 2 ``` The above functions as expected; 'good' has had its duplicates dropped, and we are left with a DataArray of length 2. However, the following does not function as I would expect: ```python All the 'y's are '-1', so we expect the same duplicates as before to be dropped,even if we don't include the 'y' values in the index.bad = test_array.set_xindex("x").drop_duplicates("sample") But this assert fails! 'drop_duplicates' does not drop anythingassert not bad.equals(test_array) ``` What did you expect to happen?I expected Minimal Complete Verifiable Example```Python from xarray import DataArray import numpy as np test_array = DataArray( range(5), coords=dict(x=("sample", [1, 2, 1, 2, 1]), y=("sample", [-1] * 5)), dims="sample", ) output DataArray's 'sample' dimension has length 2, as expectedgood = test_array.set_xindex(["x", "y"]).drop_duplicates("sample") And indeed there are only 2 elements left after dropping duplicates.assert len(good) == 2 All the 'y's are '-1', so we expect the same duplicates as before to be dropped,bad = test_array.drop_vars("y").set_xindex("x").drop_duplicates("sample") But this assert fails! 'drop_duplicates' does not drop anythingassert not bad.equals(test_array.drop_vars("y")) ``` MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.11.0
pandas: 2.1.0
numpy: 1.24.4
scipy: 1.11.2
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.2.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.9.1
distributed: 2023.9.1
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.2
mypy: None
IPython: 8.15.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8499/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1983891070 | I_kwDOAMm_X852P8Z- | 8427 | Ambiguous behavior with coordinates when appending to Zarr store with append_dim | rabernat 1197350 | closed | 0 | 4 | 2023-11-08T15:40:19Z | 2023-12-01T03:58:56Z | 2023-12-01T03:58:55Z | MEMBER | What happened?There are two quite different scenarios covered by "append" with Zarr
This issue is about what should happen to coordinates when using `append_dim`. Here's the current behavior.

```python
import numpy as np
import xarray as xr
import zarr

ds1 = xr.DataArray(
    np.array([1, 2, 3]).reshape(3, 1, 1),
    dims=('time', 'y', 'x'),
    coords={'x': [1], 'y': [2]},
    name="foo"
).to_dataset()
ds2 = xr.DataArray(
    np.array([4, 5]).reshape(2, 1, 1),
    dims=('time', 'y', 'x'),
    coords={'x': [-1], 'y': [-2]},
    name="foo"
).to_dataset()

# how concat works: data are aligned
ds_concat = xr.concat([ds1, ds2], dim="time")
assert ds_concat.dims == {"time": 5, "y": 2, "x": 2}

# now do a Zarr append
store = zarr.storage.MemoryStore()
ds1.to_zarr(store, consolidated=False)
# we do not check that the coordinates are aligned--just that they have the same shape and dtype
ds2.to_zarr(store, append_dim="time", consolidated=False)
ds_append = xr.open_zarr(store, consolidated=False)

# coordinates data have been overwritten
assert ds_append.dims == {"time": 5, "y": 1, "x": 1}
# ...with the latest values
assert ds_append.x.data[0] == -1
```

Currently, we always write all data variables in this scenario. That includes overwriting the coordinates every time we append. That makes appending more expensive than it needs to be. I don't think that is the behavior most users want or expect. What did you expect to happen?There are a couple of different options we could consider for how to handle this "extending" situation (with `append_dim`):
We currently do 1a. I propose to switch to 1b. I think it is closer to what users want, and it requires less I/O. Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.176-157.645.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.1
pandas: 2.1.2
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: 0.13.0
numbagg: 0.6.0
fsspec: 2023.10.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.16.1
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8427/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1044693438 | I_kwDOAMm_X84-RMG- | 5937 | DataArray.dt.seconds returns incorrect value for negative `timedelta64[ns]` | leifdenby 2405019 | closed | 0 | 4 | 2021-11-04T12:05:24Z | 2023-11-10T00:39:17Z | 2023-11-10T00:39:17Z | CONTRIBUTOR | What happened: For a negative `timedelta64[ns]` value, `DataArray.dt.seconds` returns the seconds component of pandas' "-1 days + hh:mm:ss" decomposition (86399 for -42 ns) rather than a value consistent with the positive case.
What you expected to happen: `dt.seconds` should return 0 for a ±42 ns timedelta, as the asserts in the example below expect.
Minimal Complete Verifiable Example:

```python
# coding: utf-8
import xarray as xr
import numpy as np

# number of nanoseconds
value = 42

da = xr.DataArray([np.timedelta64(value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

da = xr.DataArray([np.timedelta64(-value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0
```

Anything else we need to know?: I've narrowed this down to the call into pandas.
I think the issue arises because pandas turns the numpy timedelta64 into a "minus one day plus a time". This actually does have a number of "seconds" in it, but the "total_seconds" has the expected value:
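For illustration:

```python
import pandas as pd

td = pd.Timedelta(-42, unit="ns")
td                   # Timedelta('-1 days +23:59:59.999999958')
td.seconds           # 86399 -- the (positive) seconds component
td.total_seconds()   # -4.2e-08 -- the signed value
```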
Which would correctly round to zero. I don't think the issue is in pandas, although the output from pandas is counter-intuitive:
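For example, via `TimedeltaIndex`:

```python
import numpy as np
import pandas as pd

pd.TimedeltaIndex([np.timedelta64(-42, "ns")]).seconds  # 86399, not 0
```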
Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)? Environment: Output of <tt>xr.show_versions()</tt>``` INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, May 6 2020, 04:59:01) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 19.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_GB.UTF-8 LANG: None LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.18.2 pandas: 1.3.4 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.4.2 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.10.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.09.1 distributed: 2021.09.1 matplotlib: 3.2.2 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None fsspec: 2021.06.1 cupy: None pint: 0.18 sparse: None setuptools: 46.4.0.post20200518 pip: 21.1.2 conda: None pytest: 6.0.1 IPython: 7.16.1 sphinx: None ``` |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5937/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1981799811 | I_kwDOAMm_X852H92D | 8423 | Support remote string paths for `h5netcdf` engine | jrbourbeau 11656932 | open | 0 | 4 | 2023-11-07T16:52:18Z | 2023-11-09T07:24:45Z | CONTRIBUTOR | Is your feature request related to a problem?Currently the `h5netcdf` engine doesn't accept remote string paths like `s3://...` directly; such files have to be opened (e.g. with `fsspec`) before being handed to xarray.
Describe the solution you'd likeIt would be nice if I could do something like the following:
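Hypothetically something like (bucket and file names invented):

```python
import xarray as xr

# remote string paths passed straight through to the h5netcdf engine
ds = xr.open_mfdataset(
    ["s3://my-bucket/file-1.nc", "s3://my-bucket/file-2.nc"],
    engine="h5netcdf",
)
```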
and have my files opened prior to handing off to `h5netcdf`. Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8423/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1975845455 | I_kwDOAMm_X851xQJP | 8410 | Segmentation fault 139 (SIGSEGV) | lucadix 39524075 | closed | 0 | 4 | 2023-11-03T10:14:03Z | 2023-11-06T20:34:46Z | 2023-11-06T20:34:45Z | NONE | What happened?While opening a set of netCDF files in a for loop, using xr.open_dataset().load(), I get a segmentation error (nr. 139). Please see code example below: ``` for region in region_list: [some code to read data associated to each region...] region_pred = xr.open_dataset(io.BytesIO(data)).load() [other code working on region_pred...]
In this way, KO gets printed and the segmentation fault is now noticeable. I managed to fix the issue by using a second variable (called reg_pred) in addition to region_pred: ``` for region in region_list: [some code to read data associated to each region...] region_pred = xr.open_dataset(io.BytesIO(data)) reg_pred = region_pred.load() [other code working on reg_pred...] ``` What did you expect to happen?I don't know if the issue I described is something that the developers made on purpose. Personally, I think it is an issue and that's why I am reporting it. If it is not an issue, I would like to get a clarification in order to understand what am I missing. Thank you in advance. Minimal Complete Verifiable Example```Python for region in region_list: with storage_client.open(region, "rb") as f: data = f.read() region_pred = xr.open_dataset(io.BytesIO(data)).load() # some code working on region_pred to compute weather indices... ``` MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: ('Italian_Italy', '1252')
libhdf5: 1.14.0
libnetcdf: 4.9.2
xarray: 2023.8.0
pandas: 2.1.0
numpy: 1.26.0
scipy: 1.11.2
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.15.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8410/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1977485456 | I_kwDOAMm_X8513giQ | 8413 | Add a perception of a __xarray__ magic method | swamidass 6273919 | open | 0 | 4 | 2023-11-04T19:55:14Z | 2023-11-05T18:50:14Z | NONE | Is your feature request related to a problem?I am often moving data from external objects (of all sorts!) into xarray. This is a common use case. Much of this code would be greatly simplified if there was a way of giving non-xarray classes a way of declaring to xarray how these objects can be marshaled into xarray objects. Describe the solution you'd likeSo here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best. Magic MethodsIt would be great to see these magic method signatures become integrated throughout the library:
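Presumably something like (names inferred from the precedence rules below, where the double underscores were lost):

```python
# inferred dunder signatures; return types follow the as_* functions below
def __xarray__(self, *args, **kwargs): ...            # -> Dataset or DataArray
def __xarray_dataset__(self, *args, **kwargs): ...    # -> Dataset
def __xarray_dataarray__(self, *args, **kwargs): ...  # -> DataArray
```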
Conversion RegistryAnd these extension functions to register converters:
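Presumably along the lines of (a hypothetical API; `MyType` and `converter` are placeholders):

```python
# hypothetical registration hooks, mirroring the registered_*_converter
# lookups described below
xr.register_xarray_converter(MyType, converter)
xr.register_dataset_converter(MyType, converter)
xr.register_dataarray_converter(MyType, converter)
```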
Ideally, also, "deregister" versions (.e.g deregister would also be available. So context managers that change marshaling behavior could easily be constructed. User APIAlong with the following new user API functions:
"as_xarray" returns (in order of precedence: - x unaltered if it is an xarray objects - registered_xarray_converter(x, args, kwargs) if it is callable and does not throw an exception - registered_dataarray_converter(x, args, kwargs) if it is callable and does not throw an exception - registered_dataarray_converter(x, *args, kwargs) if it is callable and does not throw an exception - x.xarray(args, kwargs), if it exits, is callable, and does not throw an exception - x.xarray_dataset(args, kwargs), if it exists, is callable, and does not throw an exception - x.xarray_dataarray(*args, kwargs), if it exists, is callable, and does not throw an exception - well known aliases of xarray_dataarray, such as x.to_xarray(args, *kwargs) (see pandas) - [DESIGN DECISION] convert and return tuple[dims, data, [attr, encoding] to DataArray? - [DESIGN DECISION] convert and return tuple encoding of DataSet? - [DESIGN DECISION] return DataArray wrapped duck-typed array in DataArray? The rationale for putting the registered functions first is that this would enable "as_dataarrray" would be slimilar, but it would only call x.xarray_dataarray and well known aliases. "as_dataset" would be slimilar, but it would only call x.xarray_dataset, well known aliases, and perhaps falling back to calling x.xarray_dataarray and converting the return a dataset if it has a name attribute. "as_datatree" would be slimilar, but it would only call x.xarray_datatree, and perhaps falling back to calling x.xarray_dataarray and wrapping it in a single node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray) The design decisions are flexible from my point of view, and might be decided in a way that makes the code base simplest or most usable. There is also a question of whether or not this method should default the backup methods. These decisions also can be deferred entirely by delegating to the converter registry. Across the Xarray LibraryFinally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library. Doing this would give give another pathway for third party libraries to integrate with xarray, with a far easier way than the converter registry or explicit calls to as_* functions. Describe alternatives you've consideredThis can be done with a private library. But it seems to a lot of code that is pretty useful to other use cases. Most of this (but not all) can accomplished in a 3rd party library, but it wouldn't allow the seamless sort of integration with (for example) xarray use of repr_html to integrate with pandas. The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seemless for reading files, but cannot marshal objects. For example, this is possible:
But this doesn't work.
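That is, presumably:

```python
import tiffslide  # from the tiffslide project referenced above

# an already-constructed object has no hook for xarray to marshal it
ds = xr.open_dataset(tiffslide.TiffSlide("slide.svs"))
```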
This is an important use case because there are cases where we want to create an xarray object like this from objects that are never stored on the filesystem. Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8413/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 887711474 | MDU6SXNzdWU4ODc3MTE0NzQ= | 5290 | Inconclusive error messages using to_zarr with regions | niowniow 5802846 | closed | 0 | 4 | 2021-05-11T15:54:39Z | 2023-11-05T06:28:39Z | 2023-11-05T06:28:39Z | CONTRIBUTOR | What happened:
The idea is to use an xarray dataset (stored as a dummy zarr file), which is subsequently filled with the `region` keyword of `to_zarr`. It seems the current implementation is only designed to either store coordinates for the whole dataset and write them to disk or to write without coordinates. I failed to understand this from the documentation and tried to create a dataset without coordinates and fill it with a dataset subset with coordinates. It gave some inconclusive errors depending on the actual code example (see below).
It might also be a bug and it should in fact be possible to add a dataset with coordinates to a dummy dataset without coordinates. Then there seems to be an issue regarding the handling of the variables during storing the region. ... or I might just have done it wrong... and I'm looking forward to suggestions. What you expected to happen: Either an error message telling me that that i should use coordinates during creation of the dummy dataset. Alternatively, if this is a bug and should be possible then it should just work. Minimal Complete Verifiable Example: ```python import dask.array import xarray as xr import numpy as np error = 1 # choose between 0 (no error), 1, 2, 3 dummies = dask.array.zeros(30, chunks=10) chunks in coords are not taken into account while saving!?coord_x = dask.array.zeros(30, chunks=10) # or coord_x = np.zeros((30,)) if error == 0: ds = xr.Dataset({"foo": ("x", dummies)}, coords={"x":coord_x}) else: ds = xr.Dataset({"foo": ("x", dummies)}) print(ds) path = "./tmp/test.zarr" ds.to_zarr(path, mode='w', compute=False, consolidated=True) create a new dataset to be input into a regionds = xr.Dataset({"foo": ('x', np.arange(10))},coords={"x":np.arange(10)}) if error == 1: ds.to_zarr(path, region={"x": slice(10, 20)}) # ValueError: parameter 'value': expected array with shape (0,), got (10,) elif error == 2: ds.to_zarr(path, region={"x": slice(0, 10)}) ds.to_zarr(path, region={"x": slice(10, 20)}) # ValueError: conflicting sizes for dimension 'x': length 10 on 'x' and length 30 on 'foo' elif error == 3: ds.to_zarr(path, region={"x": slice(0, 10)}) ds = xr.Dataset({"foo": ('x', np.arange(10))},coords={"x":np.arange(10)}) ds.to_zarr(path, region={"x": slice(10, 20)}) # ValueError: parameter 'value': expected array with shape (0,), got (10,) else: ds.to_zarr(path, region={"x": slice(10, 20)}) ds = xr.open_zarr(path) print('reopen',ds['x']) ``` Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.18.0 pandas: 1.2.3 numpy: 1.19.2 scipy: 1.6.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.1 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.04.0 distributed: None matplotlib: 3.4.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: None sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5290/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 377356113 | MDU6SXNzdWUzNzczNTYxMTM= | 2542 | full_like, ones_like, zeros_like should retain subclasses | gerritholl 500246 | closed | 0 | 4 | 2018-11-05T11:22:49Z | 2023-11-05T06:27:31Z | 2023-11-05T06:27:31Z | CONTRIBUTOR | Code Sample,

```python
# Your code here
import numpy
import xarray

class MyDataArray(xarray.DataArray):
    pass

da = MyDataArray(numpy.arange(5))
da2 = xarray.zeros_like(da)
print(type(da), type(da2))
```

Problem descriptionI would expect that `da2` is also an instance of `MyDataArray`, but instead a plain `xarray.DataArray` is returned.
Expected OutputI would hope as an output: `<class '__main__.MyDataArray'> <class '__main__.MyDataArray'>`
In principle changing this could break people's code, so if a change is implemented it should probably be through an optional keyword argument to the `full_like` family of functions. Output of `xr.show_versions()`
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2542/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1966675016 | I_kwDOAMm_X851ORRI | 8388 | Type annotation compatibility with numpy ufuncs | djhoese 1828519 | closed | 0 | 4 | 2023-10-28T17:25:11Z | 2023-11-02T12:44:50Z | 2023-11-02T12:44:50Z | CONTRIBUTOR | Is your feature request related to a problem?I'd like mypy to understand that xarray DataArrays passed to numpy ufuncs have a return type of xarray DataArray. ```python import xarray as xr import numpy as np def compute_relative_azimuth(sat_azi: xr.DataArray, sun_azi: xr.DataArray) -> xr.DataArray: abs_diff = np.absolute(sun_azi - sat_azi) ssadiff = np.minimum(abs_diff, 360 - abs_diff) return ssadiff ```
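For this snippet, mypy (with `warn_return_any` enabled) reports something like:

```
error: Returning Any from function declared to return "DataArray"  [no-any-return]
```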
Describe the solution you'd like: I'm not sure if this is possible, if it is something xarray can fix, or something numpy needs to "fix". I'd like the above situation to "just work" without anything more than maybe some extra type-stub package. Describe alternatives you've considered: Cast types or other type coercion, or tell mypy to ignore the type issues for these numpy calls. Additional context |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8388/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1445905299 | I_kwDOAMm_X85WLsOT | 7282 | groupby and mean on a MultiIndex level raises ValueError | jjpr-mit 25231875 | closed | 0 |  | 4 | 2022-11-11T19:15:58Z | 2023-10-30T09:18:54Z | 2023-08-31T03:50:33Z | NONE | What happened? After using `groupby` on a MultiIndex level and then applying `mean`, a ValueError is raised. What did you expect to happen? Apply mean to groups, no error. Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know? No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 5.15.49-linuxkit
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.11.0
pandas: 1.5.1
numpy: 1.23.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.2.2
conda: None
pytest: None
IPython: None
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7282/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1953059418 | I_kwDOAMm_X850aVJa | 8345 | `.stack` produces large chunks | yt87 40218891 | closed | 0 |  | 4 | 2023-10-19T21:09:56Z | 2023-10-26T21:20:05Z | 2023-10-26T21:20:05Z | NONE | What happened? Xarray's `.stack` produces large chunks, and the example below fails. What did you expect to happen? I expect this to work. #5754 is closed. Minimal Complete Verifiable Example ```Python
import dask.array
import numpy as np
import xarray as xr

var = xr.Variable(
    ("t", "z", "u", "x", "y"),
    dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)

def sum(ds):
    return ds.sum(dim="u")

with dask.config.set(**{"array.slicing.split_large_chunks": True}):
    da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new")
da2
```
Relevant log output```PythonIndexError Traceback (most recent call last) Cell In[21], line 5 2 return ds.sum(dim="u") 4 with dask.config.set(**{"array.slicing.split_large_chunks": True}): ----> 5 da2 = da.stack(new=("z", "t")).groupby("new").map(sum).unstack("new") 6 da2 File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse) 2795 def unstack( 2796 self, 2797 dim: Dims = None, 2798 fill_value: Any = dtypes.NA, 2799 sparse: bool = False, 2800 ) -> Self: 2801 """ 2802 Unstack existing dimensions corresponding to MultiIndexes into 2803 multiple new dimensions. (...) 2853 DataArray.stack 2854 """ -> 2855 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse) 2856 return self._from_temp_dataset(ds) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse) 5498 for d in dims: 5499 if needs_full_reindex: -> 5500 result = result._unstack_full_reindex( 5501 d, stacked_indexes[d], fill_value, sparse 5502 ) 5503 else: 5504 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse) 5393 if name not in index_vars: 5394 if dim in var.dims: -> 5395 variables[name] = var.unstack({dim: new_dim_sizes}) 5396 else: 5397 variables[name] = var File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs) 1928 result = self 1929 for old_dim, dims in dimensions.items(): -> 1930 result = result._unstack_once_full(dims, old_dim) 1931 return result File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim) 1817 reordered = self.transpose(*dim_order) 1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes -> 1820 new_data = reordered.data.reshape(new_shape) 1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names 1823 return type(self)( 1824 new_dims, new_data, self._attrs, self._encoding, fastpath=True 1825 ) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape) 2217 if len(shape) == 1 and not isinstance(shape[0], Number): 2218 shape = shape[0] -> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit) 283 else: 284 chunk_plan.append("auto") --> 285 outchunks = normalize_chunks( 286 chunk_plan, 287 shape=shape, 288 limit=limit, 289 dtype=x.dtype, 290 previous_chunks=inchunks, 291 ) 293 x2 = x.rechunk(inchunks) 295 # Construct graph File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks) 3092 chunks = tuple("auto" if isinstance(c, str) and c != "auto" else c for c in chunks) 3094 if any(c == "auto" for c in chunks): -> 3095 chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3097 if shape is not None: 3098 chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape)) File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks) 3212 largest_block = math.prod( 3213 cs if 
isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in <dictcomp>(.0) 3212 largest_block = math.prod( 3213 cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != "auto" 3214 ) 3216 if previous_chunks: 3217 # Base ideal ratio on the median chunk size of the previous chunks -> 3218 result = {a: np.median(previous_chunks[a]) for a in autos} 3220 ideal_shape = [] 3221 for i, s in enumerate(shape): IndexError: tuple index out of range ``` Anything else we need to know? The most recent traceback entry points to an issue in dask code. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8345/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1923431725 | I_kwDOAMm_X85ypT0t | 8264 | Improve error messages | max-sixty 5635139 | open | 0 |  | 4 | 2023-10-03T06:42:57Z | 2023-10-24T18:40:04Z | MEMBER | Is your feature request related to a problem? Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline. Some of the error messages could be much more helpful. Take one example:
The second sentence is nice. But the first could give us much more information:
- Which variables conflict? I'm merging four objects, so it would be really helpful to know which ones are causing the issue.
- What is the conflict? Is one a superset and I can …? Having good error messages is really useful: it lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library. Describe the solution you'd like: I'm not sure of the best way to surface the issues — error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues — there could be scores of marginal issues which would often be out of date. One thing we do in PRQL is have a file that snapshots error messages. Any other ideas? Describe alternatives you've considered: No response Additional context: A couple of specific error-message issues: - https://github.com/pydata/xarray/issues/2078 - https://github.com/pydata/xarray/issues/5290
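As a rough illustration of the snapshot idea in pytest terms (my sketch; xarray has no such snapshot file today):

```python
import pytest
import xarray as xr

def test_merge_conflict_message():
    ds1 = xr.Dataset(coords={"bar": 4})
    ds2 = xr.Dataset(coords={"bar": 5})
    with pytest.raises(xr.MergeError) as excinfo:
        xr.merge([ds1, ds2])
    # In a snapshot setup this expected string would live in a committed
    # file, so any change to the message shows up as a reviewable diff.
    assert "conflicting values" in str(excinfo.value)
```
 |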
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8264/reactions",
"total_count": 2,
"+1": 2,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 529644880 | MDU6SXNzdWU1Mjk2NDQ4ODA= | 3580 | xr.DataArray.values fails with latest versions of netcdf4 | kpegion 16332933 | closed | 0 | 4 | 2019-11-28T01:26:07Z | 2023-10-18T17:01:17Z | 2023-10-18T17:01:17Z | NONE | MCVE Code Sample```python import xarray as xr xr.show_versions() url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/NCEP-CFSv2/.HINDCAST/.MONTHLY/.sst/dods' fullda = xr.open_dataset(url, decode_times=False,chunks={'S': 'auto', 'L': 'auto', 'M':'auto','X':'auto','Y':'auto'}) print(fullda) print(fullda['sst'][:10,0,0,0,0].values) ``` Expected Output
Problem Description: This should return the array’s data as a numpy.ndarray according to the documentation and as shown above. I tested this with various versions of netcdf4 and I get the error below for netcdf4 versions 1.5.1.2 and 1.5.3 (latest version). If I use netcdf4 version 1.5.1, I get the expected output as above. ``` python <xarray.Dataset> Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181) Coordinates: * X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 * S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0 * M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0 * Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0 Data variables: sst (S, L, M, Y, X) float32 dask.array<chunksize=(29, 10, 24, 51, 45), meta=np.ndarray> Attributes: Conventions: IRIDL Traceback (most recent call last): File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 84, in _getitem array = getitem(original_array, key) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/common.py", line 54, in robust_getitem return array[key] File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.getitem File "netCDF4/_netCDF4.pyx", line 5350, in netCDF4._netCDF4.Variable._get IndexError: index exceeds dimension bounds During handling of the above exception, another exception occurred: Traceback (most recent call last): File "testpython.py", line 7, in <module> print(fullda['sst'][:10,0,0,0,0].values) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/dataarray.py", line 567, in values return self.variable.values File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py", line 448, in values return as_array_or_item(self._data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/variable.py", line 254, in _as_array_or_item data = np.asarray(data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py", line 1314, in __array__ x = self.compute() File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py", line 165, in compute (result,) = compute(self, traverse=False, kwargs) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/base.py", line 436, in compute results = schedule(dsk, keys, kwargs) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/threaded.py", line 81, in get *kwargs File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 486, in get_async raise_exception(exc, tb) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 316, in reraise raise exc File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/local.py", line 222, in execute_task result = _execute_task(task, data) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task return func(args2) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/dask/array/core.py", line 106, in getter c = np.asarray(c) File
"/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 481, in array return np.asarray(self.array, dtype=dtype) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 643, in array return np.asarray(self.array, dtype=dtype) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray return array(a, dtype, copy=False, order=order) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 547, in array return np.asarray(array[self.key], dtype=None) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 72, in getitem key, self.shape, indexing.IndexingSupport.OUTER, self.getitem File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/core/indexing.py", line 827, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File "/homes/kpegion/.conda/envs/testenv3-dev/lib/python3.6/site-packages/xarray/backends/netCDF4.py", line 94, in _getitem raise IndexError(msg) IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load(). ``` Output of
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3580/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1924497392 | I_kwDOAMm_X85ytX_w | 8269 | open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0' | mps01060 6819509 | closed | 0 |  | 4 | 2023-10-03T16:19:54Z | 2023-10-18T16:50:20Z | 2023-10-18T16:50:20Z | NONE | What is your issue? When moving from xarray version '2023.8.0' to '2023.9.0' the behavior of importing a zarr changed for me (the code to create the example zarr is at the end of this post). When importing a variable with units "days accumulated", the values are scaled differently between the two versions. The latest version seems to automatically treat this as a time-like array (I think the -9.223372e+18 values seen are NaT-like?). Open the zarr:
Print as a pandas-like table for each version of xarray for readability:
Version '2023.8.0':

|time|dapr (dtype=float32)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|NaN|NaN|
|2000-01-02|NaN|NaN|
|2000-01-03|2.0|1.5|

Version '2023.9.0':

|time|dapr (dtype=float64)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|-9.223372e+18|NaN|
|2000-01-02|-9.223372e+18|NaN|
|2000-01-03|2.000000e+00|1.5|

I can manually disable this by using "use_cf=False" and "mask_and_scale=False", and then manually scale this variable, though that is not ideal. The "decode_timedelta" option doesn't seem to have an effect on this data, either. I understand the "days" keyword is in my units; however, the full unit is "days accumulated". Has the behavior of xarray changed to find keywords such as "days" occurring anywhere in the units (e.g. as a substring)? Do you have any other suggestions? Thank you for the help. Code to create the debug.zarr for the tables above: ```python
import numpy as np
import pandas as pd
import xarray as xr
import zarr

# Create some multiday precipitation data (similar to https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
# mdpr is the amount of a multiday total (inches)
# dapr is the number of days each multiday total occurred over (days accumulated).
# In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03
# I use float32 to represent these, but pack these as int16 values in the zarr.
mdpr = np.array([np.NaN, np.NaN, 1.50], dtype=np.float32)
dapr = np.array([np.NaN, np.NaN, 2.0], dtype=np.float32)
time = pd.date_range('2000-01-01', periods=3)

# Create a dataset from these values
ds = xr.Dataset(
    data_vars=dict(
        mdpr=(['time'], mdpr),
        dapr=(['time'], dapr),
    ),
    coords=dict(
        time=time,
    ),
    attrs=dict(description='multiday precipitation data'),
)

# Specify encoding to pack these float32 values as int16
encoding = {
    'mdpr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 0.01,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
    'dapr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 1.0,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
}

# Create attributes. The "units" for the dapr variable seems to be the issue ("days" in the "days accumulated")
ds.mdpr.attrs['units'] = 'inches'
ds.mdpr.attrs['description'] = 'multiday precip amount'
ds.dapr.attrs['units'] = 'days accumulated'
ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation'

# Save to zarr
ds.to_zarr('debug.zarr', mode='w', encoding=encoding)
```
 |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8269/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1384226112 | I_kwDOAMm_X85SgZ1A | 7075 | Convert xarray dataset to pandas dataframe is much slower in newest xarray version | rilllydi 20794996 | closed | 0 |  | 4 | 2022-09-23T19:36:28Z | 2023-10-14T20:37:40Z | 2023-10-14T20:37:40Z | NONE | What is your issue? Converting an xarray dataset to a pandas dataframe has become much slower in the newest xarray version. I want to read in very large netcdf files, extract a slice, and convert the slice to a pandas dataframe. For an input size of 2 GB, xarray version 0.21.0 takes 3 seconds, versus 44 seconds for xarray version 2022.6.0. See the table below for more tests with increasing size of the xarray dataset.

|Number of NetCDF Input Files in Xarray Dataset (~1GB per file):|2|5|10|15|20|30|40|
|--|--|--|--|--|--|--|--|
|Older Xarray Version 0.21.0|0:03|0:02|0:04|0:06|0:09|0:13|0:17|
|Newer Xarray Version 2022.6.0|0:44|1:30|2:46|4:01|5:23|7:56|10:29|

Here is my code: ```
# Read in a list of netcdf files and combine into a single dataset.
with xr.open_mfdataset(infile_list, combine='by_coords') as ds:
``` The netcdf files I am reading in are about 1 GB each, containing daily weather data for the entire CONUS. There is 1 file per year, so if I read in 2 files, the dimensions are (lon: 1386, lat: 585, day: 731, crs: 1) with coordinates of lon, lat, day, and crs. They include 8 float data variables. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7075/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1943355490 | I_kwDOAMm_X85z1UBi | 8308 | Different plotting results compared to matplotlib | zxdawn 30388627 | closed | 0 |  | 4 | 2023-10-14T15:54:32Z | 2023-10-14T20:02:16Z | 2023-10-14T20:02:16Z | NONE | What happened? I got different results when I tried to plot the 2D data test.npy.zip using matplotlib and xarray. (Figures: the matplotlib result and the xarray result.) What did you expect to happen? Same plot. Minimal Complete Verifiable Example ```Python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

test = np.load('test.npy')

plt.imshow(test, vmin=0, vmax=200)
plt.colorbar()

xr.DataArray(test).plot.imshow(vmin=0, vmax=200)
```
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.26.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8308/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1821467933 | I_kwDOAMm_X85skWUd | 8021 | Specify chunks in bytes | mrocklin 306380 | open | 0 |  | 4 | 2023-07-26T02:29:43Z | 2023-10-06T10:09:33Z | MEMBER | Is your feature request related to a problem? I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an … Dask array does this in two ways. We can provide a value in chunks like the following:
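(The snippet itself is missing from this dump; a sketch of the dask spelling of the idea, my example rather than the original:)

```python
import dask
import dask.array as da

# Chunk sizes can be given directly in bytes...
x = da.ones((20_000, 20_000), chunks="16 MiB")
print(x.chunksize)

# ...or picked up from configuration when chunks="auto".
with dask.config.set({"array.chunk-size": "64 MiB"}):
    y = da.ones((20_000, 20_000), chunks="auto")
    print(y.chunksize)
```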
We also refer to a value in Dask config ```python In [1]: import dask In [2]: dask.config.get("array.chunk-size") Out[2]: '128MiB' ``` This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂 Describe the solution you'd likeNo response Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8021/reactions",
"total_count": 2,
"+1": 2,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1169750048 | I_kwDOAMm_X85FuPgg | 6360 | Multidimensional `interpolate_na()` | iuryt 5797727 | open | 0 |  | 4 | 2022-03-15T14:27:46Z | 2023-09-28T11:51:20Z | NONE | Is your feature request related to a problem? I think that having a way to run a multidimensional interpolation for filling missing values would be awesome. The code snippet below creates some data and shows the problem I am having now. If the data has some orientation, we couldn't simply interpolate each dimension separately. ```python
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

n = 30
x = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['x'])
y = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['y'])
z = (np.sin(x) * xr.ones_like(y))

mask = xr.DataArray(np.random.randint(0, 1+1, (n, n)).astype('bool'), dims=['x', 'y'])

kw = dict(add_colorbar=False)
fig, ax = plt.subplots(1, 3, figsize=(11, 3))
z.plot(ax=ax[0], **kw)
z.where(mask).plot(ax=ax[1], **kw)
z.where(mask).interpolate_na('x').plot(ax=ax[2], **kw)
```
I tried to use advanced interpolation for that, but it doesn't look like the best solution. ```python
zs = z.where(mask).stack(k=['x', 'y'])
zs = zs.where(np.isnan(zs), drop=True)
xi, yi = zs.k.x.drop('k'), zs.k.y.drop('k')
zi = z.interp(x=xi, y=yi)

fig, ax = plt.subplots()
z.where(mask).plot(ax=ax, **kw)
ax.scatter(xi, yi, c=zi, **kw, linewidth=1, edgecolor='k')
```
returns
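A sketch of what a true 2-D fill could do under the hood (my addition; the use of scipy's `griddata` is an assumption, not something the request prescribes):

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_na_2d(da, dims=("x", "y")):
    # Fill NaNs from all valid neighbours in the 2-D plane at once,
    # rather than one dimension at a time.
    da = da.transpose(*dims)
    ii, jj = np.meshgrid(
        np.arange(da.sizes[dims[0]]), np.arange(da.sizes[dims[1]]), indexing="ij"
    )
    values = da.values
    valid = ~np.isnan(values)
    filled = griddata((ii[valid], jj[valid]), values[valid], (ii, jj), method="linear")
    return da.copy(data=filled)
```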
Describe the solution you'd like: Simply being able to pass multiple dimensions to `interpolate_na()`. Describe alternatives you've considered: I could extract the data to numpy and interpolate there with scipy. Additional context: No response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6360/reactions",
"total_count": 11,
"+1": 9,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 2
} |
xarray 13221727 | issue | ||||||||
| 1905824568 | I_kwDOAMm_X85xmJM4 | 8221 | Frequent doc build timeout / OOM | max-sixty 5635139 | open | 0 |  | 4 | 2023-09-20T23:02:37Z | 2023-09-21T03:50:07Z | MEMBER | What is your issue? I'm frequently seeing … It fails after 1552 seconds, so it not being a round number means it might be the memory? It follows … Here's an example: https://readthedocs.org/projects/xray/builds/21983708/ Any thoughts on what might be going on?
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8221/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1326238990 | I_kwDOAMm_X85PDM0O | 6870 | `rolling_exp` loses coords | max-sixty 5635139 | closed | 0 | 4 | 2022-08-02T18:27:44Z | 2023-09-19T01:13:23Z | 2023-09-19T01:13:23Z | MEMBER | What happened?We lose the time coord here — ```python ds = xr.tutorial.load_dataset("air_temperature") ds.rolling_exp(time=5).mean() <xarray.Dataset> Dimensions: (lat: 25, time: 2920, lon: 53) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0 * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0 Dimensions without coordinates: time Data variables: air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.4 296.1 295.7 ``` (I realize I wrote this, I didn't think this used to happen, but either it always did or I didn't write good enough tests... mea culpa) What did you expect to happen?We keep the time coords, like we do for normal
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (main, May 24 2022, 21:13:51)
[Clang 13.1.6 (clang-1316.0.21.2)]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.12.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.1
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: 4.3.2
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6870/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 598991028 | MDU6SXNzdWU1OTg5OTEwMjg= | 3967 | Support static type analysis | eric-czech 6130352 | closed | 0 |  | 4 | 2020-04-13T16:34:43Z | 2023-09-17T19:43:32Z | 2023-09-17T19:43:31Z | NONE | As a related discussion to https://github.com/pydata/xarray/issues/3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis. In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able to enforce that names and dtypes associated with both data variables and coordinates meet certain constraints. @keewis mentioned an example of this in https://github.com/pydata/xarray/issues/3959#issuecomment-612076605 where it might be possible to use something like a … An example of where this would be useful is in adding extensions through accessors: ```python
@xr.register_dataset_accessor('ext')
class ExtAccessor:
    def __init__(self, ds):
        self.data = ds
ds = xr.Dataset(dict(DATA=xr.DataArray([0.0])))

# I'd like to catch that "data" was misspelled as "DATA" and that
# this particular method shouldn't be run against floats prior to runtime
ds.ext.is_zero()
```

I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too. There is a related conversation on doing something like this for Pandas DataFrames at https://github.com/python/typing/issues/28#issuecomment-351284520, so that might be helpful context for possibilities with xarray.
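For comparison, a statically checkable sketch of the same idea (my illustration, not a proposed xarray API):

```python
import xarray as xr

class ExtChecker:
    """Plain wrapper: unlike a dynamic accessor, mypy can see these
    attributes and methods, so misspellings fail type checking."""

    def __init__(self, ds: xr.Dataset) -> None:
        self.data = ds

    def is_zero(self) -> xr.Dataset:
        return (self.data == 0).all()

ds = xr.Dataset(dict(data=xr.DataArray([0.0])))
ExtChecker(ds).is_zero()  # a typo like .is_zeroo() is a mypy error
```
 |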
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3967/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 561921094 | MDU6SXNzdWU1NjE5MjEwOTQ= | 3762 | xarray groupby/map fails to parallelize | bjcosta 6491058 | closed | 1 |  | 4 | 2020-02-07T23:20:59Z | 2023-09-15T15:52:42Z | 2023-09-15T15:52:41Z | NONE | MCVE Code Sample ```python
import sys
import math
import logging

import dask
import xarray
import numpy

logger = logging.getLogger('main')

if __name__ == '__main__':
    logging.basicConfig(
        stream=sys.stdout,
        format='%(asctime)s %(levelname)-8s %(message)s',
        level=logging.INFO,
        datefmt='%Y-%m-%d %H:%M:%S')
```

Expected Output: I am fairly new to xarray but feel this example could have been executed a bit better than xarray currently does. Each map call of the above custom function should be possible to parallelize, from what I can tell. I imagined that in the backend, xarray would have chunked it and run it in parallel on dask. However, I find it is VERY slow even for the single-threaded case, and it also doesn't seem to parallelize. It takes roughly 5 msec per map call on my hardware when I don't include the chunk call, and 70 msec with the chunk call you can find in the code. Problem Description: The single-threaded performance is super slow, but it also fails to parallelize the computations across the cores on my machine. If you are after more background on what I am trying to do, I also asked a SO question about how to re-organize the code to improve performance. I feel, though, that the current behavior is a performance bug (assuming I didn't do something completely wrong in the code). Output of
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3762/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1473152374 | I_kwDOAMm_X85XzoV2 | 7348 | Using entry_points to register dataset and dataarray accessors? | nbren12 1386642 | open | 0 |  | 4 | 2022-12-02T16:48:42Z | 2023-09-14T19:53:46Z | CONTRIBUTOR | Is your feature request related to a problem? External libraries often use the dataset/dataarray accessor pattern (e.g. metpy). These accessors are not available until the external package where the registration occurs is imported. This means scripts using these accessors must include an often-unused import that linters will complain about, e.g. ```
import metpy  # linter complains here

# some data
ds: xr.Dataset = ...

ds.metpy....
```

Describe the solution you'd like: Use importlib entry points to register these, so that registration is handled automatically. This is currently enabled for the array backend, but not for accessors (e.g. metpy's setup.cfg). Describe alternatives you've considered: No response Additional context: No response
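A sketch of what the loading side could look like (my illustration; the `xarray.accessors` entry-point group is hypothetical, xarray defines no such group today):

```python
from importlib.metadata import entry_points

import xarray as xr

# Hypothetical: at import time xarray could scan a dedicated entry-point
# group and register every accessor advertised there (Python 3.10+ API).
for ep in entry_points(group="xarray.accessors"):
    xr.register_dataset_accessor(ep.name)(ep.load())
```
 |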
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7348/reactions",
"total_count": 2,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 1
} |
xarray 13221727 | issue | ||||||||
| 1098241812 | I_kwDOAMm_X85BddcU | 6149 | [Bug]: `numpy` `DeprecationWarning` with `DType` and `xr.testing.assert_all_close()` + Dask | tomvothecoder 25624127 | closed | 0 |  | 4 | 2022-01-10T18:34:27Z | 2023-09-13T20:06:59Z | 2023-09-13T20:06:58Z | CONTRIBUTOR | What happened? A numpy `DeprecationWarning` about `DType` is raised when calling `xr.testing.assert_all_close()` on Dask-backed objects. What did you expect to happen? The warning should not appear. Minimal Complete Verifiable Example ```python
class TestTemporalAvg:
    class TestTimeseries:
        @pytest.fixture(autouse=True)
        def setup(self):
            self.ds: xr.Dataset = generate_dataset(cf_compliant=True, has_bounds=True)
``` Relevant log output
Anything else we need to know?No response EnvironmentINSTALLED VERSIONScommit: None python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.45.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.20.1 pandas: 1.3.4 numpy: 1.21.4 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.11.2 distributed: 2021.11.2 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2021.11.1 cupy: None pint: None sparse: None setuptools: 59.6.0 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.30.1 sphinx: 4.3.1 |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6149/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
not_planned | xarray 13221727 | issue | ||||||
| 1075765204 | I_kwDOAMm_X85AHt_U | 6055 | Unexpected type conversion in variables with _FillValue | jp-dark 24235303 | closed | 0 | 4 | 2021-12-09T16:26:54Z | 2023-09-13T12:40:14Z | 2023-09-13T12:40:13Z | CONTRIBUTOR | What happened:
When opening a dataset with an int16 variable with the `_FillValue` attribute set, the variable's dtype is unexpectedly converted (int16 becomes float32). What you expected to happen: I would expect the type to remain the same when applying the _FillValue. Minimal Complete Verifiable Example: Original example from TileDB-CF-Py issue #117 using the TileDB backend. ```python
import tiledb
import xarray as xr
import numpy as np

index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)
with tiledb.open('dense_array0', 'w') as A:
    A[:] = np.array([5, 6, 7, 8], dtype=np.int16)

ds = xr.open_dataset('dense_array0', engine='tiledb')
ds['var'].dtype
```

NetCDF example with the same behavior: ```python
import netCDF4
import xarray as xr
import numpy as np

filename = 'temp_file.nc'
with netCDF4.Dataset(filename, mode="w") as group:
    group.createDimension("index", 4)
    var = group.createVariable("var", np.int16, ("index",), fill_value=-1)
    var[:] = np.array([5, 6, 7, 8], dtype=np.int16)

dataset = xr.open_dataset(filename)
dataset["var"].dtype
```

Anything else we need to know?:
* I was able to verify the type conversion from int16 to float32 occurs in the mask-and-scale decoding step. Environment: I was able to reproduce this with both xarray 0.19.0 and 0.20.1
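For what it's worth, the usual way to keep the on-disk dtype (my note; the trade-off is that fill values are left unmasked):

```python
import xarray as xr

# Disabling mask-and-scale keeps the stored int16 dtype; the _FillValue
# sentinel (-1 in the example above) then stays in the data unmasked.
dataset = xr.open_dataset("temp_file.nc", mask_and_scale=False)
print(dataset["var"].dtype)  # int16
```
 |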
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6055/reactions",
"total_count": 1,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 1
} |
completed | xarray 13221727 | issue | ||||||
| 514672231 | MDU6SXNzdWU1MTQ2NzIyMzE= | 3466 | RuntimeError: NetCDF: DAP failure | b-kode 47066389 | closed | 1 |  | 4 | 2019-10-30T13:32:34Z | 2023-09-12T16:00:57Z | 2023-09-12T16:00:57Z | NONE | Hi all, I am interested in extracting specific point and variable information from the GEOS-FC product, accessible via OpenDap. Loading the data seems to work fine, and I can do some processing to my specific needs. Ideally I would like to convert this selection to a dataframe, or if needed store it as an intermediate file from which I can read again. Yet when doing so, I get the following error: RuntimeError: NetCDF: DAP failure. I am not sure what is causing this. Perhaps I chunk the data in the wrong (inefficient) way? Or there is an error with the GEOS netcdf files? Or ... Below is a working code snippet. ``` python
import xarray as xr

idir_geos = 'https://opendap.nccs.nasa.gov/dods/gmao/geos-cf/assim/chm_tavg_1hr_g1440x721_v1'

def preprocess(ds):
    ''' Rename variables and select the relevant ones. Remove lev'''
    ds = ds.rename({'pm25_rh35_gcc': 'PM2.5', 'no': 'NO', 'no2': 'NO2', 'o3': 'O3', 'so2': 'SO2', 'co': 'CO'})
    ds = ds[['PM2.5', 'NO', 'NO2', 'O3', 'SO2', 'CO']]
    ds = ds.squeeze('lev')
    return ds

ds = xr.open_mfdataset([idir_geos], preprocess=preprocess, combine='by_coords')

lat = 51.25
lon = 4.25
pol = 'O3'
ds_sel = ds.sel(lat=lat, lon=lon, method='nearest')[pol]
df_sel = ds_sel.to_dataframe().drop(['lat', 'lon'], axis=1)
ds_sel.to_netcdf('test.nc')  # Runtime error
```

Traceback error:
More info on my xarray installation:commit: None python: 3.6.9 (default, Jul 3 2019, 07:38:46) [GCC 8.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_GB.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.3 xarray: 0.14.0 pandas: 0.25.2 numpy: 1.17.3 scipy: 1.3.1 netCDF4: 1.5.3 pydap: installed h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.28 cfgrib: None iris: None bottleneck: 1.2.1 dask: 0.16.0 distributed: None matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.4.0 pip: 9.0.1 conda: None pytest: 5.2.1 IPython: 7.3.0 sphinx: 1.8.4 |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3466/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1339921253 | I_kwDOAMm_X85P3ZNl | 6919 | Parallel read with MPI | mengaldo 8100801 | closed | 0 |  | 4 | 2022-08-16T07:19:14Z | 2023-09-12T15:16:32Z | 2023-09-12T15:16:31Z | NONE | Is your feature request related to a problem? Is it possible to somehow extend xarray to use MPI I/O? Describe the solution you'd like: We would need to know the offset at which the actual data starts within the file. Is there a way of retrieving that? Disclaimer: I am not an expert on the NetCDF format - so, apologies if the question is trivial! Describe alternatives you've considered: No response Additional context: No response
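For reference, a minimal sketch of an MPI-parallel read outside xarray, via netCDF4's parallel mode (my addition; it assumes a parallel-enabled netCDF4/HDF5 build and mpi4py, and "data.nc"/"foo" are placeholder names):

```python
from mpi4py import MPI
import netCDF4

comm = MPI.COMM_WORLD
nc = netCDF4.Dataset("data.nc", "r", parallel=True, comm=comm, info=MPI.Info())
var = nc.variables["foo"]

# Each rank reads a disjoint slab along the first dimension.
n = var.shape[0]
start = comm.rank * n // comm.size
stop = (comm.rank + 1) * n // comm.size
local = var[start:stop]
nc.close()
```
 |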
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6919/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1861335844 | I_kwDOAMm_X85u8bsk | 8096 | Errors when saving PyObject coordinates | krokosik 38408316 | closed | 0 |  | 4 | 2023-08-22T12:14:53Z | 2023-09-06T11:44:41Z | 2023-09-06T11:44:41Z | CONTRIBUTOR | What happened? Hi, I'm trying to create a coordinate holding Python objects (an object-dtype ndarray), but saving the DataArray fails with the error below. What did you expect to happen? I want to be able to save and load such coordinates without errors. Maybe there is a cleaner way to do it than the object dtype ndarray?
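A workaround sketch (my addition; the `Signal` class and file name are hypothetical): store a plain string key as the coordinate and keep the objects in a side lookup.

```python
import numpy as np
import xarray as xr

class Signal:  # hypothetical stand-in for the real PyObject labels
    def __init__(self, name):
        self.name = name

signals = [Signal("a"), Signal("b")]
da = xr.DataArray([1.0, 2.0], dims="s", name="value",
                  coords={"s": np.array(signals, dtype=object)})

# netCDF cannot store arbitrary Python objects, so swap in string keys
# before saving and keep a lookup to recover the objects after loading.
lookup = {sig.name: sig for sig in signals}
da.assign_coords(s=[sig.name for sig in signals]).to_netcdf("signals.nc")

loaded = xr.open_dataset("signals.nc")["value"]
originals = [lookup[name] for name in loaded["s"].values]
```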
MVCE confirmation
Relevant log output```Python File c:\Users\Wiktor\AppData\Local\pypoetry\Cache\virtualenvs\spin1-JGuolXDk-py3.11\Lib\site-packages\xarray\core\dataarray.py:4014, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 4010 else: 4011 # No problems with the name - so we're fine! 4012 dataset = self.to_dataset() -> 4014 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 4015 dataset, 4016 path, 4017 mode=mode, 4018 format=format, 4019 group=group, 4020 engine=engine, 4021 encoding=encoding, 4022 unlimited_dims=unlimited_dims, ... 101 result = np.empty(data.shape, dtype) --> 102 result[...] = data 103 return result ValueError: setting an array element with a sequence. ``` Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Polish_Poland', '1250')
libhdf5: None
libnetcdf: None
xarray: 2023.8.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8096/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1870484988 | I_kwDOAMm_X85vfVX8 | 8120 | `open_mfdataset` exits while sending a "Segmentation fault" error | kasra-keshavarz 50383939 | closed | 0 | 4 | 2023-08-28T20:51:23Z | 2023-09-01T15:43:08Z | 2023-09-01T15:43:08Z | NONE | What is your issue?I try to open about ~10 files, each 5MB as a test case, using ```python $ ipython Python 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import xarray as xr In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}) In [3]: ds Out[3]: <xarray.Dataset> Dimensions: (time: 744, rlat: 140, rlon: 105) Coordinates: * time (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0... lon (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray> lat (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray> * rlon (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4 * rlat (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68 Data variables: rotated_pole (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 RDRS_v2.1_P_UVC_10m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_FI_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_FB_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_A_PR0_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_P0_SFC (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_TT_1.5m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> RDRS_v2.1_P_HU_1.5m (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray> Attributes: CDI: Climate Data Interface version 2.0.4 (https://mpimet.mpg.de... Conventions: CF-1.6 product: RDRS_v2.1 Remarks: Variable names are following the convention <Product>_<Type... License: These data are provided by the Canadian Surface Prediction ... history: Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1... NCO: netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne... CDO: Climate Data Operators version 2.0.4 (https://mpimet.mpg.de... In [4]: type(ds) Out[4]: xarray.core.dataset.Dataset In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True) [gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8) [gra-login3:25527] *** Process received signal *** [gra-login3:25527] Signal: Segmentation fault (11) [gra-login3:25527] Signal code: (128) [gra-login3:25527] Failing at address: (nil) Segmentation fault ``` Here is the version of ```python In [5]: xr.show_versions() /home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/init.py:36: UserWarning: Setuptools is replacing distutils. 
warnings.warn("Setuptools is replacing distutils.") INSTALLED VERSIONScommit: None python: 3.10.2 (main, Feb 4 2022, 19:10:35) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.88.1.el7.x86_64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: ('en_CA', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.9.0 xarray: 2023.7.0 pandas: 1.4.0 numpy: 1.21.2 scipy: 1.8.0 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.8.0 distributed: 2023.8.0 matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: 2023.6.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 60.2.0 pip: 23.2.1 conda: None pytest: 7.4.0 mypy: None IPython: 8.10.0 sphinx: None ``` I'm working on an HPC, so if a list "modules" I have loaded helps, here it is: ```console $ module list Currently Loaded Modules: 1) CCconfig 5) gcccore/.9.3.0 (H) 9) libfabric/1.10.1 13) ipykernel/2023a 17) sqlite/3.38.5 21) postgresql/12.4 (t) 25) gdal/3.5.1 (geo) 29) udunits/2.2.28 (t) 33) cdo/2.2.1 (geo) 2) gentoo/2020 (S) 6) imkl/2020.1.217 (math) 10) openmpi/4.0.3 (m) 14) scipy-stack/2023a (math) 18) jasper/2.0.16 (vis) 22) freexl/1.0.5 (t) 26) geos/3.10.2 (geo) 30) libaec/1.0.6 34) mpi4py/3.1.3 (t) 3) StdEnv/2020 (S) 7) gcc/9.3.0 (t) 11) libffi/3.3 15) hdf5/1.10.6 (io) 19) libgeotiff-proj901/1.7.1 23) librttopo-proj9/1.1.0 27) proj/9.0.1 (geo) 31) eccodes/2.25.0 (geo) 35) netcdf-fortran/4.5.2 (io) 4) mii/1.1.2 8) ucx/1.8.0 12) python/3.10.2 (t) 16) netcdf/4.7.4 (io) 20) cfitsio/4.1.0 (vis) 24) libspatialite-proj901/5.0.1 28) expat/2.4.1 (t) 32) yaxt/0.9.0 (t) 36) libspatialindex/1.8.5 (phys) Where: S: Module is Sticky, requires --force to unload or purge m: MPI implementations / Implémentations MPI math: Mathematical libraries / Bibliothèques mathématiques io: Input/output software / Logiciel d'écriture/lecture t: Tools for development / Outils de développement vis: Visualisation software / Logiciels de visualisation geo: Geography libraries/apps / Logiciels de géographie phys: Physics libraries/apps / Logiciels de physique H: Hidden Module ``` Thanks. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8120/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1611701140 | I_kwDOAMm_X85gEJuU | 7588 | xr.merge with compat="minimal" returns corrupted Dataset and causes __len__ to return wrong and possibly negative values. | Metamess 2466330 | closed | 0 |  | 4 | 2023-03-06T15:47:40Z | 2023-08-30T09:14:19Z | 2023-08-30T07:57:37Z | CONTRIBUTOR | What happened? When merging multiple datasets with the compat="minimal" option, coordinates whose variables are dropped due to incompatibility are still saved in the dataset's `_coord_names`. This is directly related to the bug described in issue 7405. As seen there, one result is that a dropped coordinate still evaluates as being contained in the resulting dataset's `coords`. At least one other (perhaps more severe) result of this bug is connected to the fact that the `__len__` of a Dataset is effectively computed as `len(self._variables) - len(self._coord_names)`. If a coordinate was dropped as a result of the merge, it is no longer part of the dataset's variables, yet it still counts towards `_coord_names`, so `len()` can return wrong and even negative values. One instance where this causes immediate errors is when trying to print the resulting dataset. As part of the repr, `len()` gets called, and a negative value raises a ValueError. While this is undoubtedly only one of many places where the incorrect `_coord_names` causes issues, it shows how corrupted the resulting Dataset is. What did you expect to happen? To get a Dataset with the correct `_coord_names`, and thus a correct `len()`. Minimal Complete Verifiable Example ```Python
import xarray as xr

ds1 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 4})
ds2 = xr.Dataset(coords={"foo": [1, 2, 3], "bar": 5})
res = xr.merge([ds1, ds2], compat="minimal")
# If the result is not captured in res, this will cause a ValueError as the interpreter attempts to print the result

res.coords
# Coordinates:
#   * foo      (foo) int64 1 2 3

res._coord_names
# {'foo', 'bar'}

"bar" in res.coords
# True
# As shown in issue #7405. Note "bar" is not printed in res.coords, revealing an interesting
# disconnect in the behaviors of different functions targeting a dataset's coordinates

res
# ValueError: len() should return >= 0
``` MVCE confirmation
Relevant log output```Python
Anything else we need to know? No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.16.3-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.2.0
pandas: 1.5.1
numpy: 1.24.2
scipy: 1.10.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.6
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.6
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: 1.0.1
IPython: 7.34.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7588/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1858062203 | I_kwDOAMm_X85uv8d7 | 8090 | DataArrayResampleAggregations break with _flox_reduce where source DataArray has a discontinuous time dimension | ollie-bell 56110893 | open | 0 |  | 4 | 2023-08-20T09:48:42Z | 2023-08-24T04:20:32Z | NONE | What happened? When resampling a DataArray with a discontinuity in the time dimension, the resample object contains placeholder groups for the missing times in between the present times. This seems to cause flox reductions to break (see the `ValueError` in the log output below). What did you expect to happen? The result should be computed successfully, in the same way that it is without using flox. Minimal Complete Verifiable Example ```Python
import xarray as xr
import numpy as np

dates = (("1980-12-01", "1990-11-30"), ("2000-12-01", "2010-11-30"))
times = [xr.cftime_range(*d, freq="D", calendar="360_day") for d in dates]

da = xr.concat(
    [xr.DataArray(np.random.rand(len(t)), coords={"time": t}, dims="time") for t in times],
    dim="time"
)
da = da.chunk(time=360)

with xr.set_options(use_flox=True):
    # FAILS - discontinuous time dimension before resample
    (da > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - continuous time dimension before resample
    (da.sel(time=slice(*dates[0])) > 0.5).resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=True):
    # SUCCEEDS - compute chunks before resample
    (da > 0.5).compute().resample(time="AS-DEC").any(dim="time")

with xr.set_options(use_flox=False):
    # SUCCEEDS - don't use flox
    (da > 0.5).resample(time="AS-DEC").any(dim="time")
```
Relevant log output```PythonValueError Traceback (most recent call last) Cell In[60], line 1 ----> 1 (da > 0.5).resample(time="AS-DEC").any(dim="time") File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/_aggregations.py:7029, in DataArrayResampleAggregations.any(self, dim, keep_attrs, kwargs)
6960 """
6961 Reduce this DataArray's data by applying File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/resample.py:57, in Resample._flox_reduce(self, dim, keep_attrs, kwargs) 51 def _flox_reduce( 52 self, 53 dim: Dims, 54 keep_attrs: bool | None = None, 55 kwargs, 56 ) -> T_Xarray: ---> 57 result = super()._flox_reduce(dim=dim, keep_attrs=keep_attrs, **kwargs) 58 result = result.rename({RESAMPLE_DIM: self._group_dim}) 59 return result File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/groupby.py:1018, in GroupBy._flox_reduce(self, dim, keep_attrs, kwargs)
1015 kwargs.setdefault("min_count", 1)
1017 output_index = grouper.full_index
-> 1018 result = xarray_reduce(
1019 obj.drop_vars(non_numeric.keys()),
1020 self._codes,
1021 dim=parsed_dim,
1022 # pass RangeIndex as a hint to flox that File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:408, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, fill_value, dtype, method, engine, keep_attrs, skipna, min_count, reindex, by, finalize_kwargs) 406 output_core_dims = [d for d in input_core_dims[0] if d not in dim_tuple] 407 output_core_dims.extend(group_names) --> 408 actual = xr.apply_ufunc( 409 wrapper, 410 ds_broad.drop_vars(tuple(missing_dim)).transpose(..., grouper_dims), 411 *by_da, 412 input_core_dims=input_core_dims, 413 # for xarray's test_groupby_duplicate_coordinate_labels 414 exclude_dims=set(dim_tuple), 415 output_core_dims=[output_core_dims], 416 dask="allowed", 417 dask_gufunc_kwargs=dict( 418 output_sizes=group_sizes, output_dtypes=[dtype] if dtype is not None else None 419 ), 420 keep_attrs=keep_attrs, 421 kwargs={ 422 "func": func, 423 "axis": axis, 424 "sort": sort, 425 "fill_value": fill_value, 426 "method": method, 427 "min_count": min_count, 428 "skipna": skipna, 429 "engine": engine, 430 "reindex": reindex, 431 "expected_groups": tuple(expected_groups), 432 "isbin": isbins, 433 "finalize_kwargs": finalize_kwargs, 434 "dtype": dtype, 435 "core_dims": input_core_dims, 436 }, 437 ) 439 # restore non-dim coord variables without the core dimension 440 # TODO: shouldn't apply_ufunc handle this? 441 for var in set(ds_broad._coord_names) - set(ds_broad._indexes) - set(ds_broad.dims): File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:1185, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, args) 1183 # feed datasets apply_variable_ufunc through apply_dataset_vfunc 1184 elif any(is_dict_like(a) for a in args): -> 1185 return apply_dataset_vfunc( 1186 variables_vfunc, 1187 args, 1188 signature=signature, 1189 join=join, 1190 exclude_dims=exclude_dims, 1191 dataset_join=dataset_join, 1192 fill_value=dataset_fill_value, 1193 keep_attrs=keep_attrs, 1194 ) 1195 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc 1196 elif any(isinstance(a, DataArray) for a in args): File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:469, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, args) 464 list_of_coords, list_of_indexes = build_output_coords_and_indexes( 465 args, signature, exclude_dims, combine_attrs=keep_attrs 466 ) 467 args = tuple(getattr(arg, "data_vars", arg) for arg in args) --> 469 result_vars = apply_dict_of_variables_vfunc( 470 func, args, signature=signature, join=dataset_join, fill_value=fill_value 471 ) 473 out: Dataset | tuple[Dataset, ...] 
474 if signature.num_outputs > 1: File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:411, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, args) 409 result_vars = {} 410 for name, variable_args in zip(names, grouped_by_name): --> 411 result_vars[name] = func(variable_args) 413 if signature.num_outputs > 1: 414 return _unpack_dict_tuples(result_vars, signature.num_outputs) File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/xarray/core/computation.py:761, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, args) 756 if vectorize: 757 func = _vectorize( 758 func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims 759 ) --> 761 result_data = func(input_data) 763 if signature.num_outputs == 1: 764 result_data = (result_data,) File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/xarray.py:379, in xarray_reduce.<locals>.wrapper(array, func, skipna, core_dims, by, kwargs) 376 offset = min(array) 377 array = datetime_to_numeric(array, offset, datetime_unit="us") --> 379 result, groups = groupby_reduce(array, by, func=func, *kwargs) 381 # Output of count has an int dtype. 382 if requires_numeric and func != "count": File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:2011, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, dtype, min_count, method, engine, reindex, finalize_kwargs, *by) 2005 groups = (groups[0][sorted_idx],) 2007 if factorize_early: 2008 # nan group labels are factorized to -1, and preserved 2009 # now we get rid of them by reindexing 2010 # This also handles bins with no data -> 2011 result = reindex_( 2012 result, from_=groups[0], to=expected_groups, fill_value=fill_value 2013 ).reshape(result.shape[:-1] + grp_shape) 2014 groups = final_groups 2016 if is_bool_array and (_is_minmax_reduction(func) or _is_first_last_reduction(func)): File ~/miniconda3/envs/forge310/lib/python3.10/site-packages/flox/core.py:428, in reindex_(array, from_, to, fill_value, axis, promote) 426 if any(idx == -1): 427 if fill_value is None: --> 428 raise ValueError("Filling is required. fill_value cannot be None.") 429 indexer[axis] = idx == -1 430 # This allows us to match xarray's type promotion rules ValueError: Filling is required. fill_value cannot be None. ``` Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.7.0
pandas: 1.5.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: 3.6.1
bottleneck: 1.3.7
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: 3.7.2
cartopy: 0.22.0
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8090/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1325665237 | I_kwDOAMm_X85PBAvV | 6866 | Confusing terminologies and some errors in the official documentation | v-liuwei 49091585 | closed | 0 | 4 | 2022-08-02T10:48:07Z | 2023-08-23T14:20:23Z | 2023-08-23T14:20:23Z | NONE | What happened?To note, I'm using the stable version(2022.6.0). First, I'm confused that both Second, I found that there are some errors in the documentation:
AssertionError Traceback (most recent call last) <ipython-input-202-f217d18e6979> in <module> ----> 1 assert len(arr.dims) == len(arr.indexes), f"{len(arr.dims)=}, {len(arr.indexes)=}" AssertionError: len(arr.dims)=2, len(arr.indexes)=1
In [3]: arr.indexes
Out[3]:
Indexes:
x: Index(['a', 'b'], dtype='object', name='x')
Have I missed something? Thanks in advance for the reply. What did you expect to happen?No response Minimal Complete Verifiable ExampleNo response MVCE confirmation
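For reference, a minimal reconstruction of the mismatch shown above (a sketch; the reporter's construction of arr was not included in this export):
```python
import numpy as np
import xarray as xr

# a DataArray can have more dims than indexes: only dimensions that
# carry a coordinate get a (pandas) index by default
arr = xr.DataArray(np.zeros((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))
print(len(arr.dims), len(arr.indexes))  # 2 1
```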
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.3.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 45.2.0
pip: 22.2.1
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6866/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 979316661 | MDU6SXNzdWU5NzkzMTY2NjE= | 5738 | Flexible indexes: how to handle possible dimension vs. coordinate name conflicts? | benbovy 4160723 | closed | 0 | 4 | 2021-08-25T15:31:39Z | 2023-08-23T13:28:41Z | 2023-08-23T13:28:40Z | MEMBER | Another thing that I've noticed while working on #5692. Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with I'm wondering how we should handle this in the context of flexible / custom indexes: A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in B. Introduce some tag in C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly? D. Eventually revert #2353 and let users take care of potential conflicts. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5738/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 448082431 | MDU6SXNzdWU0NDgwODI0MzE= | 2986 | How to add a custom indexer. | fbriol 397386 | closed | 0 | 4 | 2019-05-24T09:56:25Z | 2023-08-23T12:24:21Z | 2023-08-23T12:24:20Z | CONTRIBUTOR | Hello, I have written a set of indexers for 1D, 2D and 3D geodetic and Cartesian data (up to 5 dimensions for Cartesian data). I used the Boost/C++ library to write the multidimensional data search algorithm. This tree (R*Tree) is impressive for its performance. It can be built in a few seconds from several million points, and queries over several million points complete in a few seconds.
```python
import numpy as np

# Install it with conda, if you want, only for python3.7:
# conda install pyindex -c fbriol
import pyindex.core as core

lon = np.random.uniform(-180.0, 180.0, 2048*4096)
lat = np.random.uniform(-90.0, 90.0, 2048*4096)
# You can not set an altitude if it is not necessary.
alt = np.random.uniform(-10000, 100000, 2048*4096)

# WGS system used
system = core.geodetic.System()

# R*Tree
tree = core.geodetic.RTree(system)
%timeit tree.packing(np.asarray((lon, lat, alt)).T)
# 3.84 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

coordinates = np.asarray((
    np.random.uniform(-180.0, 180.0, 10000),
    np.random.uniform(-90.0, 90.0, 10000),
    np.random.uniform(-10000, 100000, 10000))).T
%timeit tree.query(coordinates)
# 18 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
I'm trying to use these indexes with xarray, but I don't quite understand how to interface with it. Is there anyone who could explain to me how to write my own indexer so I can test these indexers with xarray? Thank you in advance. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2986/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1603957501 | I_kwDOAMm_X85fmnL9 | 7573 | Add optional min versions to conda-forge recipe (`run_constrained`) | dcherian 2448579 | closed | 0 | 4 | 2023-02-28T23:12:15Z | 2023-08-21T16:12:34Z | 2023-08-21T16:12:21Z | MEMBER | Is your feature request related to a problem?I opened this PR to add minimum versions for our optional dependencies: https://github.com/conda-forge/xarray-feedstock/pull/84/files to prevent issues like #7467 I think we'd need a policy to choose which ones to list. Here's the current list:
Some examples to think about:
1. Describe the solution you'd likeNo response Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7573/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1845132891 | I_kwDOAMm_X85t-n5b | 8062 | Dataset.chunk() does not overwrite encoding["chunks"] | Metamess 2466330 | open | 0 | 4 | 2023-08-10T12:54:12Z | 2023-08-14T18:23:36Z | CONTRIBUTOR | What happened?When using the Looking at the implementation of I do not know why this default value was chosen as False, or what could break if it was changed to True, but looking at the documentation, it seems the opposite of the intended effect. From the documentation of
Which is exactly what it doesn't. What did you expect to happen?I would expect the "chunks" entry of the Minimal Complete Verifiable Example```Python import xarray as xr import numpy as np Create a test Dataset with dimension x and y, each of size 100, and a chunksize of 50ds_original = xr.Dataset({"my_var": (["x", "y"], np.random.randn(100, 100))}) Since 'chunk' does not work, manually set encodingds_original .my_var.encoding["chunks"] = (50, 50) To best showcase the real-life example, write it to file and read it back again.The same could be achieved by just calling .chunk() with chunksizes of 25, but this feels more 'complete'filepath = "~/chunk_test.zarr" ds_original.to_zarr(filepath) ds = xr.open_zarr(filepath) Check the chunksizes and "chunks" encodingprint(ds.my_var.chunks) >>> ((50, 50), (50, 50))print(ds.my_var.encoding["chunks"]) >>> (50, 50)Rechunk the Datasetds = ds.chunk({"x": 25, "y": 25}) The chunksizes have changedprint(ds.my_var.chunks) >>> ((25, 25, 25, 25), (25, 25, 25, 25))But the encoding value remains the sameprint(ds.my_var.encoding["chunks"]) >>> (50, 50)Attempting to write this back to zarr raises an errords.to_zarr("~/chunk_test_rechunked.zarr") NotImplementedError: Specified zarr chunks encoding['chunks']=(50, 50) for variable named 'my_var' would overlap multiple dask chunks ((25, 25, 25, 25), (25, 25, 25, 25)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using
|
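Continuing the reporter's example above, a common interim workaround (a sketch, not part of the report) is to clear the stale encoding before writing:
```python
# sketch: drop the stale on-disk chunk encoding so to_zarr derives
# chunking from the current dask chunks instead of the old value
for var in ds.variables.values():
    var.encoding.pop("chunks", None)

ds.to_zarr("~/chunk_test_rechunked.zarr")
```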
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8062/reactions",
"total_count": 2,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 1
} |
xarray 13221727 | issue | ||||||||
| 1845508562 | I_kwDOAMm_X85uADnS | 8065 | .mfdataset fail to open a kerchunked zarr file from an object-store bucket | pl-marasco 22492773 | closed | 0 | 4 | 2023-08-10T16:22:05Z | 2023-08-14T14:18:17Z | 2023-08-14T14:13:58Z | NONE | What happened?Trying to open a kerchunk .json through open_mfdataset, a ValueError is raised. What did you expect to happen?It should open a Dataset as shown below:
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?This seems to be related to the zarr version: with zarr <= 2.12 it works, but with later versions (> 2.12) it doesn't. Environment
xarray version 2023.7.0
zarr >2.12
|
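For reference, the usual pattern for opening a single kerchunk reference file goes through the zarr engine (a sketch; the reference path and storage options below are placeholders, not taken from the report):
```python
import xarray as xr

# sketch: open a kerchunk reference JSON via fsspec's reference filesystem;
# "combined.json" and the s3 options are placeholder values
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": "combined.json",
            "remote_protocol": "s3",
            "remote_options": {"anon": True},
        },
    },
)
```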
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8065/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1817880272 | I_kwDOAMm_X85sWqbQ | 8013 | np.cumproduct deprecated | quantsnus 25102059 | closed | 0 | 4 | 2023-07-24T08:11:01Z | 2023-07-31T16:46:00Z | 2023-07-31T16:46:00Z | CONTRIBUTOR | What is your issue?Since numpy version 1.25.0, np.cumproduct is deprecated. The coordinates to_index() method still uses it https://github.com/pydata/xarray/blob/971be103d6376d6572d1f12d32526f12f07ae2c7/xarray/core/coordinates.py#L144 which results in an unnecessary DeprecationWarning. |
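For illustration, the one-line rename the deprecation asks for (a sketch):
```python
import numpy as np

# np.cumprod is the long-standing canonical spelling; np.cumproduct
# was an alias and is deprecated since numpy 1.25
print(np.cumprod([2, 3, 4]))  # [ 2  6 24]
```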
{
"url": "https://api.github.com/repos/pydata/xarray/issues/8013/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1789989152 | I_kwDOAMm_X85qsREg | 7962 | Better chunk manager error | dcherian 2448579 | closed | 0 | 4 | 2023-07-05T17:27:25Z | 2023-07-24T22:26:14Z | 2023-07-24T22:26:13Z | MEMBER | What happened?I just ran into this error in an environment without dask.
I think we could easily recommend that the user install a package that provides |
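A sketch of what such a hint could look like (illustrative only; neither the function name nor the wording is xarray's actual code):
```python
# hypothetical helper: fail with an actionable hint instead of a bare lookup error
def raise_chunkmanager_hint(data):
    raise TypeError(
        f"Could not find a Chunk Manager which recognises type {type(data)}. "
        "If you expected a dask-backed array, install dask first, "
        "e.g. `python -m pip install dask`."
    )
```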
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7962/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1752520008 | I_kwDOAMm_X85odVVI | 7907 | `plot.scatter(hue_style="discrete")` does nothing | mgunyho 20118130 | closed | 0 | 4 | 2023-06-12T11:21:33Z | 2023-07-13T23:17:49Z | 2023-07-13T23:17:49Z | CONTRIBUTOR | What happened?I was trying to do a scatterplot of my data with one dimension determining the color. The dimension has only a few values so I used What did you expect to happen?The colorbar should have discrete colors. I was also expecting the colors to be from the default matplotlib color palette, C0, C1, etc, when there's less than 10 items, like this: Although the examples in the documentation show the discrete case also using viridis. What I was really expecting is a plot like one would get by passing But that may be a bit too automagical. Minimal Complete Verifiable Example```Python import matplotlib.pyplot as plt import numpy as np import xarray as xr x = xr.DataArray( np.random.default_rng().random((10, 3)), coords=[ ("idx", np.linspace(0, 1, 10)), ("color", [1, 2, 3]), ] ) y = x + np.random.default_rng().random(x.shape) ds = xr.Dataset({ "x": x, "y": y, }) the output is the same regardless of hue_style="discrete" or "continuous" or just leaving it outds.plot.scatter(x="x", y="y", hue="color", hue_style="discrete", ax=plt.figure().gca()) ``` MVCE confirmation
Relevant log outputNo response Anything else we need to know?This is the code for the "expected" plot: ```python from matplotlib.colors import ListedColormap ds.plot.scatter( x="x", y="y", hue="color", hue_style="discrete", ax=plt.figure().gca(),
) ``` Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.0-1059-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.1.0
pandas: 1.4.3
numpy: 1.23.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 44.0.0
pip: 20.0.2
conda: None
pytest: None
mypy: None
IPython: 8.12.2
sphinx: None
I also tried this on main at 3459e6fa, the behavior is the same. |
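For comparison, one way to get discrete default-cycle colors today is to build the colormap by hand (a sketch; this is not the reporter's elided snippet):
```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from matplotlib.colors import ListedColormap

# rebuild the toy dataset from the MVCE above
x = xr.DataArray(
    np.random.default_rng().random((10, 3)),
    coords=[("idx", np.linspace(0, 1, 10)), ("color", [1, 2, 3])],
)
ds = xr.Dataset({"x": x, "y": x + np.random.default_rng().random(x.shape)})

# force discrete colors from matplotlib's default cycle (C0, C1, C2)
ds.plot.scatter(
    x="x", y="y", hue="color",
    cmap=ListedColormap([f"C{i}" for i in range(3)]),
    ax=plt.figure().gca(),
)
plt.show()
```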
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7907/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1775657305 | I_kwDOAMm_X85p1mFZ | 7945 | engine='cfgrib' no longer an option in xr.open_dataset() but works anyway | parsellsx 74011857 | closed | 0 | 4 | 2023-06-26T21:32:01Z | 2023-06-27T00:06:27Z | 2023-06-26T21:37:05Z | NONE | What is your issue?Looking at the documentation for xr.open_dataset(), the "engine" argument to that function is listed as accepting one of 7 different engines (or None), but the "cfgrib" engine is not among them. Looking at older versions of the documentation, I see that "cfgrib" was delisted starting with v2023.04.0 (it's still present in v2023.03.0). In what I think is a related issue, this tutorial on reading in ERA5 GRIB files with the "engine='cfgrib'" option on xr.load_dataset() gives a ValueError in documentation versions starting with v2023.04.0 and going through v2023.05.0 and 'stable' due to the unrecognized engine 'cfgrib', although it seems to have been fixed for v2023.06.0 and 'latest'. Given both of the above, I was surprised to find that using xr.open_dataset() on a GRIB file with engine='cfgrib' does work for me using xarray v2023.05.0. To me it seems that the documentation for xr.open_dataset() should be edited to include the 'cfgrib' option again, but I'd like to get an opinion from someone more familiar with xarray. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7945/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1718143526 | I_kwDOAMm_X85maMom | 7854 | Freezing Issue When Accessing Precipitation Values with xarray | yanivgolds 118670091 | closed | 0 | 4 | 2023-05-20T11:30:54Z | 2023-06-26T15:33:19Z | 2023-06-26T15:33:19Z | NONE | What is your issue?I am encountering a freezing issue in my project that utilizes xarray when trying to access precipitation values for a specific longitude-latitude position over a time period. This issue occurs on the slurm system but is not reproduced on my Jupyter Notebook setup. As a result, whenever I attempt to run the project, the job freezes. I would greatly appreciate your assistance in determining the cause of this problem. Below is a figure showing the result from Jupyer Notebook (this works): |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7854/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1691902604 | I_kwDOAMm_X85k2GKM | 7805 | [FR] add support for rss and rss button to xarray blog | danieltomasz 7980381 | closed | 0 | 4 | 2023-05-02T07:15:12Z | 2023-06-21T21:10:32Z | 2023-06-21T21:10:32Z | NONE | Is your feature request related to a problem?An easy way to subscribe to news from the xarray blog. Describe the solution you'd likeSupport for publishing news and a button for subscribing to the blog's RSS feed (alongside the Twitter icon etc.). Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7805/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1760733017 | I_kwDOAMm_X85o8qdZ | 7924 | Migrate from nbsphinx to myst, myst-nb | dcherian 2448579 | open | 0 | 4 | 2023-06-16T14:17:41Z | 2023-06-20T22:07:42Z | MEMBER | Is your feature request related to a problem?I think we should switch to MyST markdown for our docs. I've been using MyST markdown and MyST-NB in docs in other projects and it works quite well. Advantages: 1. We get HTML reprs in the docs (example) which is a big improvement. (#6620) 2. I think many find markdown a lot easier to write than RST There's a tool to migrate RST to MyST (RTD's migration guide). Describe the solution you'd likeNo response Describe alternatives you've consideredNo response Additional contextNo response |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7924/reactions",
"total_count": 5,
"+1": 4,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 1,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1722614979 | I_kwDOAMm_X85mrQTD | 7870 | Name collision with Pulsar Timing package 'PINT' | vhaasteren 3092444 | closed | 0 | 4 | 2023-05-23T18:54:18Z | 2023-05-26T16:19:37Z | 2023-05-26T16:19:37Z | CONTRIBUTOR | What is your issue?In the astrophysics community of pulsar timers, there is an analysis package called PINT. However, Bayesian modeling through PyMC is becoming more and more popular, meaning that arviz and xarray are now getting installed alongside pint-pulsar, giving obvious issues. A very simple workaround would be to change line 37 in https://github.com/pydata/xarray/blob/main/xarray/core/pycompat.py to something like:
This means that EDIT: fixed typo |
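The gist of the suggestion, as a sketch (the real pycompat code is structured differently; this only illustrates the duck check):
```python
# sketch: accept "pint" only if it actually exposes Quantity, so the
# pulsar-timing package of the same name is silently ignored
import importlib

def import_units_pint():
    try:
        pint = importlib.import_module("pint")
        pint.Quantity  # raises AttributeError for the pulsar-timing PINT
        return pint
    except (ImportError, AttributeError):
        return None
```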
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7870/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1160309381 | I_kwDOAMm_X85FKOqF | 6335 | ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. | morestart 35556811 | closed | 0 | 4 | 2022-03-05T10:26:49Z | 2023-05-12T14:09:52Z | 2022-03-05T10:28:29Z | NONE | What is your issue?ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. Consider explicitly selecting one of the installed engines via the but I installed netCDF4 with `pip install netCDF4` |
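When debugging this class of error it helps to list the backends xarray can actually see (recent xarray versions expose this helper):
```python
import xarray as xr

# shows the IO engines visible to this interpreter; if "netcdf4" is
# missing here, netCDF4 was likely installed into a different environment
print(xr.backends.list_engines())
```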
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6335/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1517575123 | I_kwDOAMm_X85adFvT | 7409 | Implement `DataArray.to_dask_dataframe()` | gcaria 44147817 | closed | 0 | 4 | 2023-01-03T15:44:11Z | 2023-04-28T15:09:31Z | 2023-04-28T15:09:31Z | CONTRIBUTOR | Is your feature request related to a problem?It'd be nice to go from a chunked DataArray to a dask object directly. Describe the solution you'd likeI think something along these lines should work (although a less convoluted way might exist):
```python
from typing import Union

import dask.dataframe as dkd
import xarray as xr

def to_dask(da: xr.DataArray) -> Union[dkd.Series, dkd.DataFrame]:
``` |
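One uncomplicated route (a sketch, independent of the elided snippet above) is to wrap the array in a one-variable Dataset and reuse Dataset.to_dask_dataframe, which xarray already provides:
```python
import numpy as np
import xarray as xr

def to_dask_dataframe(da: xr.DataArray):
    # a one-variable Dataset already knows how to export itself to dask
    name = da.name if da.name is not None else "data"
    return da.to_dataset(name=name).to_dask_dataframe()

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y")).chunk(1)
print(to_dask_dataframe(da).head())
```
The closed status of this issue suggests an equivalent method later landed upstream.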
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7409/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1652227927 | I_kwDOAMm_X85iev9X | 7713 | `Variable/IndexVariable` do not accept a tuple for data. | zoj613 44142765 | closed | 0 | 4 | 2023-04-03T14:50:58Z | 2023-04-28T14:26:37Z | 2023-04-28T14:26:37Z | NONE | What happened?It appears that What did you expect to happen?Successful instantiation of a Minimal Complete Verifiable Example```Python import xarray as xr xr.Variable(data=(2, 3, 45), dims="day") ``` MVCE confirmation
Relevant log output
Anything else we need to know?This error seems to be triggered by the Environment
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55)
[GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.21-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.2
distributed: 2023.3.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: 1.1.1
IPython: 8.12.0
sphinx: None
```
|
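Until the constructor handles bare tuples, converting the data first sidesteps the error (a sketch):
```python
import numpy as np
import xarray as xr

# a list or ndarray is accepted where the bare tuple in the MVCE is not
xr.Variable(data=np.asarray((2, 3, 45)), dims="day")
xr.Variable(data=[2, 3, 45], dims="day")
```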
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7713/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 575939446 | MDU6SXNzdWU1NzU5Mzk0NDY= | 3830 | Documentation request: add examples for carrying out "ncecat" in xarray | lukelbd 19657652 | open | 0 | 4 | 2020-03-05T01:58:17Z | 2023-04-13T20:06:20Z | NONE | In climate science, a very common task involves concatenating NetCDF files with identical variables, dimensions, and coordinates along a brand new "ensemble member" or "record" dimension. With the NetCDF Operators, this is accomplished using MCVE Code SampleCurrently, it seems the correct way to do this in xarray is with
Problem DescriptionWhile this works, there does not seem to be any mention of this use case in the It would be nice to have examples in
Output of
|
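For reference, one way to express the ncecat pattern with current xarray (a sketch; the file names are placeholders):
```python
import xarray as xr

# concatenate files with identical variables along a brand-new
# "ensemble" dimension -- the xarray analogue of `ncecat`
ds = xr.open_mfdataset(
    ["member1.nc", "member2.nc", "member3.nc"],
    combine="nested",
    concat_dim="ensemble",
)
```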
{
"url": "https://api.github.com/repos/pydata/xarray/issues/3830/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1659786592 | I_kwDOAMm_X85i7lVg | 7742 | About save char into netcdf | ChristmasZCY 61818189 | closed | 0 | 4 | 2023-04-09T07:49:50Z | 2023-04-11T06:36:27Z | 2023-04-11T06:36:27Z | NONE | What is your issue?When I save a char variable into netCDF, it produces a new dimension. However, when I read this netCDF file with xarray, it can't find anything with this dimension.
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7742/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1419825696 | I_kwDOAMm_X85UoNIg | 7199 | Deprecate cfgrib backend | headtr1ck 43316012 | closed | 0 | 4 | 2022-10-23T15:09:14Z | 2023-03-29T15:19:53Z | 2023-03-29T15:19:53Z | COLLABORATOR | What is your issue?Since cfgrib 0.9.9 (04/2021) it comes with its own xarray backend plugin (which looks mainly like a copy of our internal version). We should deprecate our internal plugin. The deprecation is complicated since we usually bind the minimum version to a minor step, but cfgrib seems to have been on 0.9 for 4 years already. Maybe an exception like for netCDF4? Anyway, if we decide to leave it as it is for now, this ticket is just a reminder to remove it someday :) |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7199/reactions",
"total_count": 4,
"+1": 4,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1620573171 | I_kwDOAMm_X85gl_vz | 7617 | The documentation contains some non-descriptive link texts. | remigathoni 51911758 | closed | 0 | 4 | 2023-03-13T00:34:09Z | 2023-03-27T21:37:21Z | 2023-03-27T21:37:20Z | CONTRIBUTOR | What is your issue?I've been going through the docs and noticed some links could be more descriptive. Here are a few examples with options on how we could rewrite them: - See the user guide for more. -> Check out the indexing section in the user guide for a detailed explanation. - For more, see the Xarray documentation. -> See the documentation on automatic alignment to learn more. - This tutorial notebook also covers alignment and broadcasting (highly recommended)-> You can also check out this tutorial notebook on alignment and broadcasting (highly recommended). - For more see the user guide, the gallery, and the tutorial material. -> For more information, check out the following resources: * The plotting documentation in the user guide. * The visualization gallery. * The plotting and visualization tutorial materials. With more specific link texts, you get a clearer idea of what to expect when you click on the link which improves the reading experience. It also makes the links more accessible. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7617/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 928381010 | MDU6SXNzdWU5MjgzODEwMTA= | 5515 | NetCDF: Attempting netcdf-4 operation on netcdf-3 file | mickaellalande 20254164 | open | 0 | 4 | 2021-06-23T15:23:55Z | 2023-03-27T21:07:32Z | CONTRIBUTOR | I'm trying to open MODIS .hdf files, but I get the error : ```python import xarray as xr xr.open_dataset('MOD10C1.A2000055.061.2020037182124.hdf') RuntimeError: NetCDF: Attempting netcdf-4 operation on netcdf-3 file ``` I already opened hdf files from another product without any issue... (https://nsidc.org/data/MOD10CM) Here are two examples, with one that works and the other one that causes the issue: MODIS.zip Thanks in advance for your help! Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:25:15) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: fr_FR.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.0 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.4.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.5 cfgrib: 0.9.8.5 iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.2.0 cartopy: 0.17.0 seaborn: None numbagg: None pint: None setuptools: 49.2.0.post20200712 pip: 20.2 conda: None pytest: 6.0.0 IPython: 7.16.1 sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/5515/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1338173609 | I_kwDOAMm_X85Pwuip | 6914 | plt.imshow() vs xarray_dataset.plot.imshow() not rendering correctly | Potential Bug | melioristic 32569566 | closed | 0 | 4 | 2022-08-14T08:40:56Z | 2023-03-22T20:46:23Z | 2023-03-22T20:46:23Z | NONE | What is your issue?I have 2d data which I want to visualise. The visuals look completely different if I use plt.imshow() vs xarray_dataset.plot.imshow() There are mainly two issues - First, the array is flipped. (I think this is manageable but inconsistent) - Secondly, the plots don't look correct. This can be best illustrated by the figures themselves. For example this is the xarray code I am using.
And this is the image that I get.
Secondly, when I use matplotlib to plot the values:
Since this is discharge data, I would expect to see the second plot. Can someone tell me what the issue is here? P.S. This is what day_data looks like.
|
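Part of the flip is expected: plt.imshow draws row 0 at the top by default, while xarray plots against the coordinate values. A sketch of making matplotlib match (the array below is a stand-in for day_data):
```python
import matplotlib.pyplot as plt
import numpy as np

day_data = np.random.rand(50, 100)  # placeholder for the discharge field
plt.imshow(day_data, origin="lower")  # put row 0 at the bottom, like xarray
plt.colorbar()
plt.show()
```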
{
"url": "https://api.github.com/repos/pydata/xarray/issues/6914/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1499473190 | I_kwDOAMm_X85ZYCUm | 7385 | Unexpected NaNs in broadcast | dopplershift 221526 | open | 0 | 4 | 2022-12-16T02:42:44Z | 2023-03-14T20:43:00Z | CONTRIBUTOR | What happened?When running the What did you expect to happen?No response Minimal Complete Verifiable Example```Python levs = np.array([100000, 85000]) a = xr.Dataset({'a': (('lev',), [1, 2])}, coords={'lev': levs}).to_array() b = xr.Dataset({'b': (('lev',), [3, 4])}, coords={'lev': levs}).to_array() broad_a, broad_b = xr.broadcast(a, b) print(broad_a) ``` MVCE confirmation
Relevant log output
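The NaNs come from to_array() attaching different labels ('a' vs 'b') to the new "variable" dimension, so broadcast takes the union of the two indexes and fills the holes. A sketch of one way to avoid it:
```python
import numpy as np
import xarray as xr

levs = np.array([100000, 85000])
a = xr.Dataset({"a": (("lev",), [1, 2])}, coords={"lev": levs}).to_array()
b = xr.Dataset({"b": (("lev",), [3, 4])}, coords={"lev": levs}).to_array()

# dropping the conflicting "variable" coordinate avoids the union-fill
broad_a, broad_b = xr.broadcast(
    a.squeeze("variable", drop=True),
    b.squeeze("variable", drop=True),
)
print(broad_a.values)  # [1 2] -- no NaNs
```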
Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.12.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.5
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
mypy: 0.991
IPython: 8.7.0
sphinx: 5.3.0
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7385/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 706507153 | MDU6SXNzdWU3MDY1MDcxNTM= | 4449 | Did copy(deep=True) break with 0.16.1? | blaylockbk 6249613 | closed | 0 | 4 | 2020-09-22T15:59:41Z | 2023-03-12T21:08:42Z | 2023-03-12T21:08:42Z | NONE | What happened: I have a script that downloads a file, reads and copies it to memory with What you expected to happen: In 0.16.0 and earlier, the variable data is available ( Minimal Complete Verifiable Example: ```python import xarray as xr import os import urllib.request Get sample NetCDF fileurl = 'https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc' FILE = 'tos_O1_2001-2002.nc' urllib.request.urlretrieve(url, FILE) Open the NetCDF fileds1 = xr.open_dataset(FILE) Make a copy of the Datasetds2 = ds1.copy(deep=True) and close the originalds1.close() remove the NetCDF fileos.remove(FILE) Read the copied datasetds2 ``` Anything else we need to know?:
Output for xarray v0.16.0
Output for xarray v0.16.1
Environment: Output of <tt>xr.show_versions()</tt> for xarray 0.16.0INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: English_United States.1252 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.16 setuptools: 49.6.0.post20200917 pip: 20.2.3 conda: None pytest: None IPython: 7.18.1 sphinx: NoneOutput of <tt>xr.show_versions()</tt> for xarray 0.16.1INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: English_United States.1252 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.16 setuptools: 49.6.0.post20200917 pip: 20.2.3 conda: None pytest: None IPython: 7.18.1 sphinx: Nonexarray: 0.16.0 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.0 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.4 iris: None bottleneck: None dask: None distributed: None matplotlib: 3.3.2 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.16 setuptools: 49.6.0.post20200917 pip: 20.2.3 conda: None pytest: None IPython: 7.18.1 sphinx: None |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/4449/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1598266728 | I_kwDOAMm_X85fQ51o | 7556 | broken documentation link | arfriedman 76110149 | closed | 0 | 4 | 2023-02-24T09:37:57Z | 2023-03-12T18:02:59Z | 2023-03-12T18:02:59Z | CONTRIBUTOR | What is your issue?Hi, I found this broken link at the bottom of the Datetime Indexing subsection in the User Guide. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7556/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1468838643 | I_kwDOAMm_X85XjLLz | 7336 | Instability when calculating standard deviation | ShihengDuan 26401994 | closed | 0 | 4 | 2022-11-29T23:33:55Z | 2023-03-10T20:32:51Z | 2023-03-10T20:32:50Z | NONE | What happened?I noticed that for some large values (not really that large) and lots of samples, the So I guess this is related to the magnitude, but not sure. Has anyone had a similar issue? What did you expect to happen?Adding or subtracting a constant should not change the standard deviation.
See screenshot here about what the data look like:
Minimal Complete Verifiable ExampleNo response MVCE confirmation
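Since the report has no MVCE, here is a hedged sketch of how this symptom is usually reproduced (illustrative values; whether it appears depends on the dtype and on whether an accelerated backend such as bottleneck handles the reduction):
```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
da = xr.DataArray((rng.random(1_000_000) + 60_000).astype("float32"))

# single-precision reductions can lose almost all significant digits
# when the mean is large relative to the spread
print(da.std().item())
print((da - 60_000).std().item())          # shifting may change the answer
print(da.astype("float64").std().item())   # double precision is stable
```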
Relevant log outputNo response Anything else we need to know?No response Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.71.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.22.3
scipy: 1.8.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.0
distributed: 2022.9.0
matplotlib: 3.5.2
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.2.2
conda: None
pytest: None
IPython: 8.6.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7336/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 1588461863 | I_kwDOAMm_X85ergEn | 7539 | Concat doesn't concatenate dimension coordinates along new dims | TomNicholas 35968931 | open | 0 | 4 | 2023-02-16T22:32:33Z | 2023-02-21T19:07:48Z | MEMBER | What is your issue?
Take this example (motivated by https://github.com/pydata/xarray/discussions/7532#discussioncomment-4988792)
Coordinates: * time (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432 * cols (cols) <U4 'col1' 'col2' Dimensions without coordinates: new ``` I would have expected to get a result of size Instead what happened is that This is kind of briefly mentioned in the concat docstring under I don't really know what I would prefer to happen with the coordinates. I guess to have created a At the very least we should make this a lot clearer in the docs. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7539/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 1470583016 | I_kwDOAMm_X85Xp1Do | 7340 | xr.corr produces incorrect output for complex arrays | mattragoza 7647340 | closed | 0 | 4 | 2022-12-01T03:00:09Z | 2023-02-14T16:38:29Z | 2023-02-14T16:38:29Z | NONE | What happened?I create a DataArray full of complex numbers, and I compute the correlation of the DataArray with itself. What did you expect to happen?The absolute value of the correlation coefficient should be equal to 1, up to numerical precision. However, this is not the case. The returned correlation coefficient is around 0.26 and change depending on the number of values in the array. Minimal Complete Verifiable Example```Python import xarray as xr array = xr.DataArray([ -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j, -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j, 0.00000000e+00+0.00000000e+00j, -2.42585590e-02+1.42052459e-02j, -5.53404148e-03+4.60188062e-03j, -4.68829482e-03+4.90179019e-03j, -7.02331258e-03+8.75908673e-03j, -1.31233383e-01+1.86572484e-01j, -4.05137401e-03+6.59972035e-03j, -4.20701822e-03+7.29813816e-03j, -3.56487231e-03+6.51759430e-03j, -3.68077200e-03+7.04388575e-03j, -8.16459981e-02+1.70084145e-01j, -5.11737898e-03+1.98164995e-02j, 6.72772914e-04-7.28110367e-05j, 2.13957504e-03-1.82525995e-03j, 1.60369835e-03-1.54029189e-03j, 8.77788719e-02-8.45568854e-02j, 1.04277417e-01-9.38854749e-02j, 7.58465696e-03-6.07906563e-03j, 8.00776452e-03-5.70470615e-03j, 8.36166252e-03-5.14978313e-03j, 0.00000000e+00+0.00000000e+00j, 0.00000000e+00+0.00000000e+00j, 0.00000000e+00+0.00000000e+00j, 7.26422461e-03+4.40382166e-04j, 4.01364547e-03+1.09269127e-03j, -1.99069471e-01-1.20355081e-01j, 1.56511579e-01+2.59839758e-01j, 9.14046953e-04+5.42262898e-03j, -8.37800782e-04+5.67555708e-03j, -3.36561822e-03+7.50108018e-03j, -4.22682090e-03+5.36279242e-03j, 5.95438564e-02-3.48209841e-02j, -6.77184281e-03+2.10711488e-03j, -4.84293269e-03+3.78698499e-04j, -5.13547723e-03-6.86765713e-04j, 4.48392070e-01+1.54568226e-01j, -3.17412047e-01-2.35431216e-01j, -2.95731737e-03-3.39078899e-03j, -1.95111443e-03-3.77545168e-03j, -2.82719903e-04-1.61393513e-03j, 7.20241467e-04-1.73515565e-03j, -1.96675563e-01-4.42259734e-02j, 0.00000000e+00+0.00000000e+00j, 4.84813452e-03+7.60742077e-03j, 6.31707602e-03+1.51808252e-02j, 2.99277774e-03+1.18667410e-02j, 5.64640060e-04+1.58372118e-02j, -1.74137347e-03+1.70383706e-02j, -5.91398408e-03+2.30008930e-02j, -7.12027831e-03+1.87732435e-02j, 9.30919156e-02-1.65255887e-01j, -2.09716130e-01+2.30490479e-01j, -1.80115101e-02+1.37248240e-02j, -1.85851718e-02+9.23420957e-03j, -1.88459965e-02+5.12854226e-03j, 1.09175874e+00-9.17875627e-02j, -1.63766142e-02-5.32431671e-03j, -1.24749963e-02-9.63714407e-03j, -7.58657222e-03-1.27728267e-02j, -1.99052439e-03-1.35879033e-02j, -5.70595470e-01+2.27742231e+00j, 1.24516564e-02-1.21867738e-02j, 1.82174257e-02-8.67884733e-03j, 2.27204879e-02-3.77097224e-03j, 2.66143091e-02+2.68683768e-03j, 1.06983372e+00+3.19301893e-01j, -6.86033738e-01-4.72910865e-01j, 3.00291320e-02+3.10297521e-02j, 2.22880055e-02+3.45332319e-02j, 1.61724440e-02+4.04122368e-02j, 9.78881043e-03+4.96053678e-02j, -6.51085120e-03+5.27227722e-02j, -1.76752380e-02+5.26095806e-02j, -3.81856382e-02+6.41735764e-02j, 0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j ]) r = np.abs(xr.corr(array, array).item()) assert np.isclose(r, 1.0), r ``` MVCE confirmation
Relevant log output```Python The exact output I get for the self-contained example below is: AssertionError Traceback (most recent call last) Cell In [44], line 46 3 array = xr.DataArray([ 4 -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j, 5 -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j, (...) 43 0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j 44 ]) 45 r = np.abs(xr.corr(array, array).item()) ---> 46 assert np.isclose(r, 1.0), r AssertionError: 0.2664911388214005
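Pearson correlation of complex data needs a conjugate in the covariance, and with it |corr(x, x)| is exactly 1. A sketch of a conjugate-aware helper (user-side code, not xarray API):
```python
import numpy as np
import xarray as xr

def complex_corr(a: xr.DataArray, b: xr.DataArray) -> complex:
    # conjugate-aware Pearson correlation over all dimensions
    a_c = a - a.mean()
    b_c = b - b.mean()
    cov = (a_c * np.conj(b_c)).mean()
    return (cov / (a_c.std() * b_c.std())).item()

z = xr.DataArray(np.exp(1j * np.linspace(0, 3, 50)))
print(abs(complex_corr(z, z)))  # 1.0 up to rounding
```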
Anything else we need to know?Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] Xarray version is '2022.9.0' Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-193.28.1.el8_2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.11.0
distributed: None
matplotlib: 3.6.2
cartopy: None
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: None
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/7340/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue |
CREATE TABLE [issues] (
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[number] INTEGER,
[title] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[state] TEXT,
[locked] INTEGER,
[assignee] INTEGER REFERENCES [users]([id]),
[milestone] INTEGER REFERENCES [milestones]([id]),
[comments] INTEGER,
[created_at] TEXT,
[updated_at] TEXT,
[closed_at] TEXT,
[author_association] TEXT,
[active_lock_reason] TEXT,
[draft] INTEGER,
[pull_request] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[state_reason] TEXT,
[repo] INTEGER REFERENCES [repos]([id]),
[type] TEXT
);
CREATE INDEX [idx_issues_repo]
ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
ON [issues] ([user]);