
issues


11 rows where user = 8382834 sorted by updated_at descending


type (2 values)

  • issue 10
  • pull 1

state (2 values)

  • closed 7
  • open 4

repo (1 value)

  • xarray 11
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1951543761 I_kwDOAMm_X850UjHR 8335 ```DataArray.sel``` can silently pick up the nearest point, even if it is far away and the query is out of bounds jerabaul29 8382834 open 0     13 2023-10-19T08:02:44Z 2024-04-29T23:02:31Z   CONTRIBUTOR      

What is your issue?

Credit to @paulina-t, who found a bug caused by the behavior reported here in one of our codebases, where it was badly messing things up.

See the example notebook at https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_10_18/interp.ipynb .


Problem

It is always a bit risky to interpolate or look up the nearest neighbor for a query point, as bad things can happen when the point lies outside the area the data covers. Fortunately, xarray returns NaN when performing interp outside of the bounds of a dataset:

```python
import xarray as xr
import numpy as np

xr.__version__
# '2023.9.0'

data = np.array([[1, 2, 3], [4, 5, 6]])
lat = [10, 20]
lon = [120, 130, 140]

data_xr = xr.DataArray(data, coords={'lat': lat, 'lon': lon}, dims=['lat', 'lon'])

data_xr
# <xarray.DataArray (lat: 2, lon: 3)>
# array([[1, 2, 3],
#        [4, 5, 6]])
# Coordinates:
#   * lat      (lat) int64 10 20
#   * lon      (lon) int64 120 130 140

# interp is civilized: rather than wildly extrapolating, it returns NaN
data_xr.interp(lat=15, lon=125)
# <xarray.DataArray ()>
# array(3.)
# Coordinates:
#     lat      int64 15
#     lon      int64 125

data_xr.interp(lat=5, lon=125)
# <xarray.DataArray ()>
# array(nan)
# Coordinates:
#     lat      int64 5
#     lon      int64 125
```

Unfortunately, .sel will happily find the nearest neighbor of a point, even if the input point is outside of the dataset range:

```python
# sel is not as civilized: it happily finds the nearest neighbor,
# even if it is "on the one side" of the example data
data_xr.sel(lat=5, lon=125, method='nearest')
# <xarray.DataArray ()>
# array(2)
# Coordinates:
#     lat      int64 10
#     lon      int64 130
```

This can easily cause tricky bugs.


Discussion

Would it be possible for .sel to have a behavior that makes the user aware of such issues? I.e. either:

  • print a warning on stderr
  • return NaN
  • raise an exception

when a .sel query falls outside of the dataset range, i.e. is not between two dataset points?

I understand that finding the nearest neighbor may still be useful / wanted in some cases, even outside of the bounds of the dataset, but the fact that this happens silently by default has been causing bugs for us. Could this default behavior be changed, or at least be controlled by a flag (for example allow_extrapolate=False by default, so users can consciously opt in)?
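
If I am not mistaken, .sel already accepts a tolerance argument together with method='nearest', which makes an out-of-range query fail loudly instead of silently returning the edge value; a minimal sketch reusing the example data above (the tolerance value is of course use-case dependent):

```python
import numpy as np
import xarray as xr

data_xr = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    coords={"lat": [10, 20], "lon": [120, 130, 140]},
    dims=["lat", "lon"],
)

# the nearest lat is 10, but |5 - 10| exceeds the tolerance, so the lookup
# raises instead of silently returning the edge value
try:
    data_xr.sel(lat=5, lon=125, method="nearest", tolerance=4)
except KeyError as err:
    print("out-of-range query rejected:", err)
```

Having something like this as the default, or easier to discover, is essentially what is asked for above.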

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8335/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2098252325 I_kwDOAMm_X859EMol 8653 xarray v 2023.9.0: ```ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects``` jerabaul29 8382834 open 0     1 2024-01-24T13:18:55Z 2024-02-05T12:50:34Z   CONTRIBUTOR      

What happened?

I tried to save an xarray dataset with datetimes as data for its time dimension to a nc file with to_netcdf and got the error ValueError: unable to infer dtype on variable 'time'; xarray cannot serialize arbitrary Python objects.

What did you expect to happen?

I expected xarray to automatically detect these were datetimes, and convert them to whatever format xarray likes to work with internally to dump it into a CF compatible file, following what is described at https://github.com/pydata/xarray/issues/2512 .

Minimal Complete Verifiable Example

```Python
import xarray as xr
import datetime

times = [datetime.datetime(2024, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc),
         datetime.datetime(2024, 1, 1, 1, 1, 2, tzinfo=datetime.timezone.utc)]

data = [1, 2]

xr_result = xr.Dataset(
    {
        'time': xr.DataArray(dims=["time"], data=times, attrs={
            "standard_name": "time",
        }),
        #
        'data': xr.DataArray(dims=["time"], data=data, attrs={
            "_FillValue": "NaN",
            "standard_name": "some_data",
        }),
    }
)

xr_result.to_netcdf("test.nc")
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

The example is available as a notebook viewable at:

https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2024_01_24/xarray_and_datetimes.ipynb
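
For what it is worth, a minimal sketch of a workaround, under the assumption that the timezone-aware datetime.datetime objects are what prevents the dtype inference: converting them to naive UTC datetimes before building the Dataset lets xarray coerce them to datetime64 and encode the time variable.

```Python
import datetime
import xarray as xr

times = [datetime.datetime(2024, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc),
         datetime.datetime(2024, 1, 1, 1, 1, 2, tzinfo=datetime.timezone.utc)]

# drop the tzinfo while keeping the instants in UTC
naive_utc = [t.astimezone(datetime.timezone.utc).replace(tzinfo=None) for t in times]

xr_result = xr.Dataset(
    {'time': xr.DataArray(dims=["time"], data=naive_utc, attrs={"standard_name": "time"})}
)
xr_result.to_netcdf("test.nc")  # works: naive datetimes are coerced to datetime64
```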

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 6.5.0-14-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2023.9.0 pandas: 2.0.3 numpy: 1.25.2 scipy: 1.11.3 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: 1.3.5 dask: 2023.9.2 distributed: 2023.9.2 matplotlib: 3.7.2 cartopy: 0.21.1 seaborn: 0.13.0 numbagg: None fsspec: 2023.9.2 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.0.0 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.15.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8653/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1856285332 PR_kwDOAMm_X85YOeC_ 8083 Document drop_variables in open_mfdataset jerabaul29 8382834 closed 0     3 2023-08-18T08:22:30Z 2023-08-31T12:41:07Z 2023-08-31T12:41:07Z CONTRIBUTOR   0 pydata/xarray/pulls/8083

Document drop_variables in open_mfdataset. Makes it even clearer that for more information about possible additional options to open_mfdataset, one can consult open_dataset. Solves the comment at https://github.com/pydata/xarray/issues/8074#issuecomment-1682398773 .
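
For context, a small usage sketch of the documented option (the file pattern and the variable name are placeholders); drop_variables is forwarded from open_mfdataset to each underlying open_dataset call:

```python
import xarray as xr

# skip a problematic or unneeded variable while opening a multi-file dataset
ds = xr.open_mfdataset("data_*.nc", drop_variables=["problematic_var"])
```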

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8083/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1858791026 I_kwDOAMm_X85uyuZy 8093 additional option for ```dataset.plot.quiver``` : provide control over the arrows density through the native ```xarray``` quiver API jerabaul29 8382834 closed 0     8 2023-08-21T07:39:48Z 2023-08-22T13:12:19Z 2023-08-22T13:12:19Z CONTRIBUTOR      

Is your feature request related to a problem?

When plotting a dataset using .plot.quiver, I usually end up with far too many, too-small arrows. Looking both at the documentation of the xarray quiver API ( https://xarray.pydata.org/en/v0.17.0/generated/xarray.Dataset.plot.quiver.html ) and at the documentation of the matplotlib quiver API (which is "reachable" from the xarray quiver API through the **kwargs forwarding: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.quiver.html ), I see ways to modify the arrows' appearance and plotting properties, but not their density. Adding a way to control it would be neat. This could be done at the xarray level (with the advantage that more "context knowledge" is available to do it well), or at the matplotlib level (not sure that is as well suited).

Describe the solution you'd like

I would like to have some options to control the density of the quiver-plot arrows directly from the xarray quiver API. Something like (but open to better suggestions) skip=(skip_x, skip_y) to say how many of the actual data points to plot, or density=density to say how dense the arrows should be on the figure.

Describe alternatives you've considered

After looking into this in a bit of detail, if I did not miss anything, it looks like the only option is to quiver-plot directly with matplotlib, slicing / skipping the input data by hand: see for example https://stackoverflow.com/questions/33576572/python-quiver-options and similar. This of course defeats the point of being able to call dataset.plot.quiver, as it forces quite a bit of manual work: pulling the data out into numpy arrays and plotting them directly in matplotlib.

I also tried to check if it was possible to directly downsample the xarray dataset and then plot it, by calling a few reshaping (https://docs.xarray.dev/en/stable/user-guide/reshaping.html) commands first, i.e. something like:

my_df.sel(time=time_to_plot).coarsen(rlat=100, rlon=100).plot.quiver(x="longitude", y="latitude", u="data_u", v="data_v")

but that results in an error:

AttributeError: 'DatasetCoarsen' object has no attribute 'plot'
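
In the meantime, a sketch of two ways to thin the arrows with the current API (the dimension, coordinate, and variable names follow the example above and are assumptions about the dataset): plain strided subsampling with isel, or coarsen followed by a reduction such as .mean(), which is what makes .plot available again:

```python
import xarray as xr

def thinned_quiver(ds: xr.Dataset, skip: int = 10):
    """Quiver plot keeping only every `skip`-th point along rlat / rlon."""
    thinned = ds.isel(rlat=slice(None, None, skip), rlon=slice(None, None, skip))
    return thinned.plot.quiver(x="longitude", y="latitude", u="data_u", v="data_v")

def coarsened_quiver(ds: xr.Dataset, window: int = 100):
    """Quiver plot of block-averaged arrows (DatasetCoarsen needs a reduction first)."""
    coarse = ds.coarsen(rlat=window, rlon=window, boundary="trim").mean()
    return coarse.plot.quiver(x="longitude", y="latitude", u="data_u", v="data_v")
```

Neither is as convenient as a skip= or density= option on plot.quiver itself, which is what this request is about.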

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8093/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1853356670 I_kwDOAMm_X85ud_p- 8074 Add an ```only_variables``` or similar option to ```xarray.open_dataset``` and ```xarray.open_mfdataset``` jerabaul29 8382834 open 0     7 2023-08-16T14:23:43Z 2023-08-21T06:55:17Z   CONTRIBUTOR      

Is your feature request related to a problem?

Sometimes a variable in a nc file is corrupted or not "xarray friendly" and crashes opening the file (see for example https://github.com/pydata/xarray/issues/8072 ; in practice I solved this on my machine by simply dropping the problematic variables with drop_variables). In other cases, reading and parsing the full file or mf-file may be expensive and time consuming while only a couple of variables are needed.

Describe the solution you'd like

We can already exclude variables with the drop_variables arg to open_dataset (note: this is not present for now in open_mfdataset, should it be added there?), but instead of saying "read all the variables except this list", could we also be able to say "read only these variables"? In most cases this would be equivalent to drop_variables=list(set(all_vars) - set(list_interesting_vars)), but if some (or many) variables may be corrupted, just getting the file opened to list all_vars may be problematic.

Describe alternatives you've considered

drop_variables=list(set(all_vars) - set(list_interesting_vars)), but this is a lot more verbose.
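
A possible stop-gap, sketched under the assumption of a netCDF backend: list the variable names with the low-level netCDF4 library (which does not decode anything) and build the drop list from that. Note that keep_vars must also include the coordinate variables one wants to keep.

```python
import netCDF4
import xarray as xr

def open_only(path, keep_vars):
    """Open `path` keeping only the variables named in `keep_vars`."""
    with netCDF4.Dataset(path) as nc:          # low-level listing, no decoding
        all_vars = set(nc.variables)
    return xr.open_dataset(path, drop_variables=list(all_vars - set(keep_vars)))
```

This still opens the file twice, and will not help if the low-level open itself fails, which is why a native only_variables option would be nicer.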

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8074/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1851388168 I_kwDOAMm_X85uWfEI 8072 RuntimeError: NetCDF: Invalid dimension ID or name jerabaul29 8382834 closed 0     7 2023-08-15T12:50:43Z 2023-08-20T03:41:24Z 2023-08-20T03:41:23Z CONTRIBUTOR      

What happened?

I got a long error message ending in RuntimeError: NetCDF: Invalid dimension ID or name when trying to open a nc file with xarray. I did a bit of digging around, but could not find a workaround (the issues with similar problems I found were old and supposedly already solved).

What did you expect to happen?

No response

Minimal Complete Verifiable Example

See the jupyter notebook: https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_08_15/illustrate_issue_xr.ipynb .

Copying the commands (note that it should be run as a notebook):

```Python
import os
import xarray as xr

xr.show_versions()

# get the file
!wget https://arcticdata.io/metacat/d1/mn/v2/object/urn%3Auuid%3Ad5f179a3-76a8-4e4f-b45f-8e8d85960ba6

# rename the file
!mv urn\:uuid\:d5f179a3-76a8-4e4f-b45f-8e8d85960ba6 bar_BSO3_a_rcm7_2008_09.nc

# the file is not xarray-friendly, modify and rename
!ncrename -v six_con,six_con_dim bar_BSO3_a_rcm7_2008_09.nc bar_BSO3_a_rcm7_2008_09_xr.nc

xr.open_dataset("./bar_BSO3_a_rcm7_2008_09_xr.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

KeyError Traceback (most recent call last) File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 210 try: --> 211 file = self._cache[self._key] 212 except KeyError:

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.getitem(self, key) 55 with self._lock: ---> 56 value = self._cache[key] 57 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/jrmet/Desktop/Git/public_bug_reports/xarray/2023_08_15/bar_BSO3_a_rcm7_2008_09_xr.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'b771184f-a474-4093-b5f1-784542ef9ce6']

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last) Cell In[6], line 1 ----> 1 xr.open_dataset("./bar_BSO3_a_rcm7_2008_09_xr.nc")

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/api.py:570, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, kwargs) 558 decoders = _resolve_decoders_kwargs( 559 decode_cf, 560 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 566 decode_coords=decode_coords, 567 ) 569 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 570 backend_ds = backend.open_dataset( 571 filename_or_obj, 572 drop_variables=drop_variables, 573 decoders, 574 kwargs, 575 ) 576 ds = _dataset_from_backend_dataset( 577 backend_ds, 578 filename_or_obj, (...) 588 kwargs, 589 ) 590 return ds

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:602, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose) 581 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 582 self, 583 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 599 autoclose=False, 600 ) -> Dataset: 601 filename_or_obj = _normalize_path(filename_or_obj) --> 602 store = NetCDF4DataStore.open( 603 filename_or_obj, 604 mode=mode, 605 format=format, 606 group=group, 607 clobber=clobber, 608 diskless=diskless, 609 persist=persist, 610 lock=lock, 611 autoclose=autoclose, 612 ) 614 store_entrypoint = StoreBackendEntrypoint() 615 with close_on_error(store):

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:400, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose) 394 kwargs = dict( 395 clobber=clobber, diskless=diskless, persist=persist, format=format 396 ) 397 manager = CachingFileManager( 398 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs 399 ) --> 400 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:347, in NetCDF4DataStore.init(self, manager, group, mode, lock, autoclose) 345 self._group = group 346 self._mode = mode --> 347 self.format = self.ds.data_model 348 self._filename = self.ds.filepath() 349 self.is_remote = is_remote_uri(self._filename)

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:409, in NetCDF4DataStore.ds(self) 407 @property 408 def ds(self): --> 409 return self._acquire()

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:403, in NetCDF4DataStore._acquire(self, needs_lock) 402 def _acquire(self, needs_lock=True): --> 403 with self._manager.acquire_context(needs_lock) as root: 404 ds = _nc4_require_group(root, self._group, self._mode) 405 return ds

File ~/miniconda3/envs/pytmd/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.enter(self) 135 del self.args, self.kwds, self.func 136 try: --> 137 return next(self.gen) 138 except StopIteration: 139 raise RuntimeError("generator didn't yield") from None

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock) 196 @contextlib.contextmanager 197 def acquire_context(self, needs_lock=True): 198 """Context manager for acquiring a file.""" --> 199 file, cached = self._acquire_with_cache_info(needs_lock) 200 try: 201 yield file

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock) 215 kwargs = kwargs.copy() 216 kwargs["mode"] = self._mode --> 217 file = self._opener(self._args, *kwargs) 218 if self._mode == "w": 219 # ensure file doesn't get overridden when opened again 220 self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2485, in netCDF4._netCDF4.Dataset.init()

File src/netCDF4/_netCDF4.pyx:1863, in netCDF4._netCDF4._get_dims()

File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Invalid dimension ID or name
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0] python-bits: 64 OS: Linux OS-release: 5.15.0-78-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.1 libnetcdf: 4.9.2 xarray: 2023.7.0 pandas: 2.0.3 numpy: 1.25.2 scipy: 1.11.1 netCDF4: 1.6.4 pydap: None h5netcdf: None h5py: 3.9.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.7.2 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.0.0 pip: 23.2.1 conda: None pytest: None mypy: None IPython: 8.14.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8072/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1685503657 I_kwDOAMm_X85kdr6p 7789 Cannot access zarr data on Azure using shared access signatures (SAS) jerabaul29 8382834 closed 0     1 2023-04-26T18:21:08Z 2023-04-26T18:32:33Z 2023-04-26T18:31:01Z CONTRIBUTOR      

What happened?

I am trying to access some zarr data that are stored on Azure blob storage. I am able to access them using the Azure account name and key method.

I.e. this works fine, and I get a <xarray.Dataset> back, all is fine:

xr.open_mfdataset(file_list, engine="zarr", storage_options={'account_name':AZURE_STORAGE_ACCOUNT_NAME, 'account_key': AZURE_STORAGE_ACCOUNT_KEY})

However, if I understand well, it is not recommended to use the account name and key just to read some zarr data on Azure: this is a "far too powerful" method for simply accessing data, and it is better to use a dedicated SAS token for this kind of task (see for example the first answer in the discussion at https://github.com/Azure/azure-storage-azcopy/issues/1867 ). If I understand correctly, the zarr backend functionality is provided through the following "chaining" of backends: xarray -> zarr -> fsspec -> adlfs. This looks good, as adlfs seems to support SAS: see https://github.com/fsspec/adlfs , where the supported credentials include sas_token. However, trying to use sas_token as the storage_options entry fails (the debug trace below is long, but the discussion continues under it), asking me to use a connection_string. If I understand well, it should be possible to use a SAS token per se, without anything more, at least in theory (this is what azcopy does)? I open the issue here since I hit it when using xarray, but it is possible that this is actually a pure adlfs issue; if so, let me know and I can open an issue there.

``` In [26]: xr.open_mfdataset(file_list, engine="zarr", storage_options={'sas_token': AZURE_STORAGE_SAS})


ValueError Traceback (most recent call last) File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:447, in AzureBlobFileSystem.do_connect(self) 446 else: --> 447 raise ValueError( 448 "Must provide either a connection_string or account_name with credentials!!" 449 ) 451 except RuntimeError:

ValueError: Must provide either a connection_string or account_name with credentials!!

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) Cell In[26], line 1 ----> 1 xr.open_mfdataset([filename], engine="zarr", storage_options={'sas_token': AZURE_STORAGE_SAS})

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:982, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, kwargs) 979 open_ = open_dataset 980 getattr_ = getattr --> 982 datasets = [open_(p, open_kwargs) for p in paths] 983 closers = [getattr_(ds, "_close") for ds in datasets] 984 if preprocess is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:982, in <listcomp>(.0) 979 open_ = open_dataset 980 getattr_ = getattr --> 982 datasets = [open_(p, **open_kwargs) for p in paths] 983 closers = [getattr_(ds, "_close") for ds in datasets] 984 if preprocess is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:525, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, kwargs) 513 decoders = _resolve_decoders_kwargs( 514 decode_cf, 515 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 521 decode_coords=decode_coords, 522 ) 524 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 525 backend_ds = backend.open_dataset( 526 filename_or_obj, 527 drop_variables=drop_variables, 528 decoders, 529 kwargs, 530 ) 531 ds = _dataset_from_backend_dataset( 532 backend_ds, 533 filename_or_obj, (...) 541 kwargs, 542 ) 543 return ds

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/zarr.py:908, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version) 887 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 888 self, 889 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 905 zarr_version=None, 906 ) -> Dataset: 907 filename_or_obj = _normalize_path(filename_or_obj) --> 908 store = ZarrStore.open_group( 909 filename_or_obj, 910 group=group, 911 mode=mode, 912 synchronizer=synchronizer, 913 consolidated=consolidated, 914 consolidate_on_close=False, 915 chunk_store=chunk_store, 916 storage_options=storage_options, 917 stacklevel=stacklevel + 1, 918 zarr_version=zarr_version, 919 ) 921 store_entrypoint = StoreBackendEntrypoint() 922 with close_on_error(store):

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/zarr.py:419, in ZarrStore.open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel, zarr_version) 417 if consolidated is None: 418 try: --> 419 zarr_group = zarr.open_consolidated(store, **open_kwargs) 420 except KeyError: 421 try:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/convenience.py:1282, in open_consolidated(store, metadata_key, mode, **kwargs) 1280 # normalize parameters 1281 zarr_version = kwargs.get('zarr_version') -> 1282 store = normalize_store_arg(store, storage_options=kwargs.get("storage_options"), mode=mode, 1283 zarr_version=zarr_version) 1284 if mode not in {'r', 'r+'}: 1285 raise ValueError("invalid mode, expected either 'r' or 'r+'; found {!r}" 1286 .format(mode))

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:181, in normalize_store_arg(store, storage_options, mode, zarr_version) 179 else: 180 raise ValueError("zarr_version must be either 2 or 3") --> 181 return normalize_store(store, storage_options, mode)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:154, in _normalize_store_arg_v2(store, storage_options, mode) 152 if isinstance(store, str): 153 if "://" in store or "::" in store: --> 154 return FSStore(store, mode=mode, **(storage_options or {})) 155 elif storage_options: 156 raise ValueError("storage_options passed with non-fsspec path")

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:1345, in FSStore.init(self, url, normalize_keys, key_separator, mode, exceptions, dimension_separator, fs, check, create, missing_exceptions, storage_options) 1343 if protocol in (None, "file") and not storage_options.get("auto_mkdir"): 1344 storage_options["auto_mkdir"] = True -> 1345 self.map = fsspec.get_mapper(url, {mapper_options, storage_options}) 1346 self.fs = self.map.fs # for direct operations 1347 self.path = self.fs._strip_protocol(url)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/mapping.py:237, in get_mapper(url, check, create, missing_exceptions, alternate_root, kwargs) 206 """Create key-value interface for given URL and options 207 208 The URL will be of the form "protocol://location" and point to the root (...) 234 FSMap instance, the dict-like key-value store. 235 """ 236 # Removing protocol here - could defer to each open() on the backend --> 237 fs, urlpath = url_to_fs(url, kwargs) 238 root = alternate_root if alternate_root is not None else urlpath 239 return FSMap(root, fs, check, create, missing_exceptions=missing_exceptions)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/core.py:375, in url_to_fs(url, kwargs) 373 inkwargs["fo"] = urls 374 urlpath, protocol, _ = chain[0] --> 375 fs = filesystem(protocol, inkwargs) 376 return fs, urlpath

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/registry.py:257, in filesystem(protocol, storage_options) 250 warnings.warn( 251 "The 'arrow_hdfs' protocol has been deprecated and will be " 252 "removed in the future. Specify it as 'hdfs'.", 253 DeprecationWarning, 254 ) 256 cls = get_filesystem_class(protocol) --> 257 return cls(storage_options)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/spec.py:76, in Cached.__call__(cls, args, kwargs) 74 return cls._cache[token] 75 else: ---> 76 obj = super().call(args, **kwargs) 77 # Setting _fs_token here causes some static linters to complain. 78 obj._fs_token = token

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:281, in AzureBlobFileSystem.init(self, account_name, account_key, connection_string, credential, sas_token, request_session, socket_timeout, blocksize, client_id, client_secret, tenant_id, anon, location_mode, loop, asynchronous, default_fill_cache, default_cache_type, version_aware, kwargs) 269 if ( 270 self.credential is None 271 and self.anon is False 272 and self.sas_token is None 273 and self.account_key is None 274 ): 276 ( 277 self.credential, 278 self.sync_credential, 279 ) = self._get_default_azure_credential(kwargs) --> 281 self.do_connect() 282 weakref.finalize(self, sync, self.loop, close_service_client, self) 284 if self.credential is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:457, in AzureBlobFileSystem.do_connect(self) 454 self.do_connect() 456 except Exception as e: --> 457 raise ValueError(f"unable to connect to account for {e}")

ValueError: unable to connect to account for Must provide either a connection_string or account_name with credentials!! ```

What did you expect to happen?

I would expect to be able to access the zarr dataset on Azure using the SAS token alone, as I can do with for example azcopy (I have tested that I can access this exact dataset with this exact SAS token using azcopy).
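
Reading the adlfs error message above literally, a possible sketch (untested on my side, so just an assumption) is to pass the storage account name together with the SAS token, since adlfs asks for "account_name with credentials":

```python
import xarray as xr

# AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_SAS are the same secrets as above
ds = xr.open_mfdataset(
    file_list,
    engine="zarr",
    storage_options={
        "account_name": AZURE_STORAGE_ACCOUNT_NAME,
        "sas_token": AZURE_STORAGE_SAS,
    },
)
```

Even if that works, it does not answer whether the SAS token alone should be enough, which is the question here.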

Minimal Complete Verifiable Example

I cannot share the access tokens / account name and key unfortunately as these are secret, so this makes it hard to create a MCVE.

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

``` In [27]: xr.show_versions()

INSTALLED VERSIONS

commit: None python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.15.0-69-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.0 libnetcdf: 4.9.2

xarray: 2023.4.2 pandas: 2.0.1 numpy: 1.24.2 scipy: 1.10.1 netCDF4: 1.6.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2022.12.1 distributed: 2022.12.1 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: None numbagg: None fsspec: 2023.4.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.7.2 pip: 23.1.1 conda: None pytest: None mypy: None IPython: 8.12.0 sphinx: None ```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1486990254 I_kwDOAMm_X85Yoauu 7372 getting confused by xarray not reporting the netCDF ```_FillValue``` attribute in jupyter notebooks: add ```_FillValue``` to the "report"? jerabaul29 8382834 closed 0     3 2022-12-09T15:56:36Z 2023-03-13T19:56:39Z 2023-03-13T19:56:39Z CONTRIBUTOR      

What is your issue?

I love to use xarray to look at / explore / work with netCDF files in notebooks. There is one point that has confused me quite a bit though: the "report" provided by xarray does not seem to include the _FillValue attribute. For me it is quite an important attribute to know about (I have seen issues in some files due to a poorly chosen _FillValue, for example). It looks like xarray handles _FillValue just fine, but I wonder if being a bit more explicit rather than implicit, i.e. showing _FillValue in the report, could be useful? At least this would lift some confusion I regularly have myself ^^ :) .

To illustrate, a screenshot (this is in VS Code; a notebook is available at https://github.com/jerabaul29/illustrations/blob/main/inconsistencies/xarray/xarray_fillvalue/xr_fillvalue_issue_base.ipynb , though at the moment the "xarray report" part does not seem to be fully rendered by the GitHub online view; what I mean by "the report" is the interactive HTML output after the 6 at the bottom):

the "normal" attribute is shown (some_attribute) but the _FilledValue attribute is not shown. I thought / was worried it was dropped when re-dumping to netCDF, though the following of the notebook shows that xarray handles it just fine and re-dumps it when writing again a new netCDF file; still, I often get confused because of this attribute not being reported.

Would it be possible to make the _FillValue attribute visible in the html report to avoid people like me getting confused? :)
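
One detail that may explain part of the confusion, as far as I understand xarray's behavior: on decoding, _FillValue is moved from the variable's attrs into its encoding, which is why it does not show up among the attributes in the report. It can still be inspected explicitly; a small sketch (file and variable names are placeholders):

```python
import xarray as xr

ds = xr.open_dataset("some_file.nc")                    # placeholder file name
print(ds["some_variable"].encoding.get("_FillValue"))   # hypothetical variable name
print(ds["some_variable"].attrs)                        # _FillValue is not listed here
```

Showing this in the HTML repr would save users from having to know about .encoding.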

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7372/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1552701403 I_kwDOAMm_X85cjFfb 7468 Provide default APIs and functions for getting variable at a given location, based on some criteria / extrema conditions on other variables jerabaul29 8382834 open 0     0 2023-01-23T08:35:43Z 2023-01-23T08:35:43Z   CONTRIBUTOR      

Is your feature request related to a problem?

No, this is related to a need that comes regularly when working with netCDF files in geosciences.

Describe the solution you'd like

what is needed

There are many cases with netcdf files where one wants to find some location, or get variable(s) at some location, where the location is determined by a condition on some variables. A classical example, around which there are many stack overflow questions, online discussions, suggested "hacky" solutions, snippets, etc., is something like the following. Given a file that looks like this:

dimensions:
    nj = 949 ;
    ni = 739 ;
    nc = 5 ;
    time = UNLIMITED ; // (24 currently)
variables:
    float TLAT(nj, ni) ;
        TLAT:long_name = "T grid center latitude" ;
        TLAT:units = "degrees_north" ;
        TLAT:missing_value = 1.e+30f ;
        TLAT:_FillValue = 1.e+30f ;
    float TLON(nj, ni) ;
        TLON:long_name = "T grid center longitude" ;
        TLON:units = "degrees_east" ;
        TLON:missing_value = 1.e+30f ;
        TLON:_FillValue = 1.e+30f ;
    float Tair_h(time, nj, ni) ;
        Tair_h:units = "C" ;
        Tair_h:long_name = "air temperature" ;
        Tair_h:coordinates = "TLON TLAT time" ;
        Tair_h:cell_measures = "area: tarea" ;
        Tair_h:missing_value = 1.e+30f ;
        Tair_h:_FillValue = 1.e+30f ;
        Tair_h:time_rep = "instantaneous" ;

answer a question like:

  • find the mesh point (ni, nj) closest to the location (TLAT=latval, TLON=lonval)?
  • give the nearest / interpolated value of Tair_h at latitude and longitude (latval, lonval)
  • do the same as above for lists / arrays of coordinates.

I do not think there is a recommended, standard, simple one-liner to do this with xarray in general (in particular if (latval, lonval) falls outside the discrete set of mesh nodes). This means that there are plenty of ad hoc, hacked solutions getting shared around to solve this. Having a default, recommended way would likely help users quite a bit and save quite some work.

the existing ways to solve the need

As soon as TLAT and TLON are not "aligned" with the ni and nj coordinates (if the query exactly matches a mesh point, then likely some .where(TLAT=latval, TLON=lonval) can do), this is a bit of work. One typically has to:

  • build the 2D (dependent on (ni, nj) ) field representing the function (ni, nj) -> distance(node(ni, nj), point(latval, lonval) )
  • find the smallest value on this field to get the nearest coordinate and the value there, or the few smallest values and use some interpolation to interpolate

There are many more examples of questions that revolve around this kind of "query", and the answers are usually ad hoc, though a lot of the logic repeats itself, which makes me believe a general, high-quality / standard solution would be useful:

  • https://stackoverflow.com/questions/58758480/xarray-select-nearest-lat-lon-with-multi-dimension-coordinates
  • https://gis.stackexchange.com/questions/357026/indexing-coordinates-in-order-to-get-a-value-by-coordinates-from-xarray (but what in the case where the point looked for "falls between" mesh nodes?)

Also note that most of these answers use simple / relatively naive / inefficient algorithms, but I wonder if there are some examples of code that could be used to build this in an efficient way, see the discussions in:

  • https://github.com/xarray-contrib/xoak
  • https://stackoverflow.com/questions/10818546/finding-index-of-nearest-point-in-numpy-arrays-of-x-and-y-coordinates
  • https://stackoverflow.com/questions/2566412/find-nearest-value-in-numpy-array

It looks like there are some snippets available that can be used to do this more or less exactly, when the netcdf file follows some conventions:

  • https://gist.github.com/blaylockbk/0ac5427b09fbae8d367a691ff90cdb4e

It looks like there is no dedicated / recommended / default xarray solution for this though. It would be great if xarray could offer a (set of) well tested, well implemented, efficient way(s) to solve this kind of need. I guess this is such a common need that providing a default solution with a default API, even if it is not optimal for all use cases, would be better than providing nothing at all and having users hack their own helper functions.

what xarray could implement

It would be great if xarray could offer support for this built in. A few thoughts of how this could be done:

  • calculate function on all array based on specification
  • find closest / interpolation way
  • provide a few default "assemblies" of these functions to support common file kinds
  • provide some ways to check that the request is reasonable / well formulated (for example, some function of the kind check_that_convex, that would check that taking a minimum is more or less reasonable).

I wonder if thinking about a few APIs and agreeing on them would be helpful before implementing anything. Just for the sake of brainstorming, maybe some functions with the following kind of "API pseudocode" on datasets could make sense / would be a nice standardization to offer to users? Any thoughts / ideas for a better solution?

```python
def compute_function_on_mesh(self, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> numpy_2d_array:
    """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments
    to the function the value from var1, ..., varn at each corresponding node."""
```

```python
def find_node_with_lowest_value(self, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> Tuple(dim1, ..., dimn):
    """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments
    to the function the value from var1, ..., varn at each corresponding node, and return the node
    coordinates that minimize the function."""
```

```python
def get_variable_at_node_with_lowest_value(self, variable_to_use, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> float:
    """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments
    to the function the value from var1, ..., varn at each corresponding node, and return the
    variable_to_use value at the node coordinates that minimize the function."""
```

(note: for this last function, consider also providing a variant that performs interpolation outside of mesh points?)

Maybe providing a few specializations for working with finding specific points in space would be useful? Like:

```python
def get_variable_at_closest_location(self, variable_to_use, variable_lat, variable_lon, latvalue, lonvalue) -> float:
    """get variable_to_use at the mesh point closest to (latvalue, lonvalue), using the variables
    variable_lat, variable_lon as the lat and lon axis."""
```
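
For the sake of discussion, a minimal sketch of how this last specialization could be written today with plain xarray + numpy; it uses a naive squared-difference distance in degrees (no great-circle distance, no antimeridian handling), and the TLAT / TLON / nj / ni names follow the CDL header above:

```python
import numpy as np
import xarray as xr

def get_variable_at_closest_location(ds: xr.Dataset, variable_to_use: str,
                                     variable_lat: str, variable_lon: str,
                                     latvalue: float, lonvalue: float) -> xr.DataArray:
    """Value of variable_to_use at the mesh node closest to (latvalue, lonvalue)."""
    # 2D field of squared distances over the (nj, ni) mesh, naive in degrees
    dist2 = (ds[variable_lat] - latvalue) ** 2 + (ds[variable_lon] - lonvalue) ** 2
    # indices of the minimum, recovered from the flattened argmin
    j, i = np.unravel_index(int(np.argmin(dist2.values)), dist2.shape)
    return ds[variable_to_use].isel(nj=j, ni=i)
```

A well tested xarray version would of course need proper distance handling, vectorized queries, and interpolation between nodes.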

Describe alternatives you've considered

Writing my own small function, or re-using some snippet circulating on internet.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7468/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1520760951 I_kwDOAMm_X85apPh3 7421 Opening a variable as several chunks works fine, but opening it "fully" crashes jerabaul29 8382834 closed 0     8 2023-01-05T13:36:42Z 2023-01-06T14:13:55Z 2023-01-06T14:13:55Z CONTRIBUTOR      

What happened?

I have set up a ipynb notebook that may be a clear explanation:

https://github.com/jerabaul29/misc/blob/main/BugReports/OpeningIssueXarray/issue_opening_2018_03_b.ipynb

Short report:

  • I have a netcdf file that is not especially big (around 1.6 GB)
  • When I try to fully open one of its variables, it crashes, i.e. this crashes:

```
# for some reason, this does not work; crash
xr_file["accD"][0, 0:3235893].data
```

but opening in several passes works, i.e. this works:

xr_file["accD"][0, 1000000:2000000].data xr_file["accD"][0, 2000000:3000000].data xr_file["accD"][0, 3000000:3235893].data

Any idea why?
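
In case it helps while debugging, a possible workaround sketch, under the assumption that the failure happens when the backend tries to read the whole slab in one go: open the file lazily with dask chunks, so the variable is read and assembled in smaller pieces (requires dask; the chunking is left to "auto" since I do not know the file's dimension names):

```python
import xarray as xr

xr_file = xr.open_dataset("path/to/the_file.nc", chunks="auto")  # lazy, dask-backed
acc_d = xr_file["accD"][0, 0:3235893].data.compute()             # read piece by piece
```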

What did you expect to happen?

Opening the variable in full should work.

Minimal Complete Verifiable Example

https://github.com/jerabaul29/misc/blob/main/BugReports/OpeningIssueXarray/issue_opening_2018_03_b.ipynb

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

The netCDF file uses unlimited dimensions; can this be a problem? I am not the author of the file, but I think it was generated by some Matlab netCDF package.

Environment

Ubuntu 20.04, python 3.8.10, xarray 0.20.2.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7421/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1479121713 I_kwDOAMm_X85YKZsx 7363 expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) jerabaul29 8382834 closed 0     29 2022-12-06T13:33:12Z 2022-12-07T13:15:38Z 2022-12-07T13:07:10Z CONTRIBUTOR      

Is your feature request related to a problem?

I regularly need to extend some netCDF files along the time axis (to add newer data segments, for example), and I do not know of a "good" method for this in xarray. For example, SO recommends the following, which is a bit heavy: https://stackoverflow.com/questions/34987972/expand-dimensions-xarray . The function with the natural name for this task, https://docs.xarray.dev/en/stable/generated/xarray.DataArray.expand_dims.html , adds a new axis rather than extending an existing one.

I have done a bit of looking around, I hope I do not miss any resources.

Describe the solution you'd like

I would like to be able to, given a dataset in xarray (in my case the underlying data are netCDF, but I guess this makes no difference):

  • issue a simple command / run a single function, that would
  • add / extend by a number of entries on a dimension, making the dimension grow "by its end" (I know there is not really a dimension array, but you see what I mean: grow the dimension by making it longer, from its end)
  • add the corresponding number of entries in the right way on all data variables that use this dimension, making the data variables grow "by their end"
  • fill the new entries of the data variables that have just been extended with a default value (would be great if this could be user specifiable)

Describe alternatives you've considered

As of today, I think (if I am wrong please let me know / suggest better :) ) that the best solution is to simply:

  • extract the data from the existing xarray dataset into numpy arrays
  • extend the data variables as numpy arrays by hand
  • generate a new xarray dataset from the extended numpy arrays and copy the metadata from the initial xarray

I do not know if an "effective, xarray native" solution would do exactly this in a small python wrapper, or if there is a more effective / natural way to do this in "xarray native" code.
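
To make the comparison concrete, a minimal sketch of that extract / extend / rebuild alternative, under the assumptions that 'time' is the only coordinate varying along the time dimension and that the data variables are floating point (so NaN can serve as the default fill value):

```python
import numpy as np
import xarray as xr

def grow_along_time(ds: xr.Dataset, new_times, fill_value=np.nan) -> xr.Dataset:
    """Return a copy of ds extended by len(new_times) entries at the end of 'time'.

    new_times should have the same dtype as ds['time'] (e.g. datetime64).
    """
    n_new = len(new_times)
    data_vars = {}
    for name, var in ds.data_vars.items():
        if "time" in var.dims:
            # pad only along the time dimension, filling new entries with fill_value
            pad = [(0, n_new) if d == "time" else (0, 0) for d in var.dims]
            values = np.pad(var.values, pad, constant_values=fill_value)
        else:
            values = var.values
        data_vars[name] = (var.dims, values, var.attrs)
    coords = {name: (c.dims, c.values, c.attrs)
              for name, c in ds.coords.items() if "time" not in c.dims}
    coords["time"] = ("time", np.concatenate([ds["time"].values, np.asarray(new_times)]))
    return xr.Dataset(data_vars, coords=coords, attrs=ds.attrs)
```

An "xarray native" version of essentially this is what I am asking for.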

Additional context

I have a netCDF-CF file with a time dimension that I need to regularly grow as more data become available. For internal reasons I want to keep a single netCDF file and make it grow, rather than end up with a multitude of small segments and multi file netCDF dataset.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7363/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);