issues: 2216068694


| id | node_id | number | title | user | state | locked | comments | created_at | updated_at | author_association |
|---|---|---|---|---|---|---|---|---|---|---|
| 2216068694 | I_kwDOAMm_X86EFoZW | 8895 | droping variables when accessing remote datasets via pydap | 8241481 | open | 0 | 1 | 2024-03-29T22:55:45Z | 2024-05-03T15:15:09Z | CONTRIBUTOR |

Is your feature request related to a problem?

I ran into the following issue when trying to access a remote dataset. Here is a concrete example that reproduces the error:

```python
from pydap.client import open_url
from pydap.cas.urs import setup_session
import xarray as xr
import numpy as np

username = "UsernameHere"
password = "PasswordHere"
filename = 'Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc'
hyrax_url = 'https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/'
url1 = hyrax_url + filename
session = setup_session(username, password, check_url=hyrax_url)

ds = xr.open_dataset(url1, engine="pydap", session=session)
```

The last line raises an error:

```python
ValueError: dimensions ('time',) must have the same length as the number of data dimensions, ndim=2
```

The issue involves the variable `time_bnds`. I know that because the following works:

```python
DS = []
# tmax_ds is the pydap dataset returned by open_url(url1, session=session)
for var in [var for var in tmax_ds.keys() if var not in ['time_bnds']]:
    DS.append(xr.open_dataset(url1 + '?' + var, engine='pydap', session=session))
ds = xr.merge(DS)
```

I also tried passing `decode_times=False`, but the error persists. The for loop above works, but it is unnecessarily slow (~30 seconds).
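The workaround relies on OPeNDAP constraint expressions: appending `?<variable>` to the URL requests a single variable from the server. The URL construction can be sketched on its own (the function name and `skip` default here are mine for illustration, not part of xarray or pydap):

```python
def per_variable_urls(base_url, names, skip=("time_bnds",)):
    """Build one OPeNDAP constraint-expression URL per kept variable.

    Sketch of the per-variable workaround above: every name not listed
    in `skip` becomes its own `base_url?name` request.
    """
    return [f"{base_url}?{name}" for name in names if name not in skip]

urls = per_variable_urls(
    "https://example.com/granule.nc",  # placeholder URL, not a real granule
    ["time", "tmax", "time_bnds"],
)
# → ['https://example.com/granule.nc?time', 'https://example.com/granule.nc?tmax']
```

Each URL then costs a separate round trip, which is why the loop takes tens of seconds for a dataset with several variables.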

I tried all of this with the newer xarray versions (`xarray.__version__` 2024.2 and 2024.3).

Describe the solution you'd like

I think it would be nice to be able to drop a variable I know I don't want, something like this:

```python
ds = xr.open_dataset(url1, drop_variables='time_bnds', engine="pydap", session=session)
```

so that only the variables I want end up in the `xarray.Dataset`. However, when I do that I continue to get the same error as before, which suggests that `drop_variables` is only applied after the `xarray.Dataset` has been created.
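The error message points at `Variable` construction, i.e. the failure occurs before `drop_variables` gets a chance to run. The same `ValueError` can be reproduced locally with in-memory data (the shapes below are assumed for illustration; the remote server presumably reports `('time',)` as the dimensions of a 2-D `time_bnds` array):

```python
import numpy as np
import xarray as xr

# A variable declared with one dimension ('time',) but backed by 2-D data
# fails at construction time, before any post-construction dropping.
try:
    xr.Variable(dims=("time",), data=np.zeros((3, 2)))
except ValueError as err:
    print(err)
```

This supports the idea that the variable has to be filtered out before the store hands it to xarray, not afterwards.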

Describe alternatives you've considered

This is potentially a backend issue with pydap, which does not accept a `drop_variables` option; but since dropping a variable is a one-liner in pydap and takes less than a millisecond, it would be a desirable feature.

For example, I can easily open the dataset and drop the variable with pydap as shown below:

```python
dataset = open_url(url1, session=session)  # this works
# dropping 'time_bnds' this way takes < 1 ms
dataset[tuple([var for var in dataset.keys() if var not in ['time_bnds']])]
```

```
<DatasetType with children 'y', 'lon', 'lat', 'time', 'x', 'tmax', 'lambert_conformal_conic', 'yearday'>
```

It looks like this would be an easy implementation on the backend; at the same time, I took a look at pydap_.py

https://github.com/pydata/xarray/blob/b80260781ee19bddee01ef09ac0da31ec12c5152/xarray/backends/pydap_.py#L129-L130

and I feel it could also be implemented at the xarray level by allowing `drop_variables`, which is already an argument of `xarray.open_dataset`, to be passed through to the `PydapDataStore` (in either scenario `drop_variables` would be passed along).
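Whichever layer it lands in, the core of the change is filtering the variable mapping before the dataset is constructed. A minimal, pydap-independent sketch of that step (the function name and signature are hypothetical, not xarray's or pydap's actual API):

```python
def filter_variables(variables, drop_variables=None):
    """Return a copy of `variables` without the names in `drop_variables`.

    Sketch only: accepts a single name or an iterable of names, mirroring
    how `drop_variables` is passed to xarray.open_dataset; names not
    present in `variables` are simply skipped here.
    """
    if drop_variables is None:
        return dict(variables)
    if isinstance(drop_variables, str):
        drop_variables = [drop_variables]
    dropped = set(drop_variables)
    return {name: var for name, var in variables.items() if name not in dropped}

remaining = filter_variables(
    {"time": ..., "tmax": ..., "time_bnds": ...},  # placeholder values
    drop_variables="time_bnds",
)
# → keys: 'time' and 'tmax'
```

Applied to the pydap children before `PydapDataStore` exposes them, the problematic `time_bnds` would never reach xarray's `Variable` construction.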

Any thoughts or suggestions? I can certainly lead this effort, as I will already be working on enabling the DAP4 implementation within pydap.

reactions (all 0): https://api.github.com/repos/pydata/xarray/issues/8895/reactions · repo: 13221727 · type: issue
