
issues


2 rows where state = "open" and user = 8241481 sorted by updated_at descending


Issue #8895: droping variables when accessing remote datasets via pydap
id 2216068694 · node_id I_kwDOAMm_X86EFoZW · opened by Mikejmnez (8241481) · state: open · comments: 1 · created: 2024-03-29T22:55:45Z · updated: 2024-05-03T15:15:09Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

Is your feature request related to a problem?

I ran into the following issue when trying to access a remote dataset. Here is the concrete example that reproduces the error.

```python
from pydap.client import open_url
from pydap.cas.urs import setup_session
import xarray as xr
import numpy as np

username = "UsernameHere"
password = "PasswordHere"
filename = 'Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc'
hyrax_url = 'https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/'
url1 = hyrax_url + filename
session = setup_session(username, password, check_url=hyrax_url)

ds = xr.open_dataset(url1, engine="pydap", session=session)
```

The last line returns an error:

```python
ValueError: dimensions ('time',) must have the same length as the number of data dimensions, ndim=2
```

The issue involves the variable `time_bnds`. I know that because this works:

```python
DS = []
for var in [var for var in tmax_ds.keys() if var not in ['time_bnds']]:
    DS.append(xr.open_dataset(url1 + '?' + var, engine='pydap', session=session))
ds = xr.merge(DS)
```

I also tried passing `decode_times=False` but continued to get the error. The for loop above works, but it is unnecessarily slow (~30 s).

I tried all of this with recent versions of xarray (`xarray.__version__` of 2024.2 and 2024.3).

Describe the solution you'd like

I think it would be nice to be able to drop a variable I know I don't want, with something like this:

```python
ds = xr.open_dataset(url1, drop_variables='time_bnds', engine="pydap", session=session)
```

so that the `xarray.Dataset` is created with only the variables I want. However, when I do that I continue to get the same error as before, which means that `drop_variables` is being applied after the `xarray.Dataset` is created.

Describe alternatives you've considered

This is potentially a backend issue with pydap, which does not take a `drop_variables` option. But since dropping a variable is a one-liner in pydap that takes less than 1 millisecond, it would be a desirable feature.

For example, I can easily open the dataset and drop the variable with pydap, as described below:

```python
dataset = open_url(url1, session=session)  # this works
dataset[tuple([var for var in dataset.keys() if var not in ['time_bnds']])]  # this takes < 1 ms
```

which returns

```
<DatasetType with children 'y', 'lon', 'lat', 'time', 'x', 'tmax', 'lambert_conformal_conic', 'yearday'>
```
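As an interim workaround, the pydap one-liner can be combined with xarray today by constructing the store by hand. A minimal sketch, assuming `PydapDataStore` accepts a pydap `DatasetType` (as the constructor in the linked `pydap_.py` does) and reusing `url1` and `session` from the example above:

```python
from pydap.client import open_url
import xarray as xr
from xarray.backends import PydapDataStore

# Open with pydap, drop the problematic variable at the pydap level,
# then hand the filtered DatasetType to xarray via the store interface.
pydap_ds = open_url(url1, session=session)
keep = tuple(var for var in pydap_ds.keys() if var != 'time_bnds')
ds = xr.open_dataset(PydapDataStore(pydap_ds[keep]))
```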

It looks like it would be an easy implementation on the backend, but at the same time I took a look at pydap_.py

https://github.com/pydata/xarray/blob/b80260781ee19bddee01ef09ac0da31ec12c5152/xarray/backends/pydap_.py#L129-L130

and I feel like it could also be implemented at the xarray level, by allowing `drop_variables`, which is already an argument of `xarray.open_dataset`, to be passed down to the `PydapDataStore` (I guess in both scenarios `drop_variables` would be passed through).
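A minimal sketch of that second option; the class and attribute names below mirror `xarray/backends/pydap_.py` at the linked commit, but the exact signature is an assumption, not the real API:

```python
from xarray.backends.common import AbstractDataStore


class PydapDataStore(AbstractDataStore):
    """Hypothetical variant that accepts drop_variables up front."""

    def __init__(self, ds, drop_variables=None):
        # Filter the pydap DatasetType before xarray ever sees the
        # unwanted children -- the cheap one-liner shown above.
        drop = set(drop_variables or [])
        self.ds = ds[tuple(k for k in ds.keys() if k not in drop)]
```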

Any thoughts or suggestions? I can certainly lead this effort, as I will already be working on enabling the dap4 implementation within pydap.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8895/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue #3995: open_mfzarr files + intake-xarray
id 605266906 · node_id MDU6SXNzdWU2MDUyNjY5MDY= · opened by Mikejmnez (8241481) · state: open · comments: 1 · created: 2020-04-23T06:11:41Z · updated: 2022-04-30T13:37:50Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

This is related to a previous issue (#3668), although the actual problem in that issue is a bit more technically involved and relates to clusters. I decided to open this separate issue so that the discussion in #3668 remains visible to other users.

This issue is about the need for code that can read multiple zarr files. This can be particularly helpful when reading data through an Intake catalog entry (via the Intake-xarray plugin), which would allow for a compact way to introduce parallelism when working with multiple zarr files. There are two steps: first, work at the xarray level (write a function that does this), and then add an option to the Intake-xarray plugin that uses that xarray functionality.
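A minimal sketch of the xarray-level half, using a hypothetical `open_mfzarr` helper (the name and signature are illustrative; a similar effect can be approximated today with `xr.open_mfdataset(..., engine="zarr")`):

```python
import xarray as xr


def open_mfzarr(paths, concat_dim="time", **open_kwargs):
    """Hypothetical helper: lazily open multiple zarr stores and
    concatenate them along a single dimension."""
    datasets = [xr.open_zarr(path, **open_kwargs) for path in paths]
    return xr.concat(datasets, dim=concat_dim)


# Usage sketch with made-up store names:
# ds = open_mfzarr(["tmax_2019.zarr", "tmax_2020.zarr"], concat_dim="time")
```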

I am more than willing to work on this problem, if nobody is already working on it.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3995/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
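For reference, the row filter shown at the top of this page maps onto a straightforward query against this schema. A sketch using Python's sqlite3 module, with `github.db` as a hypothetical path to the underlying SQLite file:

```python
import sqlite3

# "github.db" is a hypothetical local copy of the Datasette database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT number, title, created_at, updated_at
    FROM issues
    WHERE state = 'open' AND user = 8241481
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, created_at, updated_at in rows:
    print(number, title, updated_at)
```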