issues

2 rows where state = "open" and user = 731499 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
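id: 1288323549 · node_id: I_kwDOAMm_X85MykHd · number: 6736 · title: better handling of invalid files in open_mfdataset · user: vnoel (731499) · state: open · locked: 0 · comments: 2 · created_at: 2022-06-29T08:00:18Z · updated_at: 2023-07-09T23:49:36Z · author_association: CONTRIBUTOR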

Is your feature request related to a problem?

Suppose I'm trying to read a large number of netCDF files with open_mfdataset.

Now suppose that one of those files is for some reason incorrect -- for instance there was a problem during the creation of that particular file, and its file size is zero, or it is not valid netCDF. The file exists, but it is invalid.

Currently, open_mfdataset raises an exception with the message "ValueError: did not find a match in any of xarray's currently installed IO backends".

As far as I can tell, there is currently no way to identify which of the files being read caused the problem. With several hundred files, finding the problematic ones is a task in itself, even though xarray presumably knows which files failed.
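Until the error message names the offending files, a cheap pre-screen can narrow the search. The sketch below is only a heuristic and rests on stated assumptions: the files are local and uncompressed, classic netCDF files begin with the bytes `CDF`, and netCDF-4 files begin with the HDF5 signature (an HDF5 file with a user block would be flagged wrongly). The helper names `screen_file` and `find_suspect_files` are hypothetical and not part of xarray. A file that passes can still be corrupt further in.

```python
import os

# Leading bytes of the two on-disk netCDF flavours (assumption: plain local files).
NETCDF_CLASSIC_MAGIC = b"CDF"          # classic and 64-bit offset formats
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"      # netCDF-4 files are HDF5 containers


def screen_file(path):
    """Return None if the file passes the cheap checks, else a short reason."""
    if not os.path.isfile(path):
        return "not a regular file"
    if os.path.getsize(path) == 0:
        return "file size is zero"
    with open(path, "rb") as fh:
        head = fh.read(len(HDF5_MAGIC))
    if head.startswith(NETCDF_CLASSIC_MAGIC) or head == HDF5_MAGIC:
        return None
    return "unrecognised leading bytes"


def find_suspect_files(paths):
    """Map each suspicious path to the reason it was flagged."""
    suspects = {}
    for path in paths:
        reason = screen_file(path)
        if reason is not None:
            suspects[path] = reason
    return suspects
```

Running `find_suspect_files` over the same list of paths that would be handed to open_mfdataset yields a dictionary of flagged files, which can be inspected or filtered out before the real read.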

Describe the solution you'd like

It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception.

Apart from better reporting, I would find it very useful if I could pass open_mfdataset an argument that makes it skip invalid files altogether (a flag such as ignore_invalid, defaulting to False, comes to mind).
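The two requests (name the failing files, optionally skip them) can be sketched in user code as a small wrapper. This is an illustration under assumptions, not xarray's implementation: `open_many_reporting` is a hypothetical name, `ignore_invalid` mirrors the flag suggested above and is not an existing xarray option, and the per-file opener and the combining step are passed in as callables so the sketch does not depend on any particular backend.

```python
def open_many_reporting(paths, open_one, combine, ignore_invalid=False):
    """Open each path separately, then combine whatever opened.

    Returns (combined, failures), where failures maps path -> exception.
    With ignore_invalid=False, any failure raises a ValueError that names
    every file that could not be opened.
    """
    opened, failures = [], {}
    for path in paths:
        try:
            opened.append(open_one(path))
        except Exception as exc:  # broad on purpose: any backend error counts
            failures[path] = exc

    if failures and not ignore_invalid:
        # Release anything already opened before reporting the problem.
        for obj in opened:
            close = getattr(obj, "close", None)
            if callable(close):
                close()
        details = "; ".join(f"{p}: {e}" for p, e in failures.items())
        raise ValueError(f"could not open {len(failures)} file(s): {details}")

    return combine(opened), failures
```

With xarray, `open_one` could be `xr.open_dataset` and `combine` could be `xr.combine_by_coords`. Note that this opens the files one by one instead of going through open_mfdataset, so it only approximates the requested behaviour.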

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6736/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
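id: 205414496 · node_id: MDU6SXNzdWUyMDU0MTQ0OTY= · number: 1249 · title: confusing dataset creation process · user: vnoel (731499) · state: open · locked: 0 · comments: 6 · created_at: 2017-02-05T09:52:44Z · updated_at: 2022-06-26T15:07:59Z · author_association: CONTRIBUTOR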

In another issue I create a simple dataset like so:

    import numpy as np
    import xarray as xr

    lat = np.random.rand(50000) * 180 - 90
    lon = np.random.rand(50000) * 360 - 180
    d = xr.Dataset({'latitude': lat, 'longitude': lon})

I expected d to contain two variables (latitude and longitude) with no coordinates. Instead d appears to contain two coordinates and no variables:

    In [5]: d
    Out[5]:
    <xarray.Dataset>
    Dimensions:    (latitude: 50000, longitude: 50000)
    Coordinates:
      * latitude   (latitude) float64 -76.0 -84.36 26.69 66.44 -37.85 50.13 ...
      * longitude  (longitude) float64 -148.7 -74.82 18.37 117.7 80.63 12.25 ...
    Data variables:
        *empty*

Is this desired behavior?
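For comparison, one way to obtain the result described as expected (two data variables, no coordinates) is to name the dimension explicitly with (dimension, data) tuples. This is a minimal sketch; the dimension name `points` is an arbitrary choice made for illustration and does not come from the issue.

```python
import numpy as np
import xarray as xr

n = 50000
lat = np.random.rand(n) * 180 - 90
lon = np.random.rand(n) * 360 - 180

# Naming the dimension explicitly ("points" is a hypothetical name) keeps both
# arrays as data variables that share a single dimension.
d = xr.Dataset(
    {
        "latitude": ("points", lat),
        "longitude": ("points", lon),
    }
)

print(list(d.data_vars))  # names of the data variables
print(list(d.coords))     # names of the coordinates
```

Because both variables now share `points`, the dataset has one dimension of length 50000 instead of two independent ones.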

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1249/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
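
The row filter this page reports (state = "open" and user = 731499, sorted by updated_at descending) can be reproduced against the schema with Python's built-in sqlite3 module. The sketch below makes several assumptions: it uses an in-memory database, a trimmed copy of the table with only the columns the query touches, the two rows shown on this page, and one clearly made-up closed row to show the filter doing something. The exact SQL that Datasette generates is not shown in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copy of the issues schema: only the columns this query touches.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (1288323549, 6736, "better handling of invalid files in open_mfdataset",
     731499, "open", "2023-07-09T23:49:36Z"),
    (205414496, 1249, "confusing dataset creation process",
     731499, "open", "2022-06-26T15:07:59Z"),
    # Made-up placeholder row (not from the source) so the filter excludes something.
    (1, 1, "placeholder closed issue", 731499, "closed", "2024-01-01T00:00:00Z"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
open_issues = conn.execute(
    """
    SELECT number, title, updated_at
    FROM issues
    WHERE state = :state AND [user] = :user
    ORDER BY updated_at DESC
    """,
    {"state": "open", "user": 731499},
).fetchall()

for number, title, updated_at in open_issues:
    print(number, title, updated_at)
```

Named parameters keep the filter values out of the SQL string, and the index on [user] matches the idx_issues_user index defined above.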
Powered by Datasette · About: xarray-datasette