issue_comments

5 rows where issue = 212646769 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
461787608 https://github.com/pydata/xarray/issues/1302#issuecomment-461787608 https://api.github.com/repos/pydata/xarray/issues/1302 MDEyOklzc3VlQ29tbWVudDQ2MTc4NzYwOA== stale[bot] 26384082 2019-02-08T12:27:59Z 2019-02-08T12:27:59Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening multiple OpenDAP paths with open_mfdataset fails 212646769
285651253 https://github.com/pydata/xarray/issues/1302#issuecomment-285651253 https://api.github.com/repos/pydata/xarray/issues/1302 MDEyOklzc3VlQ29tbWVudDI4NTY1MTI1Mw== nordam 319297 2017-03-10T11:56:12Z 2017-03-10T11:56:12Z NONE

Regarding the Request too big error, I believe this comes from the server. The paths in the examples above are publicly available files from the Norwegian Meteorological Institute, and I believe they cap downloads via OpenDAP at 500 MB per request (though you can download the NetCDF files via HTTP, and these are all larger than 500 MB; see http://thredds.met.no/thredds/catalog/fou-hi/norkyst800m-1h/catalog.html ).
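To make the size argument concrete, here is a rough sketch of how one might estimate whether a requested slice stays under such a cap. The 500 MB figure is the one quoted above, interpreted here as 500 * 10^6 bytes (an assumption); the array shapes and the 4-byte element size are hypothetical and not read from the actual files.

```python
from math import prod

# Assumed reading of "500 MB"; the server's exact limit is not confirmed here.
CAP_BYTES = 500 * 10**6


def request_size_bytes(shape, itemsize):
    """Uncompressed size of an array slice with this shape and bytes per element."""
    return prod(shape) * itemsize


def fits_under_cap(shape, itemsize, cap=CAP_BYTES):
    """True if a slice of this shape would stay at or below the cap."""
    return request_size_bytes(shape, itemsize) <= cap


# Hypothetical shapes in (time, depth, Y, X) order.
full_variable = (24, 16, 900, 2600)
one_slice = (1, 1, 900, 2600)

print(fits_under_cap(full_variable, itemsize=4))  # whole variable as 4-byte floats
print(fits_under_cap(one_slice, itemsize=4))      # one time step at one depth
```

Under these assumed shapes, requesting a whole variable at once exceeds the cap while a single horizontal slice does not, which would be consistent with an indexed request succeeding where an unindexed one fails.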

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening multiple OpenDAP paths with open_mfdataset fails 212646769
285605971 https://github.com/pydata/xarray/issues/1302#issuecomment-285605971 https://api.github.com/repos/pydata/xarray/issues/1302 MDEyOklzc3VlQ29tbWVudDI4NTYwNTk3MQ== jonasghini 6470530 2017-03-10T08:17:08Z 2017-03-10T08:17:08Z NONE

Here I've got some more info:

1) Directly pertaining to your question, shoyer: with

    import xarray as xr

    d = xr.open_mfdataset([
        'http://thredds.met.no/thredds/dodsC/fou-hi/norkyst800m-1h/NorKyst-800m_ZDEPTHS_his.an.2017022000.nc',
    ])
    d.load()

I get the same CURL Error: Failed initialization.

However, skipping .load() solves that particular problem when there is only a single file in xr.open_mfdataset(), as this works fine:

    import xarray as xr
    from scipy.interpolate import RectBivariateSpline

    d = xr.open_mfdataset([
        'http://thredds.met.no/thredds/dodsC/fou-hi/norkyst800m-1h/NorKyst-800m_ZDEPTHS_his.an.2017022000.nc',
    ])
    fu = RectBivariateSpline(d.X[0:].values, d.Y[675:].values, d.u[0, 0, 675:, :].values.T)

If I skip the .values I get a Request too big error, but I do not know whether it is SciPy or xarray that raises that error.

2) If I download several of the thredds*.nc files to my local machine, and then use xr.open_mfdataset('DataSets/*.nc') it all works beautifully: I do not get CURL errors, nor do I need to use .values in my interpolator call.

3) I tried following the call to .load() -> dask.array.compute(*lazy_data.values()) -> results = get(dsk, keys, **kwargs). At this point it spits out CURL errors. Dask != xarray, so perhaps the error is not with the xarray package at all.

4) When using open_mfdataset() to open a single local file and then .load()-ing it, I get a memory error from dask, even though, judging by its size, the file should fit in my computer's memory. This is an issue with load(), but it may not be related at all to the remote access problem we are trying to figure out.

5) I asked the people hosting the files if perhaps there were some access restrictions that might be the issue here, but they were not aware of anything.

I don't know what conclusions to draw from this. I'll make another pass later and see whether I can identify exactly what is different when loading remote versus local files, in case that tells us something. At least it is clear that when calling .load() directly, my computer does not play nice, although it seems to break for different reasons when loading remote versus local files. Also, referring back to my previous comment, the single-file and multi-file cases do not seem to break at the same point either, given that the equals() function does not call .load(), although perhaps it does something similar in order to do the check.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening multiple OpenDAP paths with open_mfdataset fails 212646769
285492492 https://github.com/pydata/xarray/issues/1302#issuecomment-285492492 https://api.github.com/repos/pydata/xarray/issues/1302 MDEyOklzc3VlQ29tbWVudDI4NTQ5MjQ5Mg== shoyer 1217238 2017-03-09T21:41:50Z 2017-03-09T21:41:50Z MEMBER

Does it work to load one of these single datasets completely into memory, e.g., by calling .load()?

There might be a general issue with loading the "lat" variable that only appears with open_mfdataset because it does the check explicitly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening multiple OpenDAP paths with open_mfdataset fails 212646769
285282066 https://github.com/pydata/xarray/issues/1302#issuecomment-285282066 https://api.github.com/repos/pydata/xarray/issues/1302 MDEyOklzc3VlQ29tbWVudDI4NTI4MjA2Ng== jonasghini 6470530 2017-03-09T08:04:06Z 2017-03-09T08:04:06Z NONE

Full disclosure: I am working on the same project as OP, but I am using a Linux machine (Ubuntu 16.04 LTS) with the same package versions as OP and Python 3.5.1 from Anaconda.

The problem is reproducible also on my machine, and I've done some more digging:

Tracing the different functions invoked from the first call to open_mfdataset() goes like this: open_mfdataset() -> auto_combine() -> _auto_concat() -> concat() -> _dataset_concat() -> _calc_concat_over() -> differs() -> equals()

Now, within equals(), xarray tests for equality; according to its docstring, the result is "True if two DataArrays have the same dimensions, coordinates and values; otherwise False."

With the example files OP tests, all calls to equals() work fine until it reaches the variable "lat" (long name: latitude). Note that I am not sufficiently versed in these kinds of datasets to tell whether "lat" is a variable or a coordinate, but at least it is treated differently from, say, "X", "Y" and "time". Once the equals() function is invoked to tell whether dataset[1:] is equal to dataset[0], it starts spitting out the CURL error.

This information does not illuminate the problem for me, but perhaps it will for one of you.
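As an illustration of why an equality check can be the first place a remote read happens, here is a toy model. It is not xarray's implementation: the LazyVariable class, its fetch callable and the sample values are all hypothetical, and it only mimics the idea that comparing values forces the data to be fetched.

```python
class LazyVariable:
    """Toy stand-in for a remotely backed variable; not xarray's actual class."""

    def __init__(self, dims, fetch):
        self.dims = dims
        self._fetch = fetch      # callable that simulates a remote read
        self._cache = None
        self.fetch_count = 0

    @property
    def values(self):
        # Data is read only on first access, then cached.
        if self._cache is None:
            self.fetch_count += 1
            self._cache = self._fetch()
        return self._cache

    def equals(self, other):
        # Cheap metadata check first; values are read only if dims match.
        return self.dims == other.dims and self.values == other.values


a = LazyVariable(("Y", "X"), lambda: [[60.0, 60.1], [60.2, 60.3]])
b = LazyVariable(("Y", "X"), lambda: [[60.0, 60.1], [60.2, 60.3]])

print(a.fetch_count, b.fetch_count)  # nothing has been fetched yet
print(a.equals(b))
print(a.fetch_count, b.fetch_count)  # the comparison read both sides
```

In this sketch, opening the variables costs nothing, and any failure in the fetch callable would surface only once equals() is called, which is similar in spirit to the behaviour described above.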

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening multiple OpenDAP paths with open_mfdataset fails 212646769

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
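
The filter and ordering described at the top of this page (rows where issue = 212646769, sorted by updated_at descending) can be reproduced against this schema. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table is trimmed to four columns for brevity (an assumption of this example), and the rows are the ids, user ids and timestamps shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the schema above: only the columns this query needs.
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        updated_at TEXT,
        issue INTEGER
    )
    """
)
conn.execute("CREATE INDEX idx_issue_comments_issue ON issue_comments (issue)")

rows = [
    (285282066, 6470530, "2017-03-09T08:04:06Z", 212646769),
    (285492492, 1217238, "2017-03-09T21:41:50Z", 212646769),
    (285605971, 6470530, "2017-03-10T08:17:08Z", 212646769),
    (285651253, 319297, "2017-03-10T11:56:12Z", 212646769),
    (461787608, 26384082, "2019-02-08T12:27:59Z", 212646769),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
query = """
    SELECT id, updated_at
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
"""
result = conn.execute(query, (212646769,)).fetchall()
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

The index on issue matches idx_issue_comments_issue in the schema and is what lets a lookup by issue id avoid a full table scan.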