issue_comments

6 rows where author_association = "CONTRIBUTOR" and issue = 288184220 sorted by updated_at descending

id: 768600657
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-768600657
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDc2ODYwMDY1Nw==
user: Hossein-Madadi (9200184)
created_at: 2021-01-27T21:51:24Z
updated_at: 2021-01-27T21:52:11Z
author_association: CONTRIBUTOR

PS @rabernat

%%time
ds = xr.open_mfdataset("/glade/p/cesm/community/ASD-HIGH-RES-CESM1/hybrid_v5_rel04_BC5_ne120_t12_pop62/ocn/proc/tseries/monthly/*.nc", parallel=True, coords="minimal", data_vars="minimal", compat='override')

This completes in 40 seconds with 10 workers on Cheyenne.

@dcherian, thanks for your solution. In my experience with 34,013 NetCDF files, I could open 117 GiB in 13 min 14 s. Can I decrease this time?

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)
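
For context, here is a minimal sketch of the pattern quoted in this comment, assuming a dask.distributed cluster is available; the path and worker count come from the comment above, everything else is illustrative:

import xarray as xr
from dask.distributed import Client

# Start 10 workers, matching the timing reported above.
client = Client(n_workers=10)

ds = xr.open_mfdataset(
    "/glade/p/cesm/community/ASD-HIGH-RES-CESM1/hybrid_v5_rel04_BC5_ne120_t12_pop62/ocn/proc/tseries/monthly/*.nc",
    parallel=True,        # open each file in a separate dask task
    coords="minimal",     # only combine coordinates that vary along the concat dimension
    data_vars="minimal",  # likewise for data variables
    compat="override",    # skip equality checks; take values from the first dataset
)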

id: 531945252
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-531945252
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDUzMTk0NTI1Mg==
user: jbusecke (14314623)
created_at: 2019-09-16T20:29:35Z
updated_at: 2019-09-16T20:29:35Z
author_association: CONTRIBUTOR

Wooooow. Thanks. I'll have to give this a whirl soon.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)

id: 489064553
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-489064553
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDQ4OTA2NDU1Mw==
user: j08lue (3404817)
created_at: 2019-05-03T11:26:06Z
updated_at: 2019-05-03T11:36:44Z
author_association: CONTRIBUTOR

The original issue in this thread is that you might sometimes want to disable alignment checks for coordinates other than the concat_dim and only check for matching dimensions and dimension shapes.

When you call xr.merge with join='exact', it still checks for alignment (see https://github.com/pydata/xarray/pull/1330#issuecomment-302711852), but it does not join the coordinates if they are not aligned. This behavior (not joining) is also included in what @rabernat envisioned here, but his suggestion goes beyond that: you don't even load coordinate values from all but the first dataset and just blindly trust that they are aligned.

So xr.open_mfdataset(join='exact', coords='minimal') does not fix this issue here, I think.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)
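
A hedged sketch of the distinction drawn in this comment, using current xarray keywords; the file glob and concat dimension are placeholders:

import xarray as xr

# join="exact" still loads and compares coordinates from every file and
# raises if they are not aligned, so the checks themselves are not skipped.
ds_checked = xr.open_mfdataset("data_*.nc", combine="by_coords", join="exact")

# The fast path envisioned above: take coordinate values from the first
# dataset and trust that the remaining files are aligned.
ds_fast = xr.open_mfdataset(
    "data_*.nc",
    combine="nested",
    concat_dim="time",
    coords="minimal",
    data_vars="minimal",
    compat="override",
)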

id: 373123959
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-373123959
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDM3MzEyMzk1OQ==
user: jbusecke (14314623)
created_at: 2018-03-14T18:16:38Z
updated_at: 2018-03-14T18:16:38Z
author_association: CONTRIBUTOR

Awesome, thanks for the clarification. I just looked at #1981 and it indeed seems very elegant (in fact, I just used this approach to parallelize the printing of movie frames!). Thanks for that!

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)

id: 372856076
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-372856076
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDM3Mjg1NjA3Ng==
user: jbusecke (14314623)
created_at: 2018-03-13T23:40:54Z
updated_at: 2018-03-13T23:40:54Z
author_association: CONTRIBUTOR

Would these two options necessarily be mutually exclusive?

I think parallelizing the read-in sounds amazing.

But isn't there some merit in skipping some of the checks altogether, if the user is sure about the structure of the data contained in the many files?

I am often working with the aforementioned type of data (many files either contain a new timestep or a different variable, but most of the dimensions/coordinates are the same).

In some cases I am finding that reading the data "lazily" consumes a significant amount of the time in my workflow. I am unsure how hard this would be to achieve, and perhaps it is not worth it after all.

Just putting out a few ideas while I wait for my xr.open_mfdataset to finish :-)

reactions:
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)

id: 359069753
html_url: https://github.com/pydata/xarray/issues/1823#issuecomment-359069753
issue_url: https://api.github.com/repos/pydata/xarray/issues/1823
node_id: MDEyOklzc3VlQ29tbWVudDM1OTA2OTc1Mw==
user: jbusecke (14314623)
created_at: 2018-01-19T19:45:00Z
updated_at: 2018-01-19T19:45:00Z
author_association: CONTRIBUTOR

I did not really find an elegant solution. What I did was just specify all dims and coords as drop_variables and then update those from a master file with ds.update(ds_master). Perhaps this could be generalized by reading all coords and dims from just the first file.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: We need a fast path for open_mfdataset (288184220)
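
A sketch of that workaround under stated assumptions; the file glob and the list of repeated dims/coords are hypothetical:

import glob
import xarray as xr

files = sorted(glob.glob("run_*.nc"))
repeated = ["lon", "lat", "depth"]  # hypothetical coords identical in every file

# Open all files while skipping the repeated coordinate variables entirely.
ds = xr.open_mfdataset(
    files, drop_variables=repeated, combine="nested", concat_dim="time"
)

# Restore the dropped coords/dims from a single "master" file.
ds_master = xr.open_dataset(files[0])[repeated]
ds.update(ds_master)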

Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
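
Given the schema above, the query behind this page can be reproduced against a local copy of the database with Python's sqlite3 module; the file name github.db is an assumption:

import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR' AND issue = 288184220
    ORDER BY updated_at DESC
    """
).fetchall()

# Each row mirrors one record shown above.
for comment_id, user_id, created, updated in rows:
    print(comment_id, user_id, created, updated)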