
issue_comments


7 rows where author_association = "MEMBER", issue = 288184220 ("We need a fast path for open_mfdataset") and user = 2448579 (dcherian), sorted by updated_at descending

768627652 · dcherian (MEMBER) · 2021-01-27T22:43:59Z · https://github.com/pydata/xarray/issues/1823#issuecomment-768627652

That's 34k 3MB files! I suggest combining them into 1k 100MB files; that would work a lot better.
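
A minimal sketch of that consolidation, assuming hypothetical input/output directories and a netCDF backend; the batch size is illustrative:

import glob
import os

import xarray as xr

os.makedirs("combined", exist_ok=True)
files = sorted(glob.glob("small/*.nc"))   # hypothetical: ~34k files of ~3MB each
batch = 34                                # 34 files x ~3MB ≈ 100MB per output
for i in range(0, len(files), batch):
    with xr.open_mfdataset(files[i:i + batch], combine="by_coords") as ds:
        ds.to_netcdf(f"combined/part_{i // batch:04d}.nc")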

Reactions: none
768460310 · dcherian (MEMBER) · 2021-01-27T17:50:09Z · https://github.com/pydata/xarray/issues/1823#issuecomment-768460310

Let's close this since there is an opt-in mostly-fast path. I've added an item to #4648 to cover adding an asv benchmark for mfdataset.
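
As an illustration of what such a benchmark could look like, here is a minimal asv-style sketch; the class name, file names, and data sizes are assumptions, not xarray's actual benchmark suite:

import glob

import numpy as np
import xarray as xr

class OpenMFDataset:
    """asv-style benchmark: time open_mfdataset over many small files."""

    def setup(self):
        # write a handful of small, time-contiguous files (illustrative)
        for i in range(10):
            ds = xr.Dataset(
                {"a": ("time", np.random.rand(100))},
                coords={"time": np.arange(100) + i * 100},
            )
            ds.to_netcdf(f"bench_{i:02d}.nc")
        self.files = sorted(glob.glob("bench_*.nc"))

    def time_open_mfdataset(self):
        xr.open_mfdataset(self.files, combine="by_coords").close()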

Reactions: none
531913598 · dcherian (MEMBER) · 2019-09-16T19:03:47Z · https://github.com/pydata/xarray/issues/1823#issuecomment-531913598

PS @rabernat

%%time
ds = xr.open_mfdataset(
    "/glade/p/cesm/community/ASD-HIGH-RES-CESM1/hybrid_v5_rel04_BC5_ne120_t12_pop62/ocn/proc/tseries/monthly/*.nc",
    parallel=True, coords="minimal", data_vars="minimal", compat='override',
)

This completes in 40 seconds with 10 workers on cheyenne.
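
For context, parallel=True opens the files through whatever dask scheduler is active; a local stand-in for the 10-worker setup might look like this (a sketch, assuming dask.distributed is installed):

from dask.distributed import Client

client = Client(n_workers=10)  # local analogue of 10 workers on cheyenne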

Reactions: hooray 1, rocket 2 (3 total)
531912893 · dcherian (MEMBER) · 2019-09-16T19:01:57Z · https://github.com/pydata/xarray/issues/1823#issuecomment-531912893

=) @TomNicholas PRs welcome!

Reactions: none
531816800 · dcherian (MEMBER) · 2019-09-16T15:00:16Z · https://github.com/pydata/xarray/issues/1823#issuecomment-531816800

YES! (well almost)

The PR lets you skip compatibility checks. The magic spell is

xr.open_mfdataset(..., data_vars="minimal", coords="minimal", compat="override")

You can skip index comparison by adding join="override".

What's left is extremely large indexes and lazy index / coordinate loading, but we have #2039 open for that. I will rename that issue.

If you have time, can you test it out?

Reactions: +1 1, heart 1 (2 total)
489135792 · dcherian (MEMBER) · 2019-05-03T15:29:14Z (edited 2019-05-03T15:40:27Z) · https://github.com/pydata/xarray/issues/1823#issuecomment-489135792

One common use-case is files with large numbers of concat_dim-invariant non-dimensional coordinates. This is easy to speed up by dropping those variables from all but the first file.

e.g. https://github.com/pangeo-data/esgf2xarray/blob/6a5e4df0d329c2f23b403cbfbb65f0f1dfa98d52/esgf2zarr/aggregate.py#L107-L110

# keep only coordinates from first ensemble member to simplify merge
first = member_dsets_aligned[0]
rest = [mds.reset_coords(drop=True) for mds in member_dsets_aligned[1:]]
objs_to_concat = [first] + rest

Similarly https://github.com/NCAR/intake-esm/blob/e86a8e8a80ce0fd4198665dbef3ba46af264b5ea/intake_esm/aggregate.py#L53-L57

def merge_vars_two_datasets(ds1, ds2):
    """
    Merge two datasets, dropping all variables from second dataset
    that already exist in the first dataset's coordinates.
    """

See also #2039 (second code block)

One way to do this might be to add a master_file kwarg to open_mfdataset. This would imply coords='minimal', join='exact' (I think; prealigned=True in some other proposals) and would drop non-dimensional coordinates from all but the first file and then call concat.

As a bonus, it would assign attributes from the master_file to the merged dataset (for which I think there are open issues): this functionality exists in netCDF4.MFDataset, so that's a plus.

EDIT: #2039 (third code block) is also a possibility. This might look like

xr.open_mfdataset('files*.nc', master_file='first', concat_dim='time')

in which case the first file is read; all coords that are not concat_dim become drop_variables for an open_dataset call that reads the remaining files. We then merge with the first dataset and assign attrs.
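
A rough sketch of that flow with today's API (master_file does not exist; the paths, the 'time' concat dim, and the restore-coords step are assumptions):

import glob

import xarray as xr

files = sorted(glob.glob("files*.nc"))
first = xr.open_dataset(files[0])                    # the "master" file
# every coord that is not the concat dim becomes drop_variables
drop = [c for c in first.coords if c != "time"]
rest = xr.open_mfdataset(files[1:], combine="nested", concat_dim="time",
                         drop_variables=drop)
combined = xr.concat([first.drop_vars(drop), rest], dim="time")
combined = combined.assign_coords({c: first[c] for c in drop})
combined.attrs = first.attrs                         # attrs from the master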

EDIT2: master_file combines two different functionalities here: specifying a "template file" and a file to choose attributes from. So maybe we need two kwargs: template_file and attrs_from?

Reactions: none
488440840 · dcherian (MEMBER) · 2019-05-01T21:42:01Z (edited 2019-05-01T21:45:38Z) · https://github.com/pydata/xarray/issues/1823#issuecomment-488440840

I am currently motivated to fix this.

  1. Over in https://github.com/pydata/xarray/pull/1413#issuecomment-302843502 @rabernat mentioned

    allowing the user to pass join='exact' via open_mfdataset. A related optimization would be to allow the user to pass coords='minimal' (or other concat coords options) via open_mfdataset.

  2. @shoyer suggested calling decode_cf later (sketched below), though perhaps this won't help too much: https://github.com/pydata/xarray/issues/1385#issuecomment-439263419
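
A minimal sketch of suggestion 2, assuming hypothetical paths: skip CF decoding while the files are opened and combined, then decode once at the end:

import xarray as xr

ds = xr.open_mfdataset("files*.nc", decode_cf=False)
ds = xr.decode_cf(ds)  # decode once, after concatenation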

Is this all that we can do on the xarray side?

Reactions: none

Table schema
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);