issue_comments
1 row where author_association = "MEMBER", issue = 397063221 and user = 1217238 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
454351420 | https://github.com/pydata/xarray/issues/2662#issuecomment-454351420 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDM1MTQyMA== | shoyer 1217238 | 2019-01-15T10:56:03Z | 2019-01-15T10:56:03Z | MEMBER | @malmans2 thanks for this reproducible test case! From xarray's perspective, the difference is the order in which the arrays are concatenated/processed. This is determined by sorting the (globbed) file names: `In [16]: sorted(glob.glob('rep*/*.nc'))` gives `Out[16]: ['rep0/dsA0.nc', 'rep0/dsB0.nc', 'rep1/dsA1.nc', 'rep1/dsB1.nc']`, while `In [17]: sorted(glob.glob('*.nc'))` gives `Out[17]: ['dsA0.nc', 'dsA1.nc', 'dsB0.nc', 'dsB1.nc']`. It appears that the slow case [A0, B0, A1, B1] now requires computing data with dask, whereas [A0, A1, B0, B1] does not. I suspect the issue is that we're now using some different combination of We could (and should) optimize this path in merge to avoid eagerly loading data, but the immediate fix here is probably to make sure we're using concat instead of merge. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | open_mfdataset in v.0.11.1 is very slow 397063221 |
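As a quick check of the ordering argument in the comment above, the sketch below recreates both file layouts as empty placeholder files and sorts the glob matches using only the Python standard library. It assumes the nested pattern is `rep*/*.nc`, the pattern consistent with the `rep0/...` and `rep1/...` results shown. The helper `sort_by_basename` is hypothetical and not an xarray API; it only illustrates that ordering by file name alone recovers the [A0, A1, B0, B1] order from the nested layout.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# File layouts from the reproducible test case: nested per-replicate
# directories versus all files in a single directory.
NESTED = ["rep0/dsA0.nc", "rep0/dsB0.nc", "rep1/dsA1.nc", "rep1/dsB1.nc"]
FLAT = ["dsA0.nc", "dsA1.nc", "dsB0.nc", "dsB1.nc"]


def sorted_glob(root: Path, pattern: str) -> List[str]:
    """Glob under `root` and return the matches as sorted relative POSIX paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


def sort_by_basename(paths: List[str]) -> List[str]:
    """Hypothetical helper: order paths by file name only, ignoring directories."""
    return sorted(paths, key=lambda p: Path(p).name)


def demo() -> Tuple[List[str], List[str]]:
    """Create both layouts as empty files and return their sorted glob orders."""
    with tempfile.TemporaryDirectory() as tmp:
        nested_root = Path(tmp) / "nested"
        flat_root = Path(tmp) / "flat"
        for name in NESTED:
            path = nested_root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()  # contents are irrelevant; only the names are sorted
        flat_root.mkdir()
        for name in FLAT:
            (flat_root / name).touch()
        nested = sorted_glob(nested_root, "rep*/*.nc")
        flat = sorted_glob(flat_root, "*.nc")
    return nested, flat


if __name__ == "__main__":
    nested, flat = demo()
    print(nested)                    # the slow case: [A0, B0, A1, B1]
    print(flat)                      # the fast case: [A0, A1, B0, B1]
    print(sort_by_basename(nested))  # nested paths in [A0, A1, B0, B1] order
```

Only the sort order is exercised here; whether passing a basename-sorted list to `open_mfdataset` actually avoids the slow path in v0.11.1 is not verified.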
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) );
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);