issue_comments
6 rows where issue = 397063221 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
454450672 | https://github.com/pydata/xarray/issues/2662#issuecomment-454450672 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDQ1MDY3Mg== | dcherian 2448579 | 2019-01-15T16:14:12Z | 2019-01-15T16:14:12Z | MEMBER | We have airspeed velocity (asv) performance tests. I don't know if there's one for auto_combine, but maybe you can add one @TomNicholas |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 | |
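A benchmark for the auto_combine path mentioned above could look like the following asv-style sketch. This is a minimal illustration, not part of xarray's actual asv_bench suite; the class name, sizes, and dataset construction are all assumptions:

```python
import numpy as np
import xarray as xr


class AutoCombine:
    """Time auto_combine on inputs whose order interleaves data variables."""

    def setup(self):
        data = np.random.randn(100, 50)
        # Deliberately interleaved order [A0, B0, A1, B1] to exercise the
        # slow grouping path discussed in this issue.
        self.datasets = [
            xr.Dataset({var: (('T', 'X'), data)},
                       coords={'T': np.arange(t, t + 100)})
            for t in (0, 100)
            for var in ('A', 'B')
        ]

    def time_auto_combine(self):
        xr.auto_combine(self.datasets, concat_dim='T')
```

asv discovers classes in the benchmark directory and times every method whose name starts with `time_`, running `setup` beforehand so dataset construction is excluded from the measurement.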
454439392 | https://github.com/pydata/xarray/issues/2662#issuecomment-454439392 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDQzOTM5Mg== | malmans2 22245117 | 2019-01-15T15:45:03Z | 2019-01-15T15:45:03Z | CONTRIBUTOR | I checked PR #2678 with the data that originated the issue and it fixes the problem! |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 | |
454423937 | https://github.com/pydata/xarray/issues/2662#issuecomment-454423937 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDQyMzkzNw== | TomNicholas 35968931 | 2019-01-15T15:05:22Z | 2019-01-15T15:05:22Z | MEMBER | Yes, thank you @malmans2, this is very helpful!

This was very puzzling, because the code is supposed to split the datasets up according to their data variables, which means merge won't be used to concatenate, and this should be fast, as before. But I found the problem: the datasets need to be sorted by their data variables before being grouped, because groupby only collects *consecutive* elements with equal keys. With this change I get:

```python
# No longer slow if netCDFs are stored in several folders:
%timeit ds_2folders = xr.open_mfdataset('rep*/*.nc', concat_dim='T')
```

Without this pre-sorting, whether or not groupby grouped properly depended on the order of datasets in the input, which eventually depended on the way they were loaded (as the example in this issue makes clear). The reason this mistake got past the unit tests is that the test inputs happened to already be in an order that grouped correctly. |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 | |
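To see why the pre-sorting matters, here is a minimal standalone sketch of the `itertools.groupby` behavior described above. The string stand-ins are illustrative, not xarray objects:

```python
import itertools

datasets = ['A0', 'B0', 'A1', 'B1']   # stand-ins for datasets: data variable + time chunk
by_var = lambda name: name[0]         # group key: the data variable ('A' or 'B')

# Unsorted input: groupby only merges consecutive equal keys, so we get four groups
print([(k, list(g)) for k, g in itertools.groupby(datasets, key=by_var)])
# -> [('A', ['A0']), ('B', ['B0']), ('A', ['A1']), ('B', ['B1'])]

# Pre-sorted input: two groups, one per data variable, as intended
print([(k, list(g)) for k, g in itertools.groupby(sorted(datasets, key=by_var), key=by_var)])
# -> [('A', ['A0', 'A1']), ('B', ['B0', 'B1'])]
```

With four groups instead of two, each group holds datasets with *different* variables, so the combine falls through to merge rather than a fast concat.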
454351420 | https://github.com/pydata/xarray/issues/2662#issuecomment-454351420 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDM1MTQyMA== | shoyer 1217238 | 2019-01-15T10:56:03Z | 2019-01-15T10:56:03Z | MEMBER | @malmans2 thanks for this reproducible test case! From xarray's perspective, the difference is the order in which the arrays are concatenated/processed. This is determined by sorting the (globbed) file names:

```
In [16]: sorted(glob.glob('rep*/*.nc'))
Out[16]: ['rep0/dsA0.nc', 'rep0/dsB0.nc', 'rep1/dsA1.nc', 'rep1/dsB1.nc']

In [17]: sorted(glob.glob('*.nc'))
Out[17]: ['dsA0.nc', 'dsA1.nc', 'dsB0.nc', 'dsB1.nc']
```

It appears that the slow case [A0, B0, A1, B1] now requires computing data with dask, whereas [A0, A1, B0, B1] does not. I suspect the issue is that we're now using some different combination of concat and merge than before. We could (and should) optimize this path in merge to avoid eagerly loading data, but the immediate fix here is probably to make sure we're using concat instead of merge. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 | |
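A hypothetical workaround consistent with this diagnosis (not proposed in the thread): pass `open_mfdataset` an explicit file list sorted so that files with the same variable are adjacent, reproducing the fast [A0, A1, B0, B1] ordering:

```python
import glob
import xarray as xr

# Sort by basename so dsA* files precede dsB* files regardless of folder,
# giving the fast ordering ['rep0/dsA0.nc', 'rep1/dsA1.nc', 'rep0/dsB0.nc', 'rep1/dsB1.nc']
files = sorted(glob.glob('rep*/*.nc'), key=lambda path: path.split('/')[-1])
ds = xr.open_mfdataset(files, concat_dim='T')
```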
454086847 | https://github.com/pydata/xarray/issues/2662#issuecomment-454086847 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1NDA4Njg0Nw== | malmans2 22245117 | 2019-01-14T17:20:03Z | 2019-01-14T17:20:03Z | CONTRIBUTOR | I've created a little script to reproduce the problem.

@TomNicholas it looks like datasets are opened correctly. The problem arises when the datasets are combined:

```python
import numpy as np
import xarray as xr
import os

Tsize = 100; T = np.arange(Tsize)
Xsize = 900; X = np.arange(Xsize)
Ysize = 800; Y = np.arange(Ysize)
data = np.random.randn(Tsize, Xsize, Ysize)

for i in range(2):
    # Loop body reconstructed from the file listing elsewhere in this thread;
    # the exact original statements were lost in extraction.
    dsA = xr.Dataset({'A': (('T', 'X', 'Y'), data)}, coords={'T': T + i * Tsize, 'X': X, 'Y': Y})
    dsB = xr.Dataset({'B': (('T', 'X', 'Y'), data)}, coords={'T': T + i * Tsize, 'X': X, 'Y': Y})
    dsA.to_netcdf('dsA' + str(i) + '.nc')
    dsB.to_netcdf('dsB' + str(i) + '.nc')
    os.mkdir('rep' + str(i))
    dsA.to_netcdf('rep' + str(i) + '/dsA' + str(i) + '.nc')
    dsB.to_netcdf('rep' + str(i) + '/dsB' + str(i) + '.nc')
```

Results (the timed commands are sketched after this row):

- Fast if netCDFs are stored in one folder.
- Slow if netCDFs are stored in several folders.
- Fast if files containing different variables are opened separately, then merged.
|
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 | |
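The `%timeit` commands themselves did not survive extraction. The following is a plausible reconstruction of the three cases above, assuming the file layout produced by the script; the glob patterns in the third case are illustrative:

```python
# Fast: one folder, so the sorted glob keeps same-variable files adjacent
%timeit xr.open_mfdataset('*.nc', concat_dim='T')

# Slow: several folders, so the sorted glob interleaves variables A and B
%timeit xr.open_mfdataset('rep*/*.nc', concat_dim='T')

# Fast: open each variable's files separately, then merge the results
%timeit xr.merge([xr.open_mfdataset('rep*/dsA*.nc', concat_dim='T'), xr.open_mfdataset('rep*/dsB*.nc', concat_dim='T')])
```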
452462499 | https://github.com/pydata/xarray/issues/2662#issuecomment-452462499 | https://api.github.com/repos/pydata/xarray/issues/2662 | MDEyOklzc3VlQ29tbWVudDQ1MjQ2MjQ5OQ== | TomNicholas 35968931 | 2019-01-08T21:43:31Z | 2019-01-08T21:43:31Z | MEMBER | I'm not sure what might be causing this, but I wonder if you could help narrow it down a bit? Can you, for example, see if it's making it past here? That would at least tell us whether it is opening each of the datasets okay. (Or even better: post some example datasets which will cause this problem?) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset in v.0.11.1 is very slow 397063221 |
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
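The row selection described at the top of this page corresponds to a query along these lines (a sketch, not necessarily the exact SQL the page generated):

```sql
SELECT *
FROM issue_comments
WHERE issue = 397063221
ORDER BY updated_at DESC;
```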