issue_comments
21 rows where author_association = "MEMBER" and issue = 224553135 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue
---|---|---|---|---|---|---|---|---|---|---|---
1043038150 | https://github.com/pydata/xarray/issues/1385#issuecomment-1043038150 | https://api.github.com/repos/pydata/xarray/issues/1385 | IC_kwDOAMm_X84-K3_G | rabernat 1197350 | 2022-02-17T14:57:03Z | 2022-02-17T14:57:03Z | MEMBER | See deeper dive in https://github.com/pydata/xarray/discussions/6284 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
1043016100 | https://github.com/pydata/xarray/issues/1385#issuecomment-1043016100 | https://api.github.com/repos/pydata/xarray/issues/1385 | IC_kwDOAMm_X84-Kymk | rabernat 1197350 | 2022-02-17T14:36:23Z | 2022-02-17T14:36:23Z | MEMBER | Ah ok so if that is your goal, There is a problem with the time encoding in this file. The units ( |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
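The comment above mentions a problem with the time units in the file. As a rough sketch (the filename and unit string below are placeholders, not details from the thread), one way to inspect and work around a broken time encoding is:

```python
import xarray as xr

# Open without decoding times so the raw encoding is visible.
ds = xr.open_dataset("example.nc", decode_times=False)  # hypothetical file
print(ds["time"].attrs.get("units"))      # e.g. "days since 1850-01-01"
print(ds["time"].attrs.get("calendar"))

# If the units attribute is malformed, overwrite it with the correct string
# (assumed here) and decode manually.
ds["time"].attrs["units"] = "days since 1850-01-01"
ds = xr.decode_cf(ds)
```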
1043001146 | https://github.com/pydata/xarray/issues/1385#issuecomment-1043001146 | https://api.github.com/repos/pydata/xarray/issues/1385 | IC_kwDOAMm_X84-Ku86 | rabernat 1197350 | 2022-02-17T14:21:45Z | 2022-02-17T14:22:23Z | MEMBER |
In general that would be a little more convenient than google drive, because then we could download the file from python (rather than having a manual step). This would allow us to share a fully copy-pasteable code snippet to reproduce the issue. But don't worry about that for now. First, I'd note that your issue is not really related to |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
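The "download the file from python" idea above could look roughly like the following minimal sketch; the URL and filename are placeholders, not links from the thread.

```python
import urllib.request
import xarray as xr

url = "https://example.com/data/sample.nc"  # placeholder URL
local_path, _ = urllib.request.urlretrieve(url, "sample.nc")

ds = xr.open_dataset(local_path)
print(ds)
```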
1042937825 | https://github.com/pydata/xarray/issues/1385#issuecomment-1042937825 | https://api.github.com/repos/pydata/xarray/issues/1385 | IC_kwDOAMm_X84-Kffh | rabernat 1197350 | 2022-02-17T13:14:50Z | 2022-02-17T13:14:50Z | MEMBER | Hi Tom! 👋 So much has evolved about xarray since this original issue was posted. However, we continue to use it as a catchall for people looking to speed up open_mfdataset. I saw your stackoverflow post. Any chance you could post a link to the actual file in question? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
756948150 | https://github.com/pydata/xarray/issues/1385#issuecomment-756948150 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDc1Njk0ODE1MA== | dcherian 2448579 | 2021-01-08T19:20:51Z | 2021-01-08T19:20:51Z | MEMBER |
This is important! Otherwise that timing scales with the number of files. If you get that to work, then you can convert to a dask dataframe and keep things lazy. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
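A minimal sketch of the "convert to a dask dataframe and keep things lazy" suggestion; the file pattern and chunk sizes are assumptions, not values from the thread.

```python
import xarray as xr

# Open lazily with dask-backed arrays.
ds = xr.open_mfdataset("files_*.nc", combine="by_coords",
                       parallel=True, chunks={"time": 100})

# Convert to a dask DataFrame; nothing is loaded until you compute.
ddf = ds.to_dask_dataframe()
print(ddf.head())  # triggers computation of the first partition only
```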
756830228 | https://github.com/pydata/xarray/issues/1385#issuecomment-756830228 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDc1NjgzMDIyOA== | dcherian 2448579 | 2021-01-08T15:52:49Z | 2021-01-08T15:53:22Z | MEMBER | @jameshalgren A lot of these issues have been fixed. Have you tried the advice here: https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets? If not, a reproducible example would help (I have access to Cheyenne). Let's also move this conversation to the "Discussions" forum: https://github.com/pydata/xarray/discussions |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
685164626 | https://github.com/pydata/xarray/issues/1385#issuecomment-685164626 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDY4NTE2NDYyNg== | dcherian 2448579 | 2020-09-01T22:21:58Z | 2020-09-01T22:21:58Z | MEMBER | This is the most up-to-date documentation on this issue: https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
561920115 | https://github.com/pydata/xarray/issues/1385#issuecomment-561920115 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDU2MTkyMDExNQ== | rabernat 1197350 | 2019-12-05T01:09:25Z | 2019-12-05T01:09:25Z | MEMBER | In your twitter thread you said
The general reason for this is usually that First, all the files are opened individually https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L900-L903 You can recreate this step outside of xarray yourself by doing something like
Once each dataset is open, xarray calls out to one of its combine functions. This logic has gotten more complex over the years as different options have been introduced, but the gist is this: https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L947-L952 You can reproduce this step outside of xarray, e.g.
Without seeing more details about your files, it's hard to know exactly where the issue lies. A good place to start is to simply drop all coordinates from your data as a preprocessing step.
```python
def drop_all_coords(ds):
    return ds.reset_coords(drop=True)

xr.open_mfdataset('*.nc', combine='by_coords', preprocess=drop_all_coords)
```
If you observe a big speedup, this points at coordinate compatibility checks as the culprit. From there you can experiment with the various options for Once you post your file details, we can provide more concrete suggestions. |
{ "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
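The two steps described in the comment above (open each file individually, then combine) can be sketched outside of xarray roughly as follows; the glob pattern is a placeholder, and this is only an approximation of what open_mfdataset does internally.

```python
import glob
import xarray as xr

# Step 1: open every file individually.
paths = sorted(glob.glob("*.nc"))
datasets = [xr.open_dataset(p, chunks={}) for p in paths]

# Step 2: combine the per-file datasets along their shared coordinates.
combined = xr.combine_by_coords(datasets)
```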
561915767 | https://github.com/pydata/xarray/issues/1385#issuecomment-561915767 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDU2MTkxNTc2Nw== | rabernat 1197350 | 2019-12-05T00:52:06Z | 2019-12-05T00:52:06Z | MEMBER | @keltonhalbert - I'm sorry you're frustrated by this issue. It's hard to provide a general answer to "why is open_mfdataset slow?" without seeing the data in question. I'll try to provide some best practices and recommendations here. In the meantime, could you please post the xarray repr of two of your files? To be explicit.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
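Posting the repr of two files, as requested above, only takes a few lines like the following; the filenames are placeholders.

```python
import xarray as xr

# Print the repr of two of the files so their dimensions, coordinates,
# and variables are visible to everyone on the issue.
for path in ["file_0001.nc", "file_0002.nc"]:
    with xr.open_dataset(path) as ds:
        print(ds)
```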
463369751 | https://github.com/pydata/xarray/issues/1385#issuecomment-463369751 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2MzM2OTc1MQ== | rabernat 1197350 | 2019-02-13T21:04:03Z | 2019-02-13T21:04:03Z | MEMBER | What if you do |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
461554066 | https://github.com/pydata/xarray/issues/1385#issuecomment-461554066 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2MTU1NDA2Ng== | TomNicholas 35968931 | 2019-02-07T19:00:57Z | 2019-02-07T19:00:57Z | MEMBER | Looks like you're using xarray v0.11.0, but the most recent one is v0.11.3. There have been several changes since then which might affect this; try upgrading first. On Thu, 7 Feb 2019, 18:53 sbiner, notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
439454213 | https://github.com/pydata/xarray/issues/1385#issuecomment-439454213 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzOTQ1NDIxMw== | shoyer 1217238 | 2018-11-16T16:46:55Z | 2018-11-16T16:46:55Z | MEMBER | Does it take 10 seconds even to open a single file? The big mystery is what that top line ("_operator.getitem") is, but my guess is it's netCDF4-python. h5netcdf might also give different results... On Fri, Nov 16, 2018 at 8:20 AM chuaxr notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
439263419 | https://github.com/pydata/xarray/issues/1385#issuecomment-439263419 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzOTI2MzQxOQ== | shoyer 1217238 | 2018-11-16T02:45:05Z | 2018-11-16T02:45:05Z | MEMBER | @chuaxr What do you see when you use One way to fix this would be to move our call to In practice, is the difference between using xarray's internal lazy array classes for decoding and dask for decoding. I would expect to see small differences in performance between these approaches (especially when actually computing data), but for constructing the computation graph I would expect them to have similar performance. It is puzzling that dask is orders of magnitude faster -- that suggests that something else is going wrong in the normal code path for |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
438873285 | https://github.com/pydata/xarray/issues/1385#issuecomment-438873285 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzODg3MzI4NQ== | shoyer 1217238 | 2018-11-15T00:45:53Z | 2018-11-15T00:45:53Z | MEMBER | @chuaxr I assume you're testing this with xarray 0.11? It would be good to do some profiling to figure out what is going wrong here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
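The profiling suggested above could be done, for example, with cProfile around a single open_mfdataset call; the glob pattern is a placeholder.

```python
import cProfile
import pstats

import xarray as xr

profiler = cProfile.Profile()
profiler.enable()
ds = xr.open_mfdataset("files_*.nc")  # placeholder pattern
profiler.disable()

# Show the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```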
437630511 | https://github.com/pydata/xarray/issues/1385#issuecomment-437630511 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzNzYzMDUxMQ== | shoyer 1217238 | 2018-11-10T23:38:10Z | 2018-11-10T23:38:10Z | MEMBER | Was this fixed by https://github.com/pydata/xarray/pull/2047? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
371933603 | https://github.com/pydata/xarray/issues/1385#issuecomment-371933603 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDM3MTkzMzYwMw== | shoyer 1217238 | 2018-03-09T20:17:19Z | 2018-03-09T20:17:19Z | MEMBER | OK, so it seems that we need a change to disable wrapping dask arrays with |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
371891466 | https://github.com/pydata/xarray/issues/1385#issuecomment-371891466 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDM3MTg5MTQ2Ng== | rabernat 1197350 | 2018-03-09T17:53:15Z | 2018-03-09T17:53:15Z | MEMBER | Calling
Calling getitem on this array triggers the whole dask array to be computed, which would take forever and would completely blow out the notebook memory. This is because of #1372, which would be fixed by #1725. This has actually become a major showstopper for me. I need to work with this dataset in decoded form. Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.62-60.64.8-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.1
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: 2.2.0a2.dev176
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.1.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 6.2.1
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
370092011 | https://github.com/pydata/xarray/issues/1385#issuecomment-370092011 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDM3MDA5MjAxMQ== | shoyer 1217238 | 2018-03-02T23:58:26Z | 2018-03-02T23:58:26Z | MEMBER | @rabernat How does performance compare if you call |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
370064483 | https://github.com/pydata/xarray/issues/1385#issuecomment-370064483 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDM3MDA2NDQ4Mw== | rabernat 1197350 | 2018-03-02T21:57:26Z | 2018-03-02T21:57:26Z | MEMBER | An update on this long-standing issue. I have learned that As an example, I am loading a POP dataset on cheyenne. Anyone with access can try this example.
```python
base_dir = '/glade/scratch/rpa/'
prefix = 'BRCP85C5CN_ne120_t12_pop62.c13b17.asdphys.001'
code = 'pop.h.nday1.SST'
glob_pattern = os.path.join(base_dir, prefix, '%s.%s.*.nc' % (prefix, code))

def non_time_coords(ds):
    return [v for v in ds.data_vars if 'time' not in ds[v].dims]

def drop_non_essential_vars_pop(ds):
    return ds.drop(non_time_coords(ds))

# this runs almost instantly
ds = xr.open_mfdataset(glob_pattern, decode_times=False, chunks={'time': 1},
                       preprocess=drop_non_essential_vars_pop, decode_cf=False)
```
This is roughly 45 years of daily data, one file per year. Instead, if I just change There are more of these This is a real failure of lazy decoding. Maybe it can be fixed by #1725, possibly related to #1372. cc Pangeo folks: @jhamman, @mrocklin |
{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 2, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
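The comment above contrasts opening with decoding disabled against decoding eagerly. A condensed sketch of the open-first, decode-later pattern, with placeholder paths and chunking:

```python
import xarray as xr

# Open quickly with CF decoding switched off...
ds = xr.open_mfdataset("pop_*.nc", decode_cf=False, decode_times=False,
                       chunks={"time": 1})

# ...then apply decoding afterwards on the already-combined dataset.
ds_decoded = xr.decode_cf(ds)
```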
297539517 | https://github.com/pydata/xarray/issues/1385#issuecomment-297539517 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDI5NzUzOTUxNw== | shoyer 1217238 | 2017-04-26T20:59:23Z | 2017-04-26T20:59:23Z | MEMBER |
Yes, adding a boolean argument But more generally, I am a little surprised by how slow Basically, if |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
297494539 | https://github.com/pydata/xarray/issues/1385#issuecomment-297494539 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDI5NzQ5NDUzOQ== | rabernat 1197350 | 2017-04-26T18:07:03Z | 2017-04-26T18:07:03Z | MEMBER | cc: @geosciz, who is helping with this project. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);