issue_comments
10 rows where issue = 224553135 (slow performance with open_mfdataset) and user = 1197350 (rabernat), sorted by updated_at descending
1043038150 | 2022-02-17T14:57:03Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-1043038150

See deeper dive in https://github.com/pydata/xarray/discussions/6284
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
1043016100 | 2022-02-17T14:36:23Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-1043016100

Ah, ok. So if that is your goal: there is a problem with the time encoding in this file. The units (…)
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
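For context, a minimal sketch of how one can inspect a file's raw time encoding with xarray (the filename is a placeholder, not from the thread):

```python
import xarray as xr

# Open without decoding times so the raw encoding stays visible in the attrs.
ds = xr.open_dataset("file.nc", decode_times=False)  # hypothetical filename
print(ds["time"].attrs.get("units"))     # e.g. "days since 1850-01-01"
print(ds["time"].attrs.get("calendar"))
```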
1043001146 | created 2022-02-17T14:21:45Z, edited 2022-02-17T14:22:23Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-1043001146

In general that would be a little more convenient than Google Drive, because then we could download the file from Python (rather than having a manual step). This would allow us to share a fully copy-pasteable code snippet to reproduce the issue. But don't worry about that for now. First, I'd note that your issue is not really related to …
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
1042937825 | 2022-02-17T13:14:50Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-1042937825

Hi Tom! 👋 So much about xarray has evolved since this original issue was posted. However, we continue to use it as a catch-all for people looking to speed up open_mfdataset. I saw your Stack Overflow post. Any chance you could post a link to the actual file in question?
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
561920115 | 2019-12-05T01:09:25Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-561920115

In your Twitter thread you said:

> …
The general reason for this is usually that … First, all the files are opened individually:

https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L900-L903

You can recreate this step outside of xarray yourself by doing something like the sketch below.
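A minimal sketch of this first step, assuming a set of local netCDF files matched by a glob pattern:

```python
import glob
import xarray as xr

# Step 1 of open_mfdataset, done by hand: open every file individually.
paths = sorted(glob.glob("*.nc"))
datasets = [xr.open_dataset(p) for p in paths]
```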
Once each dataset is open, xarray calls out to one of its combine functions. This logic has gotten more complex over the years as different options have been introduced, but the gist is this:

https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L947-L952

You can reproduce this step outside of xarray too, e.g. with the sketch below.
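Continuing from `datasets` in the previous sketch, and assuming the files share dimension coordinates:

```python
import xarray as xr

# Step 2, done by hand: combine the individually opened datasets into one.
combined = xr.combine_by_coords(datasets)
```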
Without seeing more details about your files, it's hard to know exactly where the issue lies. A good place to start is to simply drop all coordinates from your data as a preprocessing step:

```python
def drop_all_coords(ds):
    return ds.reset_coords(drop=True)

xr.open_mfdataset('*.nc', combine='by_coords', preprocess=drop_all_coords)
```

If you observe a big speedup, this points at coordinate compatibility checks as the culprit. From there you can experiment with the various options for open_mfdataset (see the sketch below). Once you post your file details, we can provide more concrete suggestions.
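To make "the various options" concrete, a sketch of one commonly suggested combination (these are all real open_mfdataset parameters; the specific combination is illustrative):

```python
# Take coordinates and attributes from the first file and skip most
# equality checks across files.
ds = xr.open_mfdataset(
    "*.nc",
    combine="by_coords",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    parallel=True,  # open the files in parallel with dask.delayed
)
```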
{ "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
561915767 | 2019-12-05T00:52:06Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-561915767

@keltonhalbert - I'm sorry you're frustrated by this issue. It's hard to provide a general answer to "why is open_mfdataset slow?" without seeing the data in question. I'll try to provide some best practices and recommendations here. In the meantime, could you please post the xarray repr of two of your files? To be explicit, something like the sketch below.
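A minimal sketch of what was asked for, with placeholder filenames:

```python
import xarray as xr

# Open two representative files and print their reprs.
ds1 = xr.open_dataset("file1.nc")  # hypothetical filenames
print(ds1)
ds2 = xr.open_dataset("file2.nc")
print(ds2)
```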
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
463369751 | 2019-02-13T21:04:03Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-463369751

What if you do …
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
371891466 | 2018-03-09T17:53:15Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-371891466

Calling …

Calling getitem on this array triggers the whole dask array to be computed, which would take forever and would completely blow out the notebook memory. This is because of #1372, which would be fixed by #1725.

This has actually become a major showstopper for me. I need to work with this dataset in decoded form.

Versions:
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.62-60.64.8-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.1
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: 2.2.0a2.dev176
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.1.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 6.2.1
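To make the getitem remark above concrete, a hypothetical sketch of that failure mode (`ds` is assumed to be a lazily decoded, dask-backed dataset and "SST" a stand-in variable name):

```python
# Selecting even a single element forces the whole underlying array to be
# computed when the variable is wrapped in an eager decoding layer (GH #1372).
sst = ds["SST"]
print(sst[0, 0, 0].values)
```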
370064483 | 2018-03-02T21:57:26Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-370064483

An update on this long-standing issue. I have learned that …

As an example, I am loading a POP dataset on cheyenne. Anyone with access can try this example.

```python
import os
import xarray as xr

base_dir = '/glade/scratch/rpa/'
prefix = 'BRCP85C5CN_ne120_t12_pop62.c13b17.asdphys.001'
code = 'pop.h.nday1.SST'
glob_pattern = os.path.join(base_dir, prefix, '%s.%s.*.nc' % (prefix, code))

def non_time_coords(ds):
    return [v for v in ds.data_vars if 'time' not in ds[v].dims]

def drop_non_essential_vars_pop(ds):
    return ds.drop(non_time_coords(ds))

# this runs almost instantly
ds = xr.open_mfdataset(glob_pattern, decode_times=False, chunks={'time': 1},
                       preprocess=drop_non_essential_vars_pop, decode_cf=False)
```

This is roughly 45 years of daily data, one file per year. Instead, if I just change … There are more of these … This is a real failure of lazy decoding. Maybe it can be fixed by #1725, possibly related to #1372.

cc Pangeo folks: @jhamman, @mrocklin
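Since the fast path above opens with `decode_cf=False`, decoding has to happen as a separate step; a minimal sketch using xarray's public `xr.decode_cf` function:

```python
# Apply CF decoding to the already-opened dataset. At the time of this
# comment, this step could defeat laziness (see #1372 / #1725).
ds_decoded = xr.decode_cf(ds)
```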
{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 2, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
297494539 | 2017-04-26T18:07:03Z | rabernat (MEMBER) | https://github.com/pydata/xarray/issues/1385#issuecomment-297494539

cc: @geosciz, who is helping with this project.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);