issue_comments: 561920115
This data as json
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/1385#issuecomment-561920115 | https://api.github.com/repos/pydata/xarray/issues/1385 | 561920115 | MDEyOklzc3VlQ29tbWVudDU2MTkyMDExNQ== | 1197350 | 2019-12-05T01:09:25Z | 2019-12-05T01:09:25Z | MEMBER | In your twitter thread you said
The general reason for this is usually that

First, all the files are opened individually:

https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L900-L903

You can recreate this step outside of xarray yourself by doing something like
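A minimal, self-contained sketch of that first stage (the toy datasets and file names below are invented for illustration; in practice you would glob your own `*.nc` paths):

```python
import os
import tempfile
from glob import glob

import numpy as np
import xarray as xr

# Build two tiny example files so the sketch is self-contained
# (in practice you already have your own *.nc files).
tmpdir = tempfile.mkdtemp()
for i in range(2):
    ds = xr.Dataset({"t": ("time", np.arange(3) + 3 * i)},
                    coords={"time": np.arange(3) + 3 * i})
    ds.to_netcdf(os.path.join(tmpdir, f"file_{i}.nc"))

# This mirrors the first stage of open_mfdataset:
# every file is opened individually with xr.open_dataset.
paths = sorted(glob(os.path.join(tmpdir, "*.nc")))
datasets = [xr.open_dataset(p) for p in paths]
print(len(datasets))  # one Dataset object per file
```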
Once each dataset is open, xarray calls out to one of its combine functions. This logic has gotten more complex over the years as different options have been introduced, but the gist is this:

https://github.com/pydata/xarray/blob/577d3a75ea8bb25b99f9d31af8da14210cddff78/xarray/backends/api.py#L947-L952

You can reproduce this step outside of xarray, e.g.
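The combine stage can be reproduced on in-memory datasets; here is a sketch using two toy datasets (the `time` coordinate values are invented for illustration):

```python
import numpy as np
import xarray as xr

# Two in-memory datasets standing in for the per-file datasets,
# with non-overlapping, monotonic coordinate values.
ds1 = xr.Dataset({"t": ("time", np.arange(3))},
                 coords={"time": np.arange(3)})
ds2 = xr.Dataset({"t": ("time", np.arange(3, 6))},
                 coords={"time": np.arange(3, 6)})

# This mirrors what combine='by_coords' does: xr.combine_by_coords
# orders and concatenates the datasets by their coordinate values,
# even if they are passed out of order.
combined = xr.combine_by_coords([ds2, ds1])
print(combined.sizes["time"])  # -> 6
```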
Without seeing more details about your files, it's hard to know exactly where the issue lies. A good place to start is to simply drop all coordinates from your data as a preprocessing step:

```
def drop_all_coords(ds):
    return ds.reset_coords(drop=True)

xr.open_mfdataset('*.nc', combine='by_coords', preprocess=drop_all_coords)
```

If you observe a big speedup, this points at coordinate compatibility checks as the culprit. From there you can experiment with the various options for `open_mfdataset`. Once you post your file details, we can provide more concrete suggestions. |
{ "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
224553135 |