issues: 304201107
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 304201107 | MDU6SXNzdWUzMDQyMDExMDc= | 1981 | use dask to open datasets in parallel | 2443309 | closed | 0 | 5 | 2018-03-11T22:33:52Z | 2018-04-20T12:04:23Z | 2018-04-20T12:04:23Z | MEMBER | Code Sample, a copy-pastable example if possible
Problem description
We have many issues describing the less-than-stellar performance of open_mfdataset (e.g. #511, #893, #1385, #1788, #1823). The problem can be broken into three pieces: 1) open each file, 2) decode/preprocess each dataset, and 3) merge/combine/concat the collection of datasets. We can perform (1) and (2) in parallel (performance improvements to (3) would be a separate task). Lately, I'm finding that for large numbers of files, it can take many seconds to many minutes just to open all the files in a multi-file dataset of mine.
I'm proposing that we use something like
We could change the line:
I'm curious what others think of this idea and what the potential downsides may be. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1981/reactions",
"total_count": 2,
"+1": 2,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | 13221727 | issue |
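
The three-stage split described in the issue body (open each file, preprocess each dataset, then combine) can be illustrated with a minimal sketch. This is not the code proposed in the issue and it does not use xarray or dask: it swaps in Python's standard-library `ThreadPoolExecutor` as a stand-in for a parallel scheduler. The names `open_one`, `preprocess`, `combine`, and `open_many` are hypothetical placeholders, and the "datasets" are plain dictionaries. Stages (1) and (2) run in parallel per file; stage (3) stays serial, as the issue suggests.

```python
from concurrent.futures import ThreadPoolExecutor


def open_one(path):
    # Hypothetical stand-in for stage (1): "open" a file.
    # A real implementation would do I/O here; this just records the path length.
    return {"path": path, "values": [len(path)]}


def preprocess(ds):
    # Hypothetical stand-in for stage (2): per-dataset decoding/preprocessing.
    return {**ds, "values": [v * 2 for v in ds["values"]]}


def open_and_preprocess(path):
    # Stages (1) and (2) are independent per file, so they can share one task.
    return preprocess(open_one(path))


def combine(datasets):
    # Stand-in for stage (3): serial concatenation of the per-file results.
    return [v for ds in datasets for v in ds["values"]]


def open_many(paths, max_workers=4):
    # Run stages (1) and (2) in a thread pool; Executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        datasets = list(pool.map(open_and_preprocess, paths))
    # Stage (3) runs once, after every file has been opened and preprocessed.
    return combine(datasets)
```

Threads suit this sketch because opening files is typically I/O-bound. Whether the same holds for a real multi-file dataset depends on the storage backend and on how much CPU work the preprocessing step does.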