issues: 1916677049

field | value
---|---
id | 1916677049
node_id | I_kwDOAMm_X85yPiu5
number | 8245
title | Tools for writing distributed zarrs
user | 5635139
state | open
locked | 0
comments | 0
created_at | 2023-09-28T04:25:45Z
updated_at | 2024-01-04T00:15:09Z
author_association | MEMBER

body:

What is your issue?

There seems to be a common pattern for writing zarrs from a distributed set of machines, in parallel. It's somewhat described in the prose of the io docs. Quoting:

I've been using this fairly successfully recently; a minimal sketch of the pattern is included after the list below. It's much better than writing hundreds or thousands of data variables, since many small data variables create a huge number of files.

Are there some tools we can provide to make this easier? Some ideas:
- [ ]
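
For reference, here is a minimal sketch of the pattern described above. It assumes a local `output.zarr` store and placeholder shapes and chunk sizes; `to_zarr(..., compute=False)` and `to_zarr(..., region=...)` are xarray's documented API, while `write_block` and the per-worker computation are hypothetical stand-ins:

```python
import dask.array
import xarray as xr

# 1. Build a lazy "template" dataset spanning the full extent of the output and
#    write only its metadata: compute=False lays out the store without writing
#    any of the dask-backed array data.
template = xr.Dataset(
    {"foo": (("x", "y"), dask.array.zeros((100, 100), chunks=(10, 10)))},
    coords={"x": range(100), "y": range(100)},
)
template.to_zarr("output.zarr", mode="w", compute=False)


# 2. On each worker, write one slab into its own region of the existing store.
def write_block(x_slice: slice) -> None:
    # Stand-in for the worker's real computation of the slab covering x_slice.
    block = template.isel(x=x_slice)
    # Variables with no dimension inside the region (here the "y" coordinate)
    # must be dropped before a region write.
    block.drop_vars("y").to_zarr("output.zarr", region={"x": x_slice})


write_block(slice(0, 10))  # e.g. worker 0 handles x = 0..9
```

The template write is cheap because only metadata (and any in-memory coordinates) is written; the parallel writes stay safe as long as each worker's region covers a disjoint set of zarr chunks.
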
More minor papercuts:
- [ ] I've hit an issue where writing a region seemed to cause the worker to attempt to load the whole array into memory; can we offer guarantees about when (non-metadata) data will be loaded during a region write? (See the sketch further below.)

Some things that were in the list above, moved down here as they've been completed!
- [x] Requiring
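
On the memory papercut above, here is a minimal sketch of what "not loading the whole array" looks like in practice, assuming the same hypothetical `output.zarr` template as in the earlier sketch: if the slab handed to `to_zarr` is dask-backed, the region write is computed and stored chunk by chunk rather than materialising the whole variable in worker memory.

```python
import dask.array
import xarray as xr

# A dask-backed slab covering the region x = 0..9 of the template store above,
# with dask chunks aligned to the store's zarr chunks.
lazy_block = xr.Dataset(
    {"foo": (("x", "y"), dask.array.random.random((10, 100), chunks=(10, 10)))},
    coords={"x": range(0, 10)},
)

# No .load()/.compute() beforehand: each 10x10 chunk is computed and written in
# turn during the region write, so the full slab is never held in memory at once
# (up to whatever the scheduler runs concurrently).
lazy_block.to_zarr("output.zarr", region={"x": slice(0, 10)})
```
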
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8245/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
13221727 | issue |