issue_comments: 199545836
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association
---|---|---|---|---|---|---|---
https://github.com/pydata/xarray/issues/798#issuecomment-199545836 | https://api.github.com/repos/pydata/xarray/issues/798 | 199545836 | MDEyOklzc3VlQ29tbWVudDE5OTU0NTgzNg== | 306380 | 2016-03-21T23:59:18Z | 2016-03-21T23:59:18Z | MEMBER

body:

Copying over a comment from that issue:

Yes, so the problem as I see it is that, for serialization and open-file reasons, we want to use a function like the following:
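(A minimal sketch of that open-per-call reader, assuming the standard `netCDF4.Dataset` context manager and the `get_chunk_of_array` signature used in the cached variant below:)

``` python
import netCDF4

def get_chunk_of_array(filename, datapath, slice):
    # Open the file, read the requested slice, and close the file again.
    # Robust and serialization-friendly, but pays an open/close on every call.
    with netCDF4.Dataset(filename) as f:
        return f.variables[datapath][slice]
```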
However, this opens and closes many files, which, while robust, is slow. We can alleviate this by maintaining an LRU cache in a global variable so that it is created separately per process.

``` python
from toolz import memoize

cache = LRUDict(size=100, on_eviction=lambda file: file.close())

netCDF4_Dataset = memoize(netCDF4.Dataset, cache=cache)

def get_chunk_of_array(filename, datapath, slice):
    f = netCDF4_Dataset(filename)
    return f.variables[datapath][slice]
```

I'm happy to supply the `LRUDict`. We would then need to use such a function within the dask.array and xarray codebases.

Anyway, that's one approach. Thoughts welcome.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
performed_via_github_app:

issue: 142498006
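The `LRUDict` referenced in the comment above is not something `toolz` ships; it stands for any mutable mapping with a bounded size and an eviction callback. A minimal sketch of such a cache, assuming only the standard library and the `cache=` argument that `toolz.memoize` accepts, might look like this:

``` python
from collections import OrderedDict

class LRUDict(OrderedDict):
    """Mapping holding at most `size` items; the least recently used entry
    is evicted and passed to `on_eviction` (e.g. to close a file handle)."""

    def __init__(self, size=100, on_eviction=None):
        super().__init__()
        self.size = size
        self.on_eviction = on_eviction

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        while len(self) > self.size:
            _, evicted = self.popitem(last=False)  # drop least recently used
            if self.on_eviction is not None:
                self.on_eviction(evicted)
```

With a cache like this, the memoized `netCDF4_Dataset` in the comment keeps a bounded number of datasets open per process and closes handles as they age out, while tasks still refer to files only by name, so the graph remains serializable.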