issue_comments: 199545836


html_url: https://github.com/pydata/xarray/issues/798#issuecomment-199545836
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
id: 199545836
node_id: MDEyOklzc3VlQ29tbWVudDE5OTU0NTgzNg==
user: 306380
created_at: 2016-03-21T23:59:18Z
updated_at: 2016-03-21T23:59:18Z
author_association: MEMBER
issue: 142498006

Copying over a comment from that issue:

Yes, so the problem as I see it is that, for serialization and open-file reasons, we want to use a function like the following:

``` python
import netCDF4

def get_chunk_of_array(filename, datapath, slice):
    # Open the file, read the requested slice, then close the file again.
    with netCDF4.Dataset(filename) as f:
        return f.variables[datapath][slice]
```

However, this opens and closes many files, which, while robust, is slow. We can alleviate this by maintaining an LRU cache of open files in a global variable, so that each process gets its own cache.

``` python
import netCDF4
from toolz import memoize

# Global, per-process cache of open Dataset handles; files are closed on eviction.
# LRUDict is the small LRU mapping mentioned below.
cache = LRUDict(size=100, on_eviction=lambda file: file.close())

netCDF4_Dataset = memoize(netCDF4.Dataset, cache=cache)

def get_chunk_of_array(filename, datapath, slice):
    f = netCDF4_Dataset(filename)
    return f.variables[datapath][slice]
```

I'm happy to supply the memoize function via toolz, and an appropriate LRUDict object as a separate microproject that I can publish if necessary.
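
For illustration, here is a minimal sketch of what such an LRUDict could look like, assuming the `size` and `on_eviction` keywords used above; the class below is a hypothetical stand-in built on `collections.OrderedDict`, not a published implementation.

``` python
from collections import OrderedDict

class LRUDict(OrderedDict):
    """Hypothetical LRU mapping that calls a callback when evicting entries."""

    def __init__(self, size=100, on_eviction=None):
        super().__init__()
        self.size = size
        self.on_eviction = on_eviction

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)              # mark key as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        while len(self) > self.size:       # evict least recently used entries
            _, evicted = self.popitem(last=False)
            if self.on_eviction is not None:
                self.on_eviction(evicted)  # e.g. close an evicted Dataset
```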

We would then need to use such a function within the dask.array and xarray codebases.
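
As a rough sketch of the dask.array side, assuming a hypothetical one-dimensional netCDF variable read in fixed-size chunks (the file name "data.nc", variable path "temperature", and chunk sizes are made up for illustration):

``` python
import dask
import dask.array as da
import numpy as np

# Each chunk is a delayed call to get_chunk_of_array, so the task graph holds
# only (filename, datapath, slice) arguments instead of open file objects.
chunk = 100
lazy_chunks = [
    da.from_delayed(
        dask.delayed(get_chunk_of_array)("data.nc", "temperature", slice(i, i + chunk)),
        shape=(chunk,),
        dtype=np.float64,
    )
    for i in range(0, 1000, chunk)
]
arr = da.concatenate(lazy_chunks)  # lazy array backed by the cached file handles
```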

Anyway, that's one approach. Thoughts welcome.
