issue_comments: 199545836


html_url: https://github.com/pydata/xarray/issues/798#issuecomment-199545836
issue_url: https://api.github.com/repos/pydata/xarray/issues/798
id: 199545836
node_id: MDEyOklzc3VlQ29tbWVudDE5OTU0NTgzNg==
user: 306380
created_at: 2016-03-21T23:59:18Z
updated_at: 2016-03-21T23:59:18Z
author_association: MEMBER
issue: 142498006

Copying over a comment from that issue:

Yes, so the problem as I see it is that, for serialization and open-file reasons, we want to use a function like the following:

``` python
import netCDF4

def get_chunk_of_array(filename, datapath, slice):
    # Open the file, read the requested slice, then close the file again.
    with netCDF4.Dataset(filename) as f:
        return f.variables[datapath][slice]
```

However, this opens and closes many files, which, while robust, is slow. We can alleviate this by maintaining an LRU cache of open files in a global variable, so that each process gets its own cache.

``` python
import netCDF4
from toolz import memoize

# Global, per-process cache of open Dataset handles; files are closed on eviction.
# LRUDict is the small LRU mapping mentioned below.
cache = LRUDict(size=100, on_eviction=lambda file: file.close())

netCDF4_Dataset = memoize(netCDF4.Dataset, cache=cache)

def get_chunk_of_array(filename, datapath, slice):
    f = netCDF4_Dataset(filename)
    return f.variables[datapath][slice]
```

I'm happy to supply the memoize function via toolz, and an appropriate LRUDict object as a separate microproject that I can publish if necessary.
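
For illustration, here is a minimal sketch of what such an LRUDict could look like, assuming the `size` and `on_eviction` keywords used above; the class below is a hypothetical stand-in built on `collections.OrderedDict`, not a published implementation.

``` python
from collections import OrderedDict

class LRUDict(OrderedDict):
    """Hypothetical LRU mapping that calls a callback when evicting entries."""

    def __init__(self, size=100, on_eviction=None):
        super().__init__()
        self.size = size
        self.on_eviction = on_eviction

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)              # mark key as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        while len(self) > self.size:       # evict least recently used entries
            _, evicted = self.popitem(last=False)
            if self.on_eviction is not None:
                self.on_eviction(evicted)  # e.g. close an evicted Dataset
```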

We would then need to use such a function within the dask.array and xarray codebases.
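
As a rough sketch of the dask.array side, assuming a hypothetical one-dimensional netCDF variable read in fixed-size chunks (the file name "data.nc", variable path "temperature", and chunk sizes are made up for illustration):

``` python
import dask
import dask.array as da
import numpy as np

# Each chunk is a delayed call to get_chunk_of_array, so the task graph holds
# only (filename, datapath, slice) arguments instead of open file objects.
chunk = 100
lazy_chunks = [
    da.from_delayed(
        dask.delayed(get_chunk_of_array)("data.nc", "temperature", slice(i, i + chunk)),
        shape=(chunk,),
        dtype=np.float64,
    )
    for i in range(0, 1000, chunk)
]
arr = da.concatenate(lazy_chunks)  # lazy array backed by the cached file handles
```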

Anyway, that's one approach. Thoughts welcome.
