
Comment 305506896 on pydata/xarray#798 by user 306380 (MEMBER), 2017-06-01T14:17:11Z
https://github.com/pydata/xarray/issues/798#issuecomment-305506896

@shoyer regarding per-file locking: this probably only matters if we are writing as well, yes?
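
For concreteness, per-file locking on the write path could be as simple as the following (a minimal sketch; `file_locks` and `write_variable` are illustrative names, not existing xarray API):

```python
import threading
from collections import defaultdict

# One lock per path: readers skip locking entirely, and writers
# only contend with other writers of the same file.
file_locks = defaultdict(threading.Lock)

def write_variable(path, do_write):
    with file_locks[path]:
        do_write(path)
```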

Here is a small implementation of a generic file-open cache. I haven't yet decided on an eviction policy, but either LRU or random (filtered by closeable files) should work OK; see the LRU sketch after the code block.

```python
from collections import defaultdict
from contextlib import contextmanager
import threading

import h5py


class OpenCache(object):
    def __init__(self, maxsize=100):
        self.refcount = defaultdict(lambda: 0)
        self.maxsize = maxsize
        self.cache = {}
        self.lock = threading.Lock()

    @contextmanager
    def open(self, myopen, fn, mode='r'):
        assert 'r' in mode
        key = (myopen, fn, mode)
        with self.lock:
            try:
                file = self.cache[key]
            except KeyError:
                file = myopen(fn, mode=mode)
                self.cache[key] = file

            self.refcount[key] += 1

            if len(self.cache) > self.maxsize:
                # Clear old files intelligently; as a placeholder,
                # close files that no one is currently using.
                for k in list(self.cache):
                    if self.refcount[k] == 0:
                        self.cache.pop(k).close()
                        if len(self.cache) <= self.maxsize:
                            break

        try:
            yield file
        finally:
            with self.lock:
                self.refcount[key] -= 1


cache = OpenCache()

with cache.open(h5py.File, 'myfile.hdf5') as f:
    x = f['/data/x']
    y = x[:1000, :1000]
```
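
Since the eviction policy is still open, here is a minimal sketch of what the LRU variant could look like (the `evict_lru` helper and the `OrderedDict` recency bookkeeping are illustrative assumptions, not part of the implementation above):

```python
from collections import OrderedDict

def evict_lru(cache, refcount, maxsize):
    # `cache` is an OrderedDict kept oldest-first: call
    # cache.move_to_end(key) on every cache hit so that iteration
    # order reflects recency. Must be called under the cache lock.
    for key in list(cache):
        if len(cache) <= maxsize:
            break
        if refcount[key] == 0:  # only close files nobody is reading
            cache.pop(key).close()
```

With that in place, `OpenCache.__init__` would use `self.cache = OrderedDict()` and `open` would call `self.cache.move_to_end(key)` on a cache hit.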

Is this still useful?

I'm curious to hear from users like @pwolfram and @rabernat, who may be running into the many-files problem, about what the current pain points are.
