issue_comments: 492732052

Comment: https://github.com/pydata/xarray/issues/2954#issuecomment-492732052
Issue: https://api.github.com/repos/pydata/xarray/issues/2954
Author: user 1217238 (MEMBER) · Created: 2019-05-15T16:43:17Z · Updated: 2019-05-15T16:43:17Z

> is not closing the file after it has been opened for retrieving a "lazy" file by design, or might this be considered a wart/bug?

You can (nearly) achieve this behavior by setting xarray.set_options(file_cache_maxsize=1).
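
For reference, a minimal sketch of what that looks like (the file name is just a placeholder):

```python
import xarray as xr

# Shrink xarray's global file cache so at most one file handle stays
# open at a time; older handles are closed as new files are opened.
xr.set_options(file_cache_maxsize=1)

# set_options also works as a context manager if you only want the
# smaller cache for a limited scope.
with xr.set_options(file_cache_maxsize=1):
    ds = xr.open_dataset("example.nc")  # placeholder path
```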

Note that the default for file_cache_maxsize is 128, which is suspiciously similar to the number of files/groups at which you start hitting issues. In theory, we use appropriate locks so that files are automatically closed when the cache size is exceeded, but that may not be working properly. If you can put together a test case with synthetic data (e.g., including a script that generates the files), I'll see whether I can reproduce and fix this.
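
A reproduction script might look roughly like this; the file names, sizes, and variable names are all invented for illustration:

```python
import numpy as np
import xarray as xr

# Write more synthetic netCDF files than file_cache_maxsize (default 128),
# so the file cache is forced to evict and close handles.
paths = []
for i in range(200):
    path = f"synthetic_{i:03d}.nc"
    xr.Dataset({"x": ("t", np.random.rand(10))}).to_netcdf(path)
    paths.append(path)

# Open them all lazily, then touch data from each one; with more than
# 128 files open this exercises the automatic close/reopen machinery.
datasets = [xr.open_dataset(p) for p in paths]
for ds in datasets:
    _ = float(ds["x"].mean())
```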

But to clarify the intent here: we don't close files around every data access because doing so can cause a severe loss of performance, e.g., if you're using dask to read a bunch of chunks out of the same file.
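
As a rough illustration of the access pattern that would suffer (the path, variable name, and chunk size are placeholders):

```python
import xarray as xr

# One file opened lazily with dask chunking; a single compute() below
# triggers many small reads against the same underlying file handle.
ds = xr.open_dataset("big_file.nc", chunks={"time": 100})  # placeholder

# If the file were closed and reopened around every chunk access, each
# of these per-chunk reads would pay that open/close overhead again.
result = ds["x"].mean(dim="time").compute()
```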

I agree that it's unintuitive how we ignore the explicit context manager. Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed? It's not immediately obvious to me how to refactor the code to achieve this, but this does seem like it would make for a better user experience.
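
Concretely, the surprising case looks roughly like this (the path and variable name are placeholders; today the access after the with-block quietly reopens the file through the cache, and the idea above would be to raise instead):

```python
import xarray as xr

with xr.open_dataset("example.nc") as ds:  # placeholder path
    pass  # data is still lazy; the with-block closes the file here

# Today this still works: the values are read by transparently reopening
# the file via xarray's file cache, even though we explicitly closed it.
values = ds["x"].values

# Under the proposal, this access would instead raise an error along the
# lines of "cannot read from an explicitly closed file" (hypothetical
# message), so the context manager behaves the way users expect.
```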
