
issue_comments: 263437709


html_url: https://github.com/pydata/xarray/issues/463#issuecomment-263437709
issue_url: https://api.github.com/repos/pydata/xarray/issues/463
id: 263437709
node_id: MDEyOklzc3VlQ29tbWVudDI2MzQzNzcwOQ==
user: 1217238
created_at: 2016-11-29T00:19:53Z
updated_at: 2016-11-29T00:19:53Z
author_association: MEMBER
issue: 94328498

> If I understand correctly, incorporating the LRU cache could help with this problem, assuming the time series were sliced into small chunks for access, correct? We would still run into problems, however, if there were, say, 10^6 files and we wanted to get a time series spanning these files, right?

The LRU cache solution proposed in https://github.com/pydata/xarray/issues/798 would work in either case. It would just have poor performance when accessing a small piece of each of 10^6 files, both when building the graph (because xarray needs to open each file to read its metadata) and when doing the actual computation (again, because so many files need to be opened). If you only need a small amount of data from many files, you probably want to reshape your data to minimize the necessary file access no matter what, whether you do that reshaping with PyReshaper or with xarray/dask.array/dask-distributed.
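For concreteness, here is a minimal sketch of the kind of LRU file-handle cache discussed in #798, written against the plain netCDF4 library. The names `MAX_OPEN_FILES` and `open_cached` are illustrative, not xarray's actual API; the point is just to bound the number of simultaneously open files while reusing handles for recently accessed ones:

```python
# Minimal sketch of an LRU cache for netCDF file handles, in the spirit of
# pydata/xarray#798. Assumes the netCDF4 package is installed; MAX_OPEN_FILES
# and open_cached are hypothetical names, not part of xarray's API.
from collections import OrderedDict

import netCDF4

MAX_OPEN_FILES = 128  # upper bound on simultaneously open files

_handles = OrderedDict()  # path -> open netCDF4.Dataset, least recently used first


def open_cached(path):
    """Return an open handle for `path`, evicting the least recently
    used handle once the cache is full."""
    if path in _handles:
        _handles.move_to_end(path)  # mark as most recently used
        return _handles[path]
    if len(_handles) >= MAX_OPEN_FILES:
        _, oldest = _handles.popitem(last=False)  # evict the LRU handle
        oldest.close()
    handle = netCDF4.Dataset(path, mode="r")
    _handles[path] = handle
    return handle
```

Note that a cache like this only amortizes repeated access to the same files: pulling one small slice from each of 10^6 distinct files still costs ~10^6 open/close cycles, which is exactly the overhead described above.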
