html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/463#issuecomment-263467311,https://api.github.com/repos/pydata/xarray/issues/463,263467311,MDEyOklzc3VlQ29tbWVudDI2MzQ2NzMxMQ==,306380,2016-11-29T03:35:43Z,2016-11-29T03:35:43Z,MEMBER,"@shoyer is it ever feasible to read the first NetCDF file in a sequence and assume that all of the others are identical, except that a datetime dimension increments by one day per file?

On Mon, Nov 28, 2016 at 7:19 PM, Stephan Hoyer wrote:

> If I understand correctly, incorporation of the LRU cache could help with
> this problem, assuming time series were sliced into small chunks for access,
> correct? We would still run into problems, however, if there were, say, 10^6
> files and we wanted to get a time series spanning these files, right?
>
> The LRU cache solution proposed in #798 would work in either case. It just
> would have poor performance when accessing a small piece of each of 10^6
> files, both to build the graph (because xarray needs to open each file to
> read the metadata) and to do the actual computation (again, because of the
> need to open so many files). If you only need a small amount of data from
> many files, you probably want to reshape your data to minimize the amount
> of necessary file access no matter what, whether you do that reshaping with
> PyReshaper or xarray/dask.array/dask-distributed.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
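For reference, a minimal sketch of the pattern @mrocklin is asking about might look like the following, assuming every daily file shares the first file's schema exactly. This is not xarray's implementation: the file list, the start date, and the `read_var` helper are all hypothetical, and only the first file is opened eagerly to learn variable shapes and dtypes.

```python
import dask
import dask.array as da
import pandas as pd
import xarray as xr

# Hypothetical daily files; the names and count are illustrative only.
paths = [f"data/day_{i:04d}.nc" for i in range(1000)]

# Open only the first file to learn the schema (variables, shapes, dtypes).
template = xr.open_dataset(paths[0])

def read_var(path, name):
    # Deferred read: runs only when dask computes the corresponding chunk,
    # so no file besides the first is touched while building the graph.
    with xr.open_dataset(path) as ds:
        return ds[name].values

# Assume the time coordinate rather than reading it from each file:
# one day per file, starting from a hypothetical first date.
time = pd.date_range("2016-01-01", periods=len(paths), freq="D")

data_vars = {}
for name, var in template.data_vars.items():
    lazy_chunks = [
        da.from_delayed(
            dask.delayed(read_var)(path, name),
            shape=var.shape,   # assumed identical across all files
            dtype=var.dtype,
        )
        for path in paths
    ]
    # Stack the per-file arrays along a new leading "time" dimension.
    data_vars[name] = (("time",) + var.dims, da.stack(lazy_chunks))

combined = xr.Dataset(data_vars, coords={"time": time, **template.coords})
```

The trade-off matches Hoyer's point above: this avoids opening 10^6 files to build the graph, but any file whose shape or dtype silently differs from the template will only fail at compute time.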