html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/463#issuecomment-288867744,https://api.github.com/repos/pydata/xarray/issues/463,288867744,MDEyOklzc3VlQ29tbWVudDI4ODg2Nzc0NA==,4295853,2017-03-23T21:36:07Z,2017-03-23T21:36:07Z,CONTRIBUTOR,"@ajoros should correct me if I'm wrong, but it sounds like everything is working for his use case.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288832707,https://api.github.com/repos/pydata/xarray/issues/463,288832707,MDEyOklzc3VlQ29tbWVudDI4ODgzMjcwNw==,4295853,2017-03-23T19:21:57Z,2017-03-23T19:21:57Z,CONTRIBUTOR,"@ajoros, #1198 was just merged, so the bleeding-edge version of xarray is the one to try!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288830741,https://api.github.com/repos/pydata/xarray/issues/463,288830741,MDEyOklzc3VlQ29tbWVudDI4ODgzMDc0MQ==,4295853,2017-03-23T19:14:23Z,2017-03-23T19:14:23Z,CONTRIBUTOR,"@ajoros, can you try something like `pip install -v --force-reinstall git+ssh://git@github.com/pwolfram/xarray@fix_too_many_open_files` to see if #1198 fixes the problem with your dataset, noting that you need `open_mfdataset(..., autoclose=True)`?
@shoyer should correct me if I'm wrong, but we are almost ready to merge the code in this PR, and this would be a great ""in the field"" check if you could try it out soon.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
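A minimal sketch of the suggested check, assuming a hypothetical directory of netCDF files matching `output/*.nc` and a hypothetical variable name `temperature`; `autoclose=True` is the keyword added by #1198, which closes each underlying file again after it is read:

```python
import xarray as xr

# Lazily combine many netCDF files; with autoclose=True (from #1198) each
# file is closed again after its chunk is read, so the process stays well
# under the per-process open-file limit.
ds = xr.open_mfdataset("output/*.nc", autoclose=True)

# Force a read across every file to exercise the open/close machinery.
print(ds["temperature"].mean().compute())  # variable name is hypothetical
```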
https://github.com/pydata/xarray/issues/463#issuecomment-288414991,https://api.github.com/repos/pydata/xarray/issues/463,288414991,MDEyOklzc3VlQ29tbWVudDI4ODQxNDk5MQ==,4295853,2017-03-22T14:25:37Z,2017-03-22T14:25:37Z,CONTRIBUTOR,We are very close on #1198 and will be merging soon. This would be a great time for everyone to ensure that #1198 resolves this issue before we merge.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263723460,https://api.github.com/repos/pydata/xarray/issues/463,263723460,MDEyOklzc3VlQ29tbWVudDI2MzcyMzQ2MA==,4295853,2016-11-29T22:39:25Z,2016-11-29T23:30:59Z,CONTRIBUTOR,I just realized I didn't say thank you to @shoyer et al for the advice and help. Please forgive my rudeness.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263721589,https://api.github.com/repos/pydata/xarray/issues/463,263721589,MDEyOklzc3VlQ29tbWVudDI2MzcyMTU4OQ==,4295853,2016-11-29T22:31:25Z,2016-11-29T22:31:25Z,CONTRIBUTOR,"@shoyer, if I understand correctly, the best approach as you see it is to build on `opener` via #1128, recognizing this will be essentially ""upgraded"" sometime in the future, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263693540,https://api.github.com/repos/pydata/xarray/issues/463,263693540,MDEyOklzc3VlQ29tbWVudDI2MzY5MzU0MA==,4295853,2016-11-29T20:46:20Z,2016-11-29T20:47:30Z,CONTRIBUTOR,"@shoyer, you probably have the best feel for the most efficacious solution to this problem in terms of fixing the issue, performance, longer-term utility, etc. Is there any clear winner among the following (potentially non-exhaustive) options?
1. LRU cache from #798
2. Building on `opener` #1128
3. New wrapper functionality as discussed above for [NcML](http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/Aggregation.html)
4. Use of [PyReshaper](https://github.com/NCAR/PyReshaper) (e.g., a short-term acknowledgement that changes to xarray / dask may be somewhat out of scope for current design goals)
My current analysis:
I could see our team using PyReshaper because our data output format already has inertia, but this adds complexity to a workflow that intuitively should be handled inside xarray. However, I think we want to get around the file-number limitation eventually because it is an issue that multiple groups keep bringing up. PyReshaper is perhaps the simplest solution, but it is specific to our uses and not necessarily general. Toward a general solution, we would intuitively pay a fixed-cost performance penalty for the `opener` solution, but it may be the simplest and cleanest approach, at least for the short term. However, we may need the LRU cache eventually to bridge xarray / dask-distributed, so implementing `opener` could be a deprecated effort in the long term. The NcML approach has the flavor of a solution along the lines of PyReshaper, although my limited experience with PyReshaper and NcML precludes a more rigorous analysis. We can follow up with @kmpaul on this point if it would be helpful moving forward.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
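For option 1 above, a minimal sketch of the file-handle LRU idea from #798, using a plain `open()`-based cache rather than xarray's actual internals; the point is that evicted handles are closed, so at most `maxsize` files are ever open at once, and a cache miss simply reopens the file at a fixed cost:

```python
from collections import OrderedDict

class FileHandleCache:
    """Least-recently-used cache of open file handles (sketch of the #798 idea)."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._handles = OrderedDict()  # path -> open file object

    def get(self, path):
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as most recently used
        else:
            if len(self._handles) >= self.maxsize:
                _, oldest = self._handles.popitem(last=False)
                oldest.close()  # evict and close the least recently used
            self._handles[path] = open(path, "rb")
        return self._handles[path]
```

Reopening on a miss is what would make even the 10^6-file case workable, at the price of the fixed reopen penalty discussed above.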
https://github.com/pydata/xarray/issues/463#issuecomment-263418422,https://api.github.com/repos/pydata/xarray/issues/463,263418422,MDEyOklzc3VlQ29tbWVudDI2MzQxODQyMg==,4295853,2016-11-28T22:42:55Z,2016-11-28T22:43:32Z,CONTRIBUTOR,"We (+ @milenaveneziani and @xylar) are running into this issue again. Ideally, this should be resolved; after following up with everyone on strategy, I may have another look at this issue if it sounds straightforward to fix.
@shoyer and @mrocklin, if I understand correctly, incorporating the LRU cache could help with this problem, assuming time series were sliced into small chunks for access, correct? We would still run into problems, however, if there were, say, 10^6 files and we wanted a time series spanning these files, right? If so, we may need a more robust solution than just the LRU cache. In the short term, PyReshaper may provide a temporary solution for us. cc @kmpaul to provide some perspective here too regarding the use of https://github.com/NCAR/PyReshaper.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498