html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/798#issuecomment-288414396,https://api.github.com/repos/pydata/xarray/issues/798,288414396,MDEyOklzc3VlQ29tbWVudDI4ODQxNDM5Ng==,4295853,2017-03-22T14:23:45Z,2017-03-22T14:23:45Z,CONTRIBUTOR,"@mrocklin and @shoyer, we now have dask.distributed and xarray support. Should this issue be closed?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-255192875,https://api.github.com/repos/pydata/xarray/issues/798,255192875,MDEyOklzc3VlQ29tbWVudDI1NTE5Mjg3NQ==,4295853,2016-10-20T18:44:03Z,2016-10-20T18:44:03Z,CONTRIBUTOR,"@mrocklin, I would be happy to chat because I am interested in seeing this happen (e.g., eventually contributing code). The question is whether we need additional expertise from @shoyer, @jhamman, @rabernat, etc., who likely have a greater in-depth understanding of xarray than I do. Perhaps this warrants an email to the wider list?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-255188697,https://api.github.com/repos/pydata/xarray/issues/798,255188697,MDEyOklzc3VlQ29tbWVudDI1NTE4ODY5Nw==,4295853,2016-10-20T18:28:24Z,2016-10-20T18:28:24Z,CONTRIBUTOR,"@kynan, I'm still interested in this but have not had time to advance it further. Are you interested in contributing to this too? I view this as a key component of future climate analysis workflows. This may also be something that is addressed at the upcoming hackathon at Columbia with @rabernat early next month. Also, I suspect that both @mrocklin and @shoyer would be willing to continue to provide key advice, because this appears to be aligned with their interests too (please correct me if I'm wrong in this assessment).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-205481557,https://api.github.com/repos/pydata/xarray/issues/798,205481557,MDEyOklzc3VlQ29tbWVudDIwNTQ4MTU1Nw==,4295853,2016-04-04T20:32:23Z,2016-04-04T20:32:23Z,CONTRIBUTOR,"@shoyer, if we are happy to open all netCDF files and read out the metadata from a master process, that would imply that we would open a file, read the metadata, and then close it, correct? Array access should then follow something like @mrocklin's `netcdf_Dataset` approach, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-205481269,https://api.github.com/repos/pydata/xarray/issues/798,205481269,MDEyOklzc3VlQ29tbWVudDIwNTQ4MTI2OQ==,4295853,2016-04-04T20:31:24Z,2016-04-04T20:31:24Z,CONTRIBUTOR,"@fmaussion, on your two points:
1. The LRU cache should be used serially for the read initially, but something more like @mrocklin's `netcdf_Dataset` appears to be needed, as @shoyer points out. I need to think about this more.
2. I was thinking we would keep track of the file name outside the LRU and only use the filename to open datasets inside the LRU if they aren't already open. Agreed that `if file in LRU` should designate whether the file is open.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-205478991,https://api.github.com/repos/pydata/xarray/issues/798,205478991,MDEyOklzc3VlQ29tbWVudDIwNTQ3ODk5MQ==,4295853,2016-04-04T20:24:41Z,2016-04-04T20:24:41Z,CONTRIBUTOR,"Just to be clear, we are talking about this https://github.com/mrocklin/hdf5lazy/blob/master/hdf5lazy/core.py#L83 for @mrocklin's `netcdf_Dataset`, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-205133433,https://api.github.com/repos/pydata/xarray/issues/798,205133433,MDEyOklzc3VlQ29tbWVudDIwNTEzMzQzMw==,4295853,2016-04-04T04:35:09Z,2016-04-04T04:35:09Z,CONTRIBUTOR,"Thanks @mrocklin! This has been really helpful and was what I needed to get going. A preliminary design I'm seeing is to modify the `NetCDF4DataStore` class https://github.com/pydata/xarray/blob/master/xarray/backends/netCDF4_.py#L170 to meet these requirements:
1. At `__init__`, try to open the file via the LRU cache. I think the LRU dict has to be a global, because the open-file restriction is an attribute of the system, correct?
2. For each read from a file, ensure it hasn't been closed via a `@ds.getter` property method. If it has, reopen it via the LRU cache. This is OK because, for a read, the file is essentially read-only. The LRU closes out stale entries to prevent the too-many-open-files errors. Checking this should be fast.
3. `sync` is only for a write, but it seems like it should follow the same approach.
A clean way to do this is just to make sure that each time `self.ds` is called, it is re-validated via the LRU cache. This should be implementable via property getter methods https://docs.python.org/2/library/functions.html#property. Unless I'm missing something big, I don't think this change will require a large refactor, but it is quite possible I overlooked something important. @shoyer and @mrocklin, do you see any obvious pitfalls in this scope? If not, it shouldn't be too hard to implement.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-204770198,https://api.github.com/repos/pydata/xarray/issues/798,204770198,MDEyOklzc3VlQ29tbWVudDIwNDc3MDE5OA==,4295853,2016-04-02T18:25:18Z,2016-04-02T18:25:18Z,CONTRIBUTOR,"Another note in support of this PR, especially ""robustly support HDF/NetCDF reads"": I am having problems with `NetCDF: HDF error`, as previously reported by @rabernat in https://github.com/pydata/xarray/issues/463. Thus, a solution here will save time and may arguably be on the critical path of some workflows, because fewer jobs will fail and require baby-sitting/restarts, especially when running multiple jobs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-202696169,https://api.github.com/repos/pydata/xarray/issues/798,202696169,MDEyOklzc3VlQ29tbWVudDIwMjY5NjE2OQ==,4295853,2016-03-29T03:49:11Z,2016-03-29T14:24:02Z,CONTRIBUTOR,"Thanks @shoyer. If you can provide some guidance on bounds for the reorganization, that would be really great.
I want your and @jhamman's feedback on this before I try a solution. The trick is just to make the time, as always, and I may have some time this coming weekend.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-200878845,https://api.github.com/repos/pydata/xarray/issues/798,200878845,MDEyOklzc3VlQ29tbWVudDIwMDg3ODg0NQ==,4295853,2016-03-24T15:09:42Z,2016-03-24T15:13:18Z,CONTRIBUTOR,"This issue of connecting to dask/distributed may also be related to https://github.com/pydata/xarray/issues/463, https://github.com/pydata/xarray/issues/591, and https://github.com/pydata/xarray/pull/524.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-200633312,https://api.github.com/repos/pydata/xarray/issues/798,200633312,MDEyOklzc3VlQ29tbWVudDIwMDYzMzMxMg==,4295853,2016-03-24T03:04:25Z,2016-03-24T03:04:25Z,CONTRIBUTOR,"Repeating @mrocklin:
> Dask.array writes data to any object that supports numpy style setitem syntax like the following:
>
> dataset[my_slice] = my_numpy_array
>
> Objects like h5py.Dataset and netcdf objects support this syntax.
>
> So dask.array would work today without modification if we had such an object that represented many netcdf files at once and supported numpy-style setitem syntax, placing the numpy array properly across the right files. This work could happen easily without deep knowledge of either project.
>
> Alternatively, we could make the dask.array.store function optionally lazy so that users (or xarray) could call store many times before triggering execution.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-200632521,https://api.github.com/repos/pydata/xarray/issues/798,200632521,MDEyOklzc3VlQ29tbWVudDIwMDYzMjUyMQ==,4295853,2016-03-24T03:02:19Z,2016-03-24T03:02:19Z,CONTRIBUTOR,"@shoyer and @mrocklin, I've updated the summary above in the PR description with a to-do list. Do either of you see any obvious tasks I missed on the list? If so, can you please update the to-do list so that I can see what needs to be done to modify the backend for the dask/distributed integration?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-199547374,https://api.github.com/repos/pydata/xarray/issues/798,199547374,MDEyOklzc3VlQ29tbWVudDE5OTU0NzM3NA==,4295853,2016-03-22T00:01:55Z,2016-03-22T00:02:11Z,CONTRIBUTOR,"Here is an example of a use case for a `nanmean` over ensembles, in collaboration with @mrocklin and following http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3: https://gist.github.com/mrocklin/566a8d5c3f6721abf36f","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-199544731,https://api.github.com/repos/pydata/xarray/issues/798,199544731,MDEyOklzc3VlQ29tbWVudDE5OTU0NDczMQ==,4295853,2016-03-21T23:57:27Z,2016-03-21T23:57:27Z,CONTRIBUTOR,"See also https://github.com/dask/dask/issues/922","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006
https://github.com/pydata/xarray/issues/798#issuecomment-199532452,https://api.github.com/repos/pydata/xarray/issues/798,199532452,MDEyOklzc3VlQ29tbWVudDE5OTUzMjQ1Mg==,4295853,2016-03-21T23:21:07Z,2016-03-21T23:21:07Z,CONTRIBUTOR,"The full mailing list discussion is at https://groups.google.com/d/msgid/xarray/CAJ8oX-E7Xx6NT4F6J8B4__Q-kBazoob9_qe_oFLi5hany9-%3DKQ%40mail.gmail.com?utm_medium=email&utm_source=footer","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,142498006