html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/2261#issuecomment-428039712,https://api.github.com/repos/pydata/xarray/issues/2261,428039712,MDEyOklzc3VlQ29tbWVudDQyODAzOTcxMg==,1217238,2018-10-09T02:34:23Z,2018-10-09T02:34:23Z,MEMBER,"Yep, that's my plan. I just read through the code again and identified a few unreachable lines, which I removed. I'll merge when CI passes.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-425047446,https://api.github.com/repos/pydata/xarray/issues/2261,425047446,MDEyOklzc3VlQ29tbWVudDQyNTA0NzQ0Ng==,1217238,2018-09-27T10:54:32Z,2018-09-27T10:54:32Z,MEMBER,"At some point soon I'm just going to merge this, more review or not! Hopefully a release candidate will catch any major issues.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-424945880,https://api.github.com/repos/pydata/xarray/issues/2261,424945880,MDEyOklzc3VlQ29tbWVudDQyNDk0NTg4MA==,1217238,2018-09-27T03:21:10Z,2018-09-27T03:21:10Z,MEMBER,I'd love to move this forward. I think it will fix some serious usability and performance issues with distributed reads/writes of netCDF files.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-419190187,https://api.github.com/repos/pydata/xarray/issues/2261,419190187,MDEyOklzc3VlQ29tbWVudDQxOTE5MDE4Nw==,1217238,2018-09-06T18:10:24Z,2018-09-06T18:10:24Z,MEMBER,"Here are the latest benchmarking numbers. I added a netCDF4 write benchmark based on https://github.com/pydata/xarray/issues/2389, both with and without dask-distributed:
```
    before     after       ratio
  [66a8f8dd] [2a5d1f02]
+  549.24ms   604.32ms      1.10  dataset_io.IOReadMultipleNetCDF4.time_load_dataset_netcdf4
-  418.40ms   377.16ms      0.90  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_time_chunks
-     1.02s   905.48ms      0.89  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_scipy_with_time_chunks
-  443.48ms   384.74ms      0.87  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks
-  200.77ms   170.49ms      0.85  dataset_io.IOWriteMultipleNetCDF3.time_write_dataset_scipy
-     1.37s      1.12s      0.82  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks
-   21.63ms    17.69ms      0.82  dataset_io.IOReadSingleNetCDF3.time_vectorized_indexing
-  127.82ms    97.88ms      0.77  dataset_io.IOReadSingleNetCDF3.time_load_dataset_scipy
-   25.56ms    19.11ms      0.75  dataset_io.IOReadSingleNetCDF3.time_orthogonal_indexing
-  185.24ms   135.08ms      0.73  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks_vindexing
-  178.39ms   122.56ms      0.69  dataset_io.IOWriteSingleNetCDF3.time_write_dataset_scipy
-  108.35ms    65.82ms      0.61  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_time_chunks
-  109.67ms    65.99ms      0.60  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_block_chunks
-  107.91ms    64.50ms      0.60  dataset_io.IOReadMultipleNetCDF3.time_open_dataset_scipy
-  801.03ms   462.24ms      0.58  dataset_io.IOReadMultipleNetCDF3.time_load_dataset_scipy
-     3.14s      1.64s      0.52  dataset_io.IOWriteNetCDFDaskDistributed.time_write
-  547.06ms   204.31ms      0.37  dataset_io.IOWriteNetCDFDask.time_write
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-416748490,https://api.github.com/repos/pydata/xarray/issues/2261,416748490,MDEyOklzc3VlQ29tbWVudDQxNjc0ODQ5MA==,1217238,2018-08-28T21:34:21Z,2018-08-28T21:34:21Z,MEMBER,+1 on a release candidate. That's part of why I was thinking of using this as an excuse for the 0.11 release.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-416443277,https://api.github.com/repos/pydata/xarray/issues/2261,416443277,MDEyOklzc3VlQ29tbWVudDQxNjQ0MzI3Nw==,1217238,2018-08-28T03:57:54Z,2018-08-28T03:57:54Z,MEMBER,"I just ran the benchmark suite again and now see improvement across the board:
```
    before     after       ratio
  [0b9ab2d1] [6350ca6f]
-     1.49s      1.35s      0.91  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_block_chunks_multiprocessing
-   79.96ms    72.36ms      0.90  dataset_io.IOReadSingleNetCDF3.time_load_dataset_netcdf4
-   29.61ms    26.17ms      0.88  dataset_io.IOReadSingleNetCDF3.time_orthogonal_indexing
-  238.97ms   210.33ms      0.88  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_netcdf4_with_time_chunks
-  154.84ms   133.97ms      0.87  dataset_io.IOReadSingleNetCDF4Dask.time_load_dataset_netcdf4_with_time_chunks
-     3.03s      2.56s      0.85  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks_oindexing
-  458.85ms   377.81ms      0.82  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks
-   21.95ms    17.83ms      0.81  dataset_io.IOReadSingleNetCDF3.time_vectorized_indexing
-   63.52ms    51.54ms      0.81  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_time_chunks
-   79.17ms    63.31ms      0.80  dataset_io.IOReadMultipleNetCDF4.time_open_dataset_netcdf4
-   75.62ms    59.49ms      0.79  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_block_chunks_multiprocessing
-  650.58ms   502.08ms      0.77  dataset_io.IOReadMultipleNetCDF4.time_load_dataset_netcdf4
-   75.90ms    58.50ms      0.77  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_time_chunks_multiprocessing
-  687.07ms   527.76ms      0.77  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_block_chunks
-   65.15ms    49.77ms      0.76  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_block_chunks
-   86.80ms    65.68ms      0.76  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_block_chunks
-   58.60ms    43.81ms      0.75  dataset_io.IOReadMultipleNetCDF3.time_open_dataset_netcdf4
-     1.43s      1.07s      0.75  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_netcdf4_with_block_chunks_multiprocessing
-   80.01ms    57.88ms      0.72  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_time_chunks
-     1.16s   834.07ms      0.72  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_netcdf4_with_time_chunks_multiprocessing
-  177.43ms   126.31ms      0.71  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks_vindexing
-  135.28ms    93.70ms      0.69  dataset_io.IOReadSingleNetCDF3.time_load_dataset_scipy
-   62.89ms    43.38ms      0.69  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_time_chunks_multiprocessing
-   77.04ms    52.70ms      0.68  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_block_chunks_multiprocessing
-  324.10ms   221.52ms      0.68  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_time_chunks
-     1.28s   812.88ms      0.63  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_scipy_with_time_chunks
-  797.18ms   503.38ms      0.63  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_netcdf4_with_block_chunks
-     1.66s      1.04s      0.63  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks
-   98.57ms    56.60ms      0.57  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_block_chunks
-   98.12ms    54.05ms      0.55  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_time_chunks
-  810.75ms   436.98ms      0.54  dataset_io.IOReadMultipleNetCDF3.time_load_dataset_scipy
-  105.06ms    50.71ms      0.48  dataset_io.IOReadMultipleNetCDF3.time_open_dataset_scipy
-  608.23ms   231.53ms      0.38  dataset_io.IOReadMultipleNetCDF3.time_load_dataset_netcdf4
```
There's pretty clearly high variance in these benchmarks.
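
For reading the table above: the ratio column is the after time divided by the before time, so anything below 1.00 is a speedup. Here is a minimal sketch for recomputing it from a row. The helper names `parse_time` and `ratio` are hypothetical and only for illustration; they are not part of asv.

```python
import re

# Multipliers for converting asv-style time strings to seconds.
_UNITS = {'s': 1.0, 'ms': 1e-3, 'us': 1e-6, 'ns': 1e-9}


def parse_time(text):
    # Convert a string such as '549.24ms' or '1.02s' to seconds.
    match = re.fullmatch(r'([0-9.]+)(s|ms|us|ns)', text.strip())
    if match is None:
        raise ValueError('unrecognized time: %r' % text)
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def ratio(before, after):
    # after / before, rounded to two decimals like the ratio column.
    return round(parse_time(after) / parse_time(before), 2)


# Two rows from the table above, given as (before, after).
print(ratio('1.49s', '1.35s'))
print(ratio('608.23ms', '231.53ms'))
```

Because single runs are noisy, a ratio close to 1.00 is hard to tell apart from run-to-run variation.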

I considered adding another benchmark with dask-distributed, but the numbers look very similar to those for multi-processing or threads. It doesn't seem to provide a useful additional signal, and adding the distributed tests makes the whole IO benchmarking suite run about 30% slower.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-415082921,https://api.github.com/repos/pydata/xarray/issues/2261,415082921,MDEyOklzc3VlQ29tbWVudDQxNTA4MjkyMQ==,1217238,2018-08-22T15:54:09Z,2018-08-22T15:54:09Z,MEMBER,"ASV benchmark results for the dataset io tests (created with `asv continuous -f 1.1 -E conda-py3.6-bottleneck-dask-netcdf4-numpy-pandas-scipy upstream/master HEAD -b dataset_io`):
```
    before     after       ratio
  [0b9ab2d1] [6350ca6f]
+   56.76ms    68.17ms      1.20  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_block_chunks_multiprocessing
+  367.54ms   426.44ms      1.16  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_netcdf4_with_time_chunks_multiprocessing
-   83.44ms    74.41ms      0.89  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_time_chunks
-     1.39s      1.24s      0.89  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_block_chunks_multiprocessing
-     3.19s      2.83s      0.89  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks_oindexing
-  435.48ms   384.63ms      0.88  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_time_chunks
-   65.69ms    57.92ms      0.88  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_block_chunks
-     1.08s   931.72ms      0.86  dataset_io.IOReadMultipleNetCDF3Dask.time_load_dataset_scipy_with_time_chunks
-     1.09s   938.76ms      0.86  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_time_chunks_multiprocessing
-  190.29ms   160.28ms      0.84  dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_scipy_with_block_chunks_vindexing
-  274.98ms   229.61ms      0.83  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_time_chunks
-  102.61ms    83.37ms      0.81  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_block_chunks
-   88.31ms    70.94ms      0.80  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_time_chunks_multiprocessing
-  595.90ms   476.80ms      0.80  dataset_io.IOReadMultipleNetCDF4.time_load_dataset_netcdf4
-  683.88ms   546.81ms      0.80  dataset_io.IOReadMultipleNetCDF4Dask.time_load_dataset_netcdf4_with_block_chunks
-   83.22ms    63.94ms      0.77  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_block_chunks_multiprocessing
-   81.28ms    60.12ms      0.74  dataset_io.IOReadMultipleNetCDF4Dask.time_open_dataset_netcdf4_with_block_chunks
-  134.39ms    99.27ms      0.74  dataset_io.IOReadSingleNetCDF3.time_load_dataset_scipy
-  100.06ms    63.71ms      0.64  dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_scipy_with_time_chunks
-  811.28ms   499.11ms      0.62  dataset_io.IOReadMultipleNetCDF3.time_load_dataset_scipy
-   29.88ms    18.12ms      0.61  dataset_io.IOReadSingleNetCDF3.time_orthogonal_indexing
-  102.23ms    59.25ms      0.58  dataset_io.IOReadMultipleNetCDF3.time_open_dataset_scipy
```

Most of the changed benchmarks have improved by ~20%, with the exception of `dataset_io.IOReadMultipleNetCDF3Dask.time_open_dataset_netcdf4_with_block_chunks_multiprocessing` and `dataset_io.IOReadSingleNetCDF3Dask.time_load_dataset_netcdf4_with_time_chunks_multiprocessing`, which got slower (ratios of 1.20 and 1.16).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-414389492,https://api.github.com/repos/pydata/xarray/issues/2261,414389492,MDEyOklzc3VlQ29tbWVudDQxNDM4OTQ5Mg==,1217238,2018-08-20T17:01:38Z,2018-08-20T17:01:38Z,MEMBER,"This is ready for further review and testing. Things are working for writes with dask-distributed, including with h5netcdf (requires the 0.6.2 release of h5netcdf) and on Windows (https://github.com/pydata/xarray/issues/1738).

Follow-ups for future work:
- I managed to work around the need for a reentrant lock (https://github.com/dask/dask/issues/3832) but using a reentrant lock would be a nice clean-up.
- Currently I'm using the ""close after each write"" strategy with dask-distributed (https://github.com/dask/distributed/issues/2163). This works OK for netCDF4 and h5netcdf, but for the SciPy netCDF writer it's basically a non-starter, because SciPy only writes complete files (https://github.com/scipy/scipy/issues/9157) -- so I'm still having SciPy raise an error. It would be nice to also support the ""write complete files"" strategy, which could have significantly better performance at the cost of memory usage. We might need some new API for this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-409032803,https://api.github.com/repos/pydata/xarray/issues/2261,409032803,MDEyOklzc3VlQ29tbWVudDQwOTAzMjgwMw==,1217238,2018-07-30T22:29:30Z,2018-07-30T22:29:30Z,MEMBER,"I think it's a matter of missing some of the required locks and/or not syncing files before pickling the FileManager object. I'm currently working through the locking logic again...

On Mon, Jul 30, 2018 at 2:13 PM Joe Hamman <notifications@github.com> wrote:

> Note that this isn't quite working for Dask distributed yet.
>
> Any ideas of what is not working yet? I spent a fair bit of time wrestling
> with the distributed write problem earlier this year so can perhaps be of
> help here.
> ------------------------------
>
> Also cc @pwolfram <https://github.com/pwolfram> who was an early
> interested party in this LRU cache idea.
>
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-408894526,https://api.github.com/repos/pydata/xarray/issues/2261,408894526,MDEyOklzc3VlQ29tbWVudDQwODg5NDUyNg==,1217238,2018-07-30T15:01:10Z,2018-07-30T15:01:10Z,MEMBER,"Note that this isn't quite working for Dask distributed yet.
On Mon, Jul 30, 2018 at 4:19 AM Fabien Maussion <notifications@github.com>
wrote:

> This is great! I like it, this simplifies the internals a lot.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-408714992,https://api.github.com/repos/pydata/xarray/issues/2261,408714992,MDEyOklzc3VlQ29tbWVudDQwODcxNDk5Mg==,1217238,2018-07-29T23:53:31Z,2018-07-29T23:53:31Z,MEMBER,"I finished porting this to the other backends and have now officially deprecated the `autoclose` option.

I'm tentatively marking this for the 0.11 release, since there's a decent chance that this will cause some breakage (we will definitely want to test this on some real workloads before the release).

It's been about 9 months since the 0.10 release, so this is probably a good time to make another major release, anyways.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-408642640,https://api.github.com/repos/pydata/xarray/issues/2261,408642640,MDEyOklzc3VlQ29tbWVudDQwODY0MjY0MA==,1217238,2018-07-29T00:03:57Z,2018-07-29T00:03:57Z,MEMBER,"As an experiment, I rewrote the SciPy netCDF backend to use FileManager:
- The code is now significantly simpler -- all the ensure_open() business could simply be removed.
- We used to see a bunch of warnings about not closing memory mapped files (""RuntimeWarning: Cannot close a netcdf_file opened with mmap=True, when netcdf_variables or arrays referring to its data still exist.""). These have all gone away!
- `compute=False` now magically works (I only had to remove the explicitly raised error!)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-404216747,https://api.github.com/repos/pydata/xarray/issues/2261,404216747,MDEyOklzc3VlQ29tbWVudDQwNDIxNjc0Nw==,1217238,2018-07-11T15:42:53Z,2018-07-11T15:42:53Z,MEMBER,"OK, this is ready for review.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315
https://github.com/pydata/xarray/pull/2261#issuecomment-403543994,https://api.github.com/repos/pydata/xarray/issues/2261,403543994,MDEyOklzc3VlQ29tbWVudDQwMzU0Mzk5NA==,1217238,2018-07-09T16:46:05Z,2018-07-09T16:46:05Z,MEMBER,"@jhamman thanks for taking a look. I'm going to push another iteration of this shortly (OK, a major rewrite) where there is only a single FileManager object which uses an LRU cache.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337267315