---
**Comment by user 10554254 — 2017-11-16**
<https://github.com/pydata/xarray/issues/1301#issuecomment-344949160>

Looks like it has been resolved! Tested with the latest pre-release, v0.10.0rc2, on the dataset linked by najascutellatus above:
https://marine.rutgers.edu/~michaesm/netcdf/data/

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

xarray==0.10.0rc2-1-g8267fdb, dask==0.15.4:

```
194381 function calls (188429 primitive calls) in 0.869 seconds

Ordered by: internal time
List reduced from 469 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
        50    0.164    0.003    0.557    0.011 indexing.py:266(_index_indexer_1d)
         5    0.083    0.017    0.085    0.017 netCDF4_.py:185(_open_netcdf4_group)
       190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
       190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
        50    0.018    0.000    0.021    0.000 {operator.getitem}
 5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
 2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
     26137    0.006    0.000    0.013    0.000 {isinstance}
       720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
```

xarray==0.9.1, dask==0.13.0:

```
241253 function calls (229881 primitive calls) in 98.123 seconds

Ordered by: internal time
List reduced from 659 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
        65    7.055    0.109    7.059    0.109 {operator.getitem}
        80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
 7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
        68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
        80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
     60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
        30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
       123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
  1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```

---
**Comment by user 10554254 — 2017-04-12**
<https://github.com/pydata/xarray/issues/1301#issuecomment-293619896>

`decode_times=False` significantly reduces read time, but the proportional performance gap between xarray 0.8.2 and 0.9.1 remains the same.

---
**Comment by user 865212 — 2017-04-12**
<https://github.com/pydata/xarray/issues/1301#issuecomment-293593843>

@friedrichknuth Did you run the tests with the most recent version and `decode_times=True`/`False` on a single-file read?
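---
A note for re-running the benchmark above on a current stack: the `da.set_options(get=da.async.get_sync)` call is the pre-0.18 dask API and no longer exists (`async` later became a reserved word in Python). Below is a minimal, untested sketch of the same measurement with today's synchronous scheduler, using `cProfile` in place of IPython's `%prun` and the same `./*.nc` glob from the comments above:

```python
import cProfile

import dask
import xarray as xr

# Modern replacement for `da.set_options(get=da.async.get_sync)`: run every
# task on the single-threaded synchronous scheduler so the profile is not
# split across worker threads.
dask.config.set(scheduler="synchronous")

profiler = cProfile.Profile()
profiler.enable()
ds = xr.open_mfdataset("./*.nc")  # same glob as the benchmarks in this thread
profiler.disable()
profiler.print_stats(sort="tottime")  # "internal time", as in `%prun`
```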
---
**Comment by user 1360241 — 2017-04-04**
<https://github.com/pydata/xarray/issues/1301#issuecomment-291512017>

@rabernat This data is computed on demand from the OOI (http://oceanobservatories.org/cyberinfrastructure-technology/). Datasets can be massive, so they appear to be split into ~500 MB files when the data grows too big; that is why `obs` changes from file to file. Would making `obs` consistent across all files potentially make `open_mfdataset` faster?

---
**Comment by user 10554254 — 2017-03-13**
<https://github.com/pydata/xarray/issues/1301#issuecomment-286220522>

Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0:

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

167305 function calls (160352 primitive calls) in 59.688 seconds

Ordered by: internal time
List reduced from 625 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
        39    0.860    0.022    0.863    0.022 {operator.getitem}
        48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
 4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
        88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
        48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
     36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
        18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
        51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
   676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}
```

`pandas.tslib.array_to_timedelta64` is by far the most expensive call here, and it is not run at all under xarray 0.8.2.

xarray==0.8.2, dask==0.13.0:

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

140668 function calls (136769 primitive calls) in 0.766 seconds

Ordered by: internal time
List reduced from 621 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
        18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
        16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
   483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
        15    0.054    0.004    0.197    0.013 indexing.py:259(_index_indexer_1d)
         3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
       105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
        15    0.012    0.001    0.013    0.001 {operator.getitem}
 2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
      5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)
```

The dask version is held constant across both tests.
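---
A minimal sketch of the `decode_times=False` workaround discussed above: open the files without decoding times, then apply CF decoding once to the combined dataset. This assumes the files carry CF-style time encodings that `xr.decode_cf` can interpret, and reuses the same `./*.nc` glob as the profiles:

```python
import xarray as xr

# Fast path: skip per-file conversion of time/timedelta values at open time.
ds_raw = xr.open_mfdataset("./*.nc", decode_times=False)

# Decode CF conventions (including the time coordinates) once, afterwards.
ds = xr.decode_cf(ds_raw)
```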
---
**Comment by user 1360241 — 2017-03-13**
<https://github.com/pydata/xarray/issues/1301#issuecomment-286212647>

Data: five files, approximately 450 MB each.

venv1:
- dask 0.13.0 py27_0 conda-forge
- xarray 0.8.2 py27_0 conda-forge
- 1.51642394066 seconds to load using `open_mfdataset`

venv2:
- dask 0.13.0 py27_0 conda-forge
- xarray 0.9.1 py27_0 conda-forge
- 279.011202097 seconds to load using `open_mfdataset`

I ran the same code as in the original post in two conda environments with the same dask version but different xarray versions; loading took roughly 180x longer under 0.9.1. I've posted the data on my work site if anyone wants to double-check:
https://marine.rutgers.edu/~michaesm/netcdf/data/
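---
For completeness, a hedged sketch of the venv1/venv2 comparison methodology: run the same snippet in each conda environment and compare wall-clock load times. This version is written for Python 3 (the original environments were Python 2.7) and assumes the glob matches the five ~450 MB files above:

```python
import time

import dask
import xarray as xr

# Print the library versions so results from the two environments
# can be told apart.
print("xarray", xr.__version__, "| dask", dask.__version__)

start = time.perf_counter()
ds = xr.open_mfdataset("./*.nc")
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f} seconds to load using open_mfdataset")
```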