**@jhamman** (MEMBER) — 2017-11-14T23:41:57Z
https://github.com/pydata/xarray/issues/1301#issuecomment-344437569

@friedrichknuth, any chance you can take a look at this with the latest v0.10 release candidate?

---

**@rabernat** (MEMBER) — 2017-04-04T14:27:18Z
https://github.com/pydata/xarray/issues/1301#issuecomment-291516997

My understanding is that you are concatenating across the variable `obs`, so no, it wouldn't make sense to have `obs` be the same in all the datasets.

My tests showed that it's not necessarily the concat step that is slowing this down. Your profiling suggests that it's a netCDF datetime decoding issue. I wonder if @shoyer or @jhamman have any ideas about how to improve performance here.

---

**@rabernat** (MEMBER) — 2017-03-13T19:40:50Z
https://github.com/pydata/xarray/issues/1301#issuecomment-286220317

And the length of `obs` is different in each dataset:

```python
>>> for myds in dsets: print(myds.dims)
Frozen(SortedKeysDict({u'obs': 7537613}))
Frozen(SortedKeysDict({u'obs': 7247697}))
Frozen(SortedKeysDict({u'obs': 7497680}))
Frozen(SortedKeysDict({u'obs': 7661468}))
Frozen(SortedKeysDict({u'obs': 5750197}))
```

---

**@rabernat** (MEMBER) — 2017-03-13T19:39:15Z
https://github.com/pydata/xarray/issues/1301#issuecomment-286219858

There is definitely something funky with these datasets that is causing xarray to go very slow.

This is fast:

```python
>>> %time dsets = [xr.open_dataset(fname) for fname in glob('*.nc')]
CPU times: user 1.1 s, sys: 664 ms, total: 1.76 s
Wall time: 1.78 s
```

But even just trying to print the repr is slow:

```python
>>> %time print(dsets[0])
CPU times: user 3.66 s, sys: 3.49 s, total: 7.15 s
Wall time: 7.28 s
```

Maybe some of this has to do with the change in 0.9.0 allowing index-less dimensions (i.e. coordinates are optional). All of these datasets have such a dimension, e.g.

```
Dimensions:                                      (obs: 7247697)
Coordinates:
    lon                                          (obs) float64 -124.3 -124.3 ...
    lat                                          (obs) float64 44.64 44.64 ...
    time                                         (obs) datetime64[ns] 2014-11-10T00:00:00.011253 ...
Dimensions without coordinates: obs
Data variables:
    oxy_calphase                                 (obs) float64 3.293e+04 ...
    quality_flag                                 (obs) |S2 'ok' 'ok' 'ok' ...
    ctdbp_no_seawater_conductivity_qc_executed   (obs) uint8 29 29 29 29 29 ...
    ...
```
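A quick way to probe the datetime-decoding hypothesis raised above is to open one file with and without time decoding and compare. This is a minimal sketch, assuming the same directory of `.nc` files as in the snippets; `decode_times` is a standard `xr.open_dataset` keyword:

```python
from glob import glob

import xarray as xr

fname = glob('*.nc')[0]

# Default: CF decoding is on, so `time` is converted to datetime64[ns].
ds_decoded = xr.open_dataset(fname)

# Skip time decoding; `time` stays as raw numbers plus a units attribute.
ds_raw = xr.open_dataset(fname, decode_times=False)

# In IPython, wrap each print in %time: if the undecoded repr is fast
# while the decoded one is slow, datetime decoding is the bottleneck.
print(ds_raw)
print(ds_decoded)
```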
``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278 https://github.com/pydata/xarray/issues/1301#issuecomment-285149350,https://api.github.com/repos/pydata/xarray/issues/1301,285149350,MDEyOklzc3VlQ29tbWVudDI4NTE0OTM1MA==,1197350,2017-03-08T19:52:11Z,2017-03-08T19:52:11Z,MEMBER,"I just tried this on a few different datasets. Comparing python 2.7, xarray 0.7.2, dask 0.7.1 (an old environment I had on hand) with python 2.7, xarray 0.9.1-28-g1cad803, dask 0.13.0 (my current ""production"" environment), I could not reproduce. The up-to-date stack was faster by a factor of < 2. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278 https://github.com/pydata/xarray/issues/1301#issuecomment-285110824,https://api.github.com/repos/pydata/xarray/issues/1301,285110824,MDEyOklzc3VlQ29tbWVudDI4NTExMDgyNA==,1217238,2017-03-08T17:35:49Z,2017-03-08T17:35:49Z,MEMBER,"> One thing that helps get a better profile is setting dask backend to the non-parallel sync option which gives cleaner profiles. Indeed, this is highly recommended, see http://dask.pydata.org/en/latest/faq.html","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278 https://github.com/pydata/xarray/issues/1301#issuecomment-284915063,https://api.github.com/repos/pydata/xarray/issues/1301,284915063,MDEyOklzc3VlQ29tbWVudDI4NDkxNTA2Mw==,1217238,2017-03-08T01:16:58Z,2017-03-08T01:16:58Z,MEMBER,Hmm. It might be interesting to try `lock=threading.Lock()` to revert to the old version of the thread lock as well.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278 https://github.com/pydata/xarray/issues/1301#issuecomment-284914442,https://api.github.com/repos/pydata/xarray/issues/1301,284914442,MDEyOklzc3VlQ29tbWVudDI4NDkxNDQ0Mg==,2443309,2017-03-08T01:13:35Z,2017-03-08T01:13:35Z,MEMBER,"This is what I'm seeing for my `%prun` profiling: ``` ncalls tottime percall cumtime percall filename:lineno(function) 204 19.783 0.097 19.783 0.097 {method 'acquire' of '_thread.lock' objects} 89208/51003 2.524 0.000 5.553 0.000 indexing.py:361(shape) 1 1.359 1.359 37.876 37.876 :1() 71379/53550 1.242 0.000 3.266 0.000 utils.py:412(shape) 538295 0.929 0.000 1.317 0.000 {built-in method builtins.isinstance} 24674/13920 0.836 0.000 4.139 0.000 _collections_abc.py:756(update) 9 0.788 0.088 0.803 0.089 netCDF4_.py:178(_open_netcdf4_group)``` Weren't there some recent changes to the thread lock related to dask distributed?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278 https://github.com/pydata/xarray/issues/1301#issuecomment-284908153,https://api.github.com/repos/pydata/xarray/issues/1301,284908153,MDEyOklzc3VlQ29tbWVudDI4NDkwODE1Mw==,1217238,2017-03-08T00:38:55Z,2017-03-08T00:38:55Z,MEMBER,"Wow, that is pretty bad. Try setting `compat='broadcast_equals'` in the `open_mfdataset` call, to restore the default value of that parameter prior v0.9. If that doesn't help, try downgrading dask to see if it's responsible. 
---

**@jhamman** (MEMBER) — 2017-03-08T01:13:35Z
https://github.com/pydata/xarray/issues/1301#issuecomment-284914442

This is what I'm seeing for my `%prun` profiling:

```
     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        204   19.783    0.097   19.783    0.097 {method 'acquire' of '_thread.lock' objects}
89208/51003    2.524    0.000    5.553    0.000 indexing.py:361(shape)
          1    1.359    1.359   37.876   37.876 <string>:1(<module>)
71379/53550    1.242    0.000    3.266    0.000 utils.py:412(shape)
     538295    0.929    0.000    1.317    0.000 {built-in method builtins.isinstance}
24674/13920    0.836    0.000    4.139    0.000 _collections_abc.py:756(update)
          9    0.788    0.088    0.803    0.089 netCDF4_.py:178(_open_netcdf4_group)
```

Weren't there some recent changes to the thread lock related to dask distributed?

---

**@shoyer** (MEMBER) — 2017-03-08T00:38:55Z
https://github.com/pydata/xarray/issues/1301#issuecomment-284908153

Wow, that is pretty bad. Try setting `compat='broadcast_equals'` in the `open_mfdataset` call to restore the default value that parameter had prior to v0.9. If that doesn't help, try downgrading dask to see if it's responsible.

Profiling results from `%prun` in IPython would also be helpful for tracking down the culprit.

---

**@jhamman** (MEMBER) — 2017-03-08T00:22:10Z
https://github.com/pydata/xarray/issues/1301#issuecomment-284905152

I've also noticed that we have a bottleneck here. @shoyer - any idea what we changed that could impact this? Could this be coming from a change upstream in dask?
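A minimal sketch of the `compat` experiment, assuming the files are concatenated along `obs` as described earlier in the thread (`compat` and `concat_dim` are `open_mfdataset` keywords of that era; per the comment above, `'broadcast_equals'` was the default before v0.9):

```python
import xarray as xr

# Restore the pre-v0.9 default for how non-concatenated variables
# are checked for compatibility across files.
ds = xr.open_mfdataset('*.nc', concat_dim='obs', compat='broadcast_equals')
```

In IPython, running the same call under `%prun` (e.g. `%prun xr.open_mfdataset('*.nc')`) would then show whether the time still goes to `_thread.lock.acquire`.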