html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1385#issuecomment-439454213,https://api.github.com/repos/pydata/xarray/issues/1385,439454213,MDEyOklzc3VlQ29tbWVudDQzOTQ1NDIxMw==,1217238,2018-11-16T16:46:55Z,2018-11-16T16:46:55Z,MEMBER,"Does it take 10 seconds even to open a single file? The big mystery is what
that top line (""_operator.getitem"") is, but my guess is it's netCDF4-python.
h5netcdf might also give different results...
On Fri, Nov 16, 2018 at 8:20 AM chuaxr wrote:
> Sorry, I think the speedup had to do with accessing a file that had
> previously been loaded rather than due to decode_cf. Here's the output of
> prun using two different files of approximately the same size (~75 GB),
> run from a notebook without using distributed (which doesn't lead to any
> speedup):
>
> Output of
> %prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/
> atmos_level.1999010100-2000123123.sphum.nc
> ',chunks={'lat':20,'time':50,'lon':12,'pfull':11})
>
>
> 780980 function calls (780741 primitive calls) in 55.374 seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 7 54.448 7.778 54.448 7.778 {built-in method _operator.getitem}
> 764838 0.473 0.000 0.473 0.000 core.py:169()
> 3 0.285 0.095 0.758 0.253 core.py:169()
> 2 0.041 0.020 0.041 0.020 {cftime._cftime.num2date}
> 3 0.040 0.013 0.821 0.274 core.py:173(getem)
> 1 0.027 0.027 55.374 55.374 :1()
>
>
>
> Output of
> %prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/
> atmos_level.2001010100-2002123123.temp.nc
> ',chunks={'lat':20,'time':50,'lon':12,'pfull':11},
> decode_cf=False)
>
>
> 772212 function calls (772026 primitive calls) in 56.000 seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 5 55.213 11.043 55.214 11.043 {built-in method _operator.getitem}
> 764838 0.486 0.000 0.486 0.000 core.py:169()
> 3 0.185 0.062 0.671 0.224 core.py:169()
> 3 0.041 0.014 0.735 0.245 core.py:173(getem)
> 1 0.027 0.027 56.001 56.001 :1()
>
> /work isn't a remote archive, so it surprises me that this should happen.
>
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-439263419,https://api.github.com/repos/pydata/xarray/issues/1385,439263419,MDEyOklzc3VlQ29tbWVudDQzOTI2MzQxOQ==,1217238,2018-11-16T02:45:05Z,2018-11-16T02:45:05Z,MEMBER,"@chuaxr What do you see when you use `%prun` when opening the dataset? This might point to the bottleneck.
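For example, the stdlib profiler produces the same kind of breakdown as `%prun` (the workload below is just a stand-in placeholder, not the actual `open_mfdataset` call):

```python
import cProfile
import io
import pstats

# Profile a stand-in workload; in the issue this would wrap the
# xr.open_mfdataset(...) call instead.
pr = cProfile.Profile()
pr.enable()
total = sum(i * i for i in range(100000))
pr.disable()

# Sort by internal time, like %prun's default output, and show the
# top 5 entries.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('tottime').print_stats(5)
print(s.getvalue())
```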
One way to fix this would be to move our call to `decode_cf()` in `open_dataset()` to after applying chunking, i.e., to switch up the order of operations on these lines:
https://github.com/pydata/xarray/blob/f547ed0b379ef70a3bda5e77f66de95ec2332ddf/xarray/backends/api.py#L270-L296
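A minimal sketch of that reordering, using an in-memory dataset as a stand-in for a netCDF file (this is not xarray's actual backend code, and it assumes dask is installed):

```python
import numpy as np
import xarray as xr

# In-memory stand-in for an undecoded netCDF file: CF-encoded time
# units and scale/offset attributes on the data variable.
raw = xr.Dataset(
    {'sphum': (('time', 'lat'), np.zeros((4, 3), dtype='f4'),
               {'scale_factor': 0.5, 'add_offset': 1.0})},
    coords={'time': ('time', np.arange(4),
                     {'units': 'days since 2000-01-01'})},
)

# Proposed order of operations: chunk first, decode afterwards, so
# CF decoding is expressed as dask operations rather than through
# xarray's internal lazy array classes.
chunked = raw.chunk({'time': 2})
decoded = xr.decode_cf(chunked)
```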
In practice, this is the difference between using xarray's internal lazy array classes for decoding and dask for decoding. I would expect to see small differences in performance between these approaches (especially when actually computing data), but for constructing the computation graph I would expect them to have similar performance. It is puzzling that dask is orders of magnitude faster -- that suggests that something else is going wrong in the normal code path for `decode_cf()`. It would certainly be good to understand this before trying to apply any fixes.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-438873285,https://api.github.com/repos/pydata/xarray/issues/1385,438873285,MDEyOklzc3VlQ29tbWVudDQzODg3MzI4NQ==,1217238,2018-11-15T00:45:53Z,2018-11-15T00:45:53Z,MEMBER,"@chuaxr I assume you're testing this with xarray 0.11?
It would be good to do some profiling to figure out what is going wrong here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-437630511,https://api.github.com/repos/pydata/xarray/issues/1385,437630511,MDEyOklzc3VlQ29tbWVudDQzNzYzMDUxMQ==,1217238,2018-11-10T23:38:10Z,2018-11-10T23:38:10Z,MEMBER,Was this fixed by https://github.com/pydata/xarray/pull/2047?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-371933603,https://api.github.com/repos/pydata/xarray/issues/1385,371933603,MDEyOklzc3VlQ29tbWVudDM3MTkzMzYwMw==,1217238,2018-03-09T20:17:19Z,2018-03-09T20:17:19Z,MEMBER,"OK, so it seems that we need a change to disable wrapping dask arrays with `LazilyIndexedArray`. Dask arrays are already lazy!","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-370092011,https://api.github.com/repos/pydata/xarray/issues/1385,370092011,MDEyOklzc3VlQ29tbWVudDM3MDA5MjAxMQ==,1217238,2018-03-02T23:58:26Z,2018-03-02T23:58:26Z,MEMBER,@rabernat How does performance compare if you call `xarray.decode_cf()` on the opened dataset? The adjustments I recently did to lazy decoding should only help once the data is already loaded into dask.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-297539517,https://api.github.com/repos/pydata/xarray/issues/1385,297539517,MDEyOklzc3VlQ29tbWVudDI5NzUzOTUxNw==,1217238,2017-04-26T20:59:23Z,2017-04-26T20:59:23Z,MEMBER,"> For example, can I give a hint to xarray that this reindex_variables step is not necessary
Yes, adding a boolean argument `prealigned`, defaulting to `False`, to `concat` seems like a very reasonable optimization here.
But more generally, I am a little surprised by how slow `pandas.Index.get_indexer` and `pandas.Index.is_unique` are. This suggests we should add a fast-path optimization to skip these steps in `reindex_variables`:
https://github.com/pydata/xarray/blob/ab4ffee919d4abe9f6c0cf6399a5827c38b9eb5d/xarray/core/alignment.py#L302-L306
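A rough sketch of such a fast path (`reindex_indexer` is a hypothetical helper for illustration, not the actual `reindex_variables` code):

```python
import numpy as np
import pandas as pd

def reindex_indexer(index, target):
    # Hypothetical fast path: when the index already equals the
    # target (and has no duplicates, which would make equality
    # ambiguous), skip the expensive get_indexer call and return
    # a trivial indexer.
    if index.is_unique and index.equals(target):
        return np.arange(target.size)
    return index.get_indexer(target)

idx = pd.Index([10, 20, 30])
fast = reindex_indexer(idx, idx)            # fast path: [0, 1, 2]
slow = reindex_indexer(idx, pd.Index([30, 10]))  # falls back: [2, 0]
```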
Basically, if `index.equals(target)`, we should just set `indexer = np.arange(target.size)`. Although, if we have duplicate values in the index, the operation should arguably fail for correctness.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135