html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1372#issuecomment-338019155,https://api.github.com/repos/pydata/xarray/issues/1372,338019155,MDEyOklzc3VlQ29tbWVudDMzODAxOTE1NQ==,1197350,2017-10-19T19:56:12Z,2017-10-19T20:06:53Z,MEMBER,"I just hit this issue. I tried to reproduce it with a synthetic dataset, as in @shoyer's example, but I couldn't. I can only reproduce it with data loaded from netcdf4 via open_mfdataset.
I downloaded one year of air-temperature data from NARR:
ftp://ftp.cdc.noaa.gov/Datasets/NARR/Dailies/pressure/
I load it this way (the preprocessing is necessary to resolve a conflict between the `missing_value` and `_FillValue` attributes):
```python
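import xarray as xr

def preprocess_narr(ds):
    # Drop _FillValue so it no longer conflicts with missing_value
    del ds.air.attrs['_FillValue']
    # Promote the auxiliary variables to coordinates
    ds = ds.set_coords(['lon', 'lat', 'Lambert_Conformal', 'time_bnds'])
    return ds

ds = xr.open_mfdataset('air.*.nc', preprocess=preprocess_narr, decode_cf=False)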
print(ds)
print(ds.chunks)
```
```
Dimensions: (level: 29, nbnds: 2, time: 365, x: 349, y: 277)
Coordinates:
* level (level) float32 1000.0 975.0 950.0 925.0 900.0 875.0 ...
lat (y, x) float32 1.0 1.10431 1.20829 1.31196 1.4153 ...
lon (y, x) float32 -145.5 -145.315 -145.13 -144.943 ...
* y (y) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
* x (x) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
Lambert_Conformal int32 -2147483647
* time (time) float64 1.666e+06 1.666e+06 1.666e+06 ...
time_bnds (time, nbnds) float64 1.666e+06 1.666e+06 1.666e+06 ...
Dimensions without coordinates: nbnds
Data variables:
air (time, level, y, x) float32 297.475 297.463 297.453 ...
Frozen(SortedKeysDict({'y': (277,), 'x': (349,), 'time': (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31), 'level': (29,), 'nbnds': (2,)}))
```
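On the preprocessing step: a quick way to see which variables carry both attributes with different values is sketched below. The helper name `conflicting_fill_attrs` and the tiny `demo` dataset (including its fill values) are made up for illustration and are not taken from the NARR files.

```python
import numpy as np
import xarray as xr

def conflicting_fill_attrs(ds):
    # Collect names of variables whose missing_value and _FillValue disagree
    names = []
    for name, var in ds.variables.items():
        attrs = var.attrs
        if 'missing_value' in attrs and '_FillValue' in attrs:
            if not np.array_equal(attrs['missing_value'], attrs['_FillValue']):
                names.append(name)
    return names

# Hypothetical stand-in for one NARR file: both attributes set, values differ
demo = xr.Dataset(
    {'air': (('x',), np.zeros(3, dtype='float32'),
             {'missing_value': -9999.0, '_FillValue': 9.96921e36})}
)
print(conflicting_fill_attrs(demo))
```

Running this on a single file opened with `decode_cf=False` should show whether `air` is the only variable that needs the attribute removed.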
If I try to decode cf, it returns instantly.
```python
ds_decode = xr.decode_cf(ds)
print(ds_decode)
```
```
Dimensions: (level: 29, nbnds: 2, time: 365, x: 349, y: 277)
Coordinates:
* level (level) float32 1000.0 975.0 950.0 925.0 900.0 875.0 ...
lat (y, x) float32 1.0 1.10431 1.20829 1.31196 1.4153 ...
lon (y, x) float32 -145.5 -145.315 -145.13 -144.943 ...
* y (y) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
* x (x) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
Lambert_Conformal int32 -2147483647
* time (time) datetime64[ns] 1990-01-01 1990-01-02 ...
time_bnds (time, nbnds) float64 1.666e+06 1.666e+06 1.666e+06 ...
Dimensions without coordinates: nbnds
Data variables:
air (time, level, y, x) float64 297.5 297.5 297.5 297.4 ...
```
There are no more chunks: `ds_decode.air.chunks is None` **but** `ds_decode.air._in_memory is False`.
This is already a weird situation, since the data is not in memory, but it is not a dask array either.
If I try to do anything beyond this with the data, it triggers eager computation. Even just calling `type(ds_decode.air.data)` computes, and so does any arithmetic.
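For anyone who wants to check this on their own data without touching `.data`, here is a minimal sketch of the comparison I am making. It uses a small synthetic, dask-chunked dataset (which, as noted above, did not reproduce the problem for me), so treat it as a way to inspect the state before and after `decode_cf`, not as a reproducer. `describe_laziness` is a hypothetical helper, the dataset values are invented, dask must be installed, and `_in_memory` is a private attribute that may change between xarray versions.

```python
import numpy as np
import xarray as xr

def describe_laziness(da):
    # .chunks is None when the array is not dask-backed;
    # _in_memory is private xarray API, used here only for inspection
    return {'chunks': da.chunks, 'in_memory': da._in_memory}

# Synthetic stand-in for the NARR data (made-up values and attributes)
ds_demo = xr.Dataset(
    {'air': (('time', 'x'), np.ones((4, 3), dtype='float32'),
             {'missing_value': -9999.0})},
    coords={'time': ('time', np.arange(4.0),
                     {'units': 'hours since 1990-01-01'})},
).chunk({'time': 2})

before = describe_laziness(ds_demo.air)
after = describe_laziness(xr.decode_cf(ds_demo).air)
print('before decode_cf:', before)
print('after decode_cf: ', after)
```

If `chunks` comes back as `None` after decoding while `in_memory` is still `False`, you are in the same state I describe above.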
In my case, I could get around this problem if the preprocess function in `open_mfdataset` were applied _before_ the cf_decoding of the store. But in general, we really need to make `decode_cf` fully dask compatible.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,221387277