issue_comments: 338019155


html_url: https://github.com/pydata/xarray/issues/1372#issuecomment-338019155
issue_url: https://api.github.com/repos/pydata/xarray/issues/1372
id: 338019155
node_id: MDEyOklzc3VlQ29tbWVudDMzODAxOTE1NQ==
user: 1197350
created_at: 2017-10-19T19:56:12Z
updated_at: 2017-10-19T20:06:53Z
author_association: MEMBER

I just hit this issue. I tried to reproduce it with a synthetic dataset, as in @shoyer's example, but I couldn't. I can only reproduce it with data loaded from netcdf4 via `open_mfdataset`.
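For reference, the kind of synthetic test I mean looks roughly like this (a sketch, not @shoyer's exact example; the shapes, chunk sizes, and CF attributes are made up):

```python
import dask.array as da
import xarray as xr

# Dask-backed variable with made-up CF attributes (scale_factor/add_offset).
data = da.random.random((365, 29, 277, 349), chunks=(31, 29, 277, 349))
synthetic = xr.Dataset(
    {'air': (('time', 'level', 'y', 'x'), data,
             {'scale_factor': 0.01, 'add_offset': 273.15})}
)

decoded = xr.decode_cf(synthetic)
print(decoded.air.chunks)  # still dask chunks for me, so the problem doesn't show here
```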

I downloaded one year of air-temperature data from NARR: ftp://ftp.cdc.noaa.gov/Datasets/NARR/Dailies/pressure/

I load it this way (preprocessing is necessary to resolve a conflict between `missing_value` and `_FillValue`):

```python
def preprocess_narr(ds):
    del ds.air.attrs['_FillValue']
    ds = ds.set_coords(['lon', 'lat', 'Lambert_Conformal', 'time_bnds'])
    return ds

ds = xr.open_mfdataset('air.*.nc', preprocess=preprocess_narr, decode_cf=False)
print(ds)
print(ds.chunks)
```

```
<xarray.Dataset>
Dimensions:            (level: 29, nbnds: 2, time: 365, x: 349, y: 277)
Coordinates:
  * level              (level) float32 1000.0 975.0 950.0 925.0 900.0 875.0 ...
    lat                (y, x) float32 1.0 1.10431 1.20829 1.31196 1.4153 ...
    lon                (y, x) float32 -145.5 -145.315 -145.13 -144.943 ...
  * y                  (y) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
  * x                  (x) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
    Lambert_Conformal  int32 -2147483647
  * time               (time) float64 1.666e+06 1.666e+06 1.666e+06 ...
    time_bnds          (time, nbnds) float64 1.666e+06 1.666e+06 1.666e+06 ...
Dimensions without coordinates: nbnds
Data variables:
    air                (time, level, y, x) float32 297.475 297.463 297.453 ...
Frozen(SortedKeysDict({'y': (277,), 'x': (349,), 'time': (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31), 'level': (29,), 'nbnds': (2,)}))
```

If I try to decode CF, it returns instantly.

```python
ds_decode = xr.decode_cf(ds)
print(ds_decode)
```

```
<xarray.Dataset>
Dimensions:            (level: 29, nbnds: 2, time: 365, x: 349, y: 277)
Coordinates:
  * level              (level) float32 1000.0 975.0 950.0 925.0 900.0 875.0 ...
    lat                (y, x) float32 1.0 1.10431 1.20829 1.31196 1.4153 ...
    lon                (y, x) float32 -145.5 -145.315 -145.13 -144.943 ...
  * y                  (y) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
  * x                  (x) float32 0.0 32463.0 64926.0 97389.0 129852.0 ...
    Lambert_Conformal  int32 -2147483647
  * time               (time) datetime64[ns] 1990-01-01 1990-01-02 ...
    time_bnds          (time, nbnds) float64 1.666e+06 1.666e+06 1.666e+06 ...
Dimensions without coordinates: nbnds
Data variables:
    air                (time, level, y, x) float64 297.5 297.5 297.5 297.4 ...
```

There are no more chunks: `ds_decode.air.chunks` is `None`, but `ds_decode.air._in_memory` is `False`.

This is already a weird situation, since the data is not in memory, but it is not a dask array either.
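To poke at this in-between state without triggering a load, I have to look at private attributes (`_data`, `_in_memory`), so treat this as a diagnostic sketch only:

```python
# The public .data property would trigger computation (see below), but the
# private ._data attribute shows the wrapper without loading anything.
print(type(ds_decode.air.variable._data))  # an internal lazy wrapper, not dask, not numpy
print(ds_decode.air.chunks)                # None
print(ds_decode.air._in_memory)            # False
```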

If I try to do anything beyond this with the data, it triggers eager computation. Even if I just call `type(ds.air.data)`, it computes. If I do any arithmetic, it computes.
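Concretely, either of these otherwise cheap-looking operations on the decoded dataset kicks off a full eager read:

```python
type(ds_decode.air.data)  # merely inspecting .data loads the values
ds_decode.air + 273.15    # any arithmetic loads them as well
```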

In my case, I could get around this problem if the `preprocess` function in `open_mfdataset` were applied before the CF decoding of the store. But in general, we really need to make `decode_cf` fully dask-compatible.
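As a user-side mitigation (not a fix for `decode_cf` itself), re-chunking right after decoding seems like it should restore dask arrays, assuming `.chunk()` can wrap the lazily decoded variables without loading them; the chunk size below is just an example matching the monthly files:

```python
# Re-wrap the decoded (no longer dask-backed) variables in dask arrays.
ds_lazy = xr.decode_cf(ds).chunk({'time': 31})
print(ds_lazy.air.chunks)  # dask chunks again, so downstream operations stay lazy
```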

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 221387277