issues: 1924497392
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1924497392 | I_kwDOAMm_X85ytX_w | 8269 | open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0' | 6819509 | closed | 0 |  |  | 4 | 2023-10-03T16:19:54Z | 2023-10-18T16:50:20Z | 2023-10-18T16:50:20Z | NONE |  |  |  | ### What is your issue?

When moving from xarray version '2023.8.0' to '2023.9.0', the behavior of opening a zarr store changed for me (code to create an example zarr is at the end of this post). When opening a variable with units "days accumulated", the values are decoded differently between the two versions. The latest version seems to automatically treat this variable as a time-like array (I think the -9.223372e+18 values are NaT-like?).

Open the zarr:

Print as a pandas-like table for each version of xarray for readability:

Version '2023.8.0':

|time|dapr (dtype=float32)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|NaN|NaN|
|2000-01-02|NaN|NaN|
|2000-01-03|2.0|1.5|

Version '2023.9.0':

|time|dapr (dtype=float64)|mdpr (dtype=float32)|
|---|---|---|
|2000-01-01|-9.223372e+18|NaN|
|2000-01-02|-9.223372e+18|NaN|
|2000-01-03|2.000000e+00|1.5|

I can work around this by passing `decode_cf=False` or `mask_and_scale=False` and then scaling this variable manually, though that is not ideal. The `decode_timedelta` option doesn't seem to have an effect on this data either. I understand that "days" appears in my units, but the full unit string is "days accumulated". Has xarray's behavior changed so that it finds keywords such as "days" anywhere in the units (e.g., as a substring)? Do you have any other suggestions?

Thank you for the help.

Code to create the debug.zarr for the tables above:

```python
import numpy as np
import pandas as pd
import xarray as xr
import zarr

# Create some multiday precipitation data (similar to https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
# mdpr is the amount of a multiday total (inches).
# dapr is the number of days each multiday total occurred over (days accumulated).
# In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03.
# I use float32 to represent these, but pack them as int16 values in the zarr.
mdpr = np.array([np.nan, np.nan, 1.50], dtype=np.float32)
dapr = np.array([np.nan, np.nan, 2.0], dtype=np.float32)
time = pd.date_range('2000-01-01', periods=3)

# Create a dataset from these values
ds = xr.Dataset(
    data_vars=dict(
        mdpr=(['time'], mdpr),
        dapr=(['time'], dapr),
    ),
    coords=dict(
        time=time,
    ),
    attrs=dict(description='multiday precipitation data'),
)

# Specify encoding to pack these float32 values as int16
encoding = {
    'mdpr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 0.01,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
    'dapr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 1.0,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
}

# Create attributes. The "units" of the dapr variable seem to be the issue ("days" in "days accumulated").
ds.mdpr.attrs['units'] = 'inches'
ds.mdpr.attrs['description'] = 'multiday precip amount'
ds.dapr.attrs['units'] = 'days accumulated'
ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation'

# Save to zarr
ds.to_zarr('debug.zarr', mode='w', encoding=encoding)
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8269/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| completed | 13221727 | issue |
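The -9.223372e+18 values in the 2023.9.0 table are consistent with the NaT guess: that number is the smallest 64-bit integer, which NumPy uses as the internal bit pattern of NaT. The sketch below checks this, then contrasts two ways of deciding whether a `units` string is time-like: matching a time word anywhere in the string (the substring behavior the question suspects) versus requiring the whole string to be a bare time word. Neither function is xarray's actual code; `TIME_WORDS`, `looks_time_like_substring`, and `looks_time_like_exact` are hypothetical names used only for illustration.

```python
import numpy as np

# The smallest int64 is also the integer pattern NumPy uses for NaT.
INT64_MIN = np.iinfo(np.int64).min
nat_as_int = np.timedelta64("NaT", "ns").astype(np.int64)

# Hypothetical list of bare time units (illustrative, not xarray's list).
TIME_WORDS = ("days", "hours", "minutes", "seconds",
              "milliseconds", "microseconds", "nanoseconds")


def looks_time_like_substring(units: str) -> bool:
    """Hypothetical check: True if any time word appears anywhere in units."""
    return any(word in units for word in TIME_WORDS)


def looks_time_like_exact(units: str) -> bool:
    """Hypothetical check: True only if units is exactly a bare time word."""
    return units.strip() in TIME_WORDS


print(nat_as_int == INT64_MIN)                         # True
print(float(INT64_MIN))                                # -9.223372036854776e+18
print(looks_time_like_substring("days accumulated"))   # True
print(looks_time_like_exact("days accumulated"))       # False
```

Under the substring rule, "days accumulated" is flagged exactly like "days", which would explain why `dapr` is treated as timedelta-like while `mdpr` (units "inches") is not.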
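The manual workaround described in the issue (opening with `mask_and_scale=False` and then scaling by hand) comes down to applying the CF packing rule `unpacked = packed * scale_factor + add_offset` and turning the fill value into NaN. Below is a minimal NumPy-only sketch of that step. It assumes the int16 values that the example script would write (`-32768` for missing, `2` for `dapr`, and `150` for 1.50 inches at `scale_factor=0.01`). `unpack_cf` is a hypothetical helper, not part of xarray.

```python
import numpy as np


def unpack_cf(packed, scale_factor=1.0, add_offset=0.0, fill_value=None,
              dtype=np.float32):
    """Hypothetical helper: decode CF-packed integers by hand.

    Values equal to fill_value become NaN; all others are decoded as
    packed * scale_factor + add_offset, computed in `dtype`.
    """
    raw = np.asarray(packed)
    out = (raw.astype(dtype) * np.asarray(scale_factor, dtype=dtype)
           + np.asarray(add_offset, dtype=dtype))
    if fill_value is not None:
        out[raw == fill_value] = np.nan  # mask missing entries
    return out


# Assumed int16 values as written by the example script
dapr_packed = np.array([-32768, -32768, 2], dtype=np.int16)
mdpr_packed = np.array([-32768, -32768, 150], dtype=np.int16)

dapr = unpack_cf(dapr_packed, scale_factor=1.0, add_offset=0.0, fill_value=-32768)
mdpr = unpack_cf(mdpr_packed, scale_factor=0.01, add_offset=0.0, fill_value=-32768)
print(dapr, dapr.dtype)
print(mdpr, mdpr.dtype)
```

Doing the arithmetic in float32 keeps `dapr` as float32, matching the dtype shown in the 2023.8.0 table rather than the float64 shown for 2023.9.0.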