

issues: 1924497392


id: 1924497392
node_id: I_kwDOAMm_X85ytX_w
number: 8269
title: open_dataset with engine='zarr' changed from '2023.8.0' to '2023.9.0'
user: 6819509
state: closed
locked: 0
comments: 4
created_at: 2023-10-03T16:19:54Z
updated_at: 2023-10-18T16:50:20Z
closed_at: 2023-10-18T16:50:20Z
author_association: NONE

What is your issue?

When moving from xarray version '2023.8.0' to '2023.9.0', the behavior of opening a Zarr store changed for me (code to create an example Zarr store is at the end of this post). When opening a variable with units "days accumulated", the values are decoded differently between the two versions. The latest version seems to automatically treat this variable as a time-like array (I think the -9.223372e+18 values are NaT-like?).
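The NaT guess can be checked directly: NumPy stores `NaT` as the smallest signed 64-bit integer, and that integer shown as a float64 is exactly the `-9.223372e+18` value mentioned above. A minimal sketch using only NumPy (variable names are illustrative):

```python
import numpy as np

# NaT ("not a time") is stored internally as the smallest int64 value.
nat_as_int = np.timedelta64("NaT").astype(np.int64)
int64_min = np.iinfo(np.int64).min

# Cast to float64, that sentinel prints in the same form as the 2023.9.0 table.
nat_as_float = float(nat_as_int)

print(nat_as_int == int64_min)
print(f"{nat_as_float:e}")
```

So a float64 column containing that number is consistent with missing values having been filled with a NaT sentinel rather than NaN.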

Open the Zarr store:

```python
import xarray as xr

ds = xr.open_dataset('debug.zarr', engine='zarr', chunks={})
```

Print it as a pandas-like table for each version of xarray, for readability:

```python
ds.to_dataframe()
```

Version '2023.8.0':

| time | dapr (dtype=float32) | mdpr (dtype=float32) |
|---|---|---|
| 2000-01-01 | NaN | NaN |
| 2000-01-02 | NaN | NaN |
| 2000-01-03 | 2.0 | 1.5 |

Version '2023.9.0':

| time | dapr (dtype=float64) | mdpr (dtype=float32) |
|---|---|---|
| 2000-01-01 | -9.223372e+18 | NaN |
| 2000-01-02 | -9.223372e+18 | NaN |
| 2000-01-03 | 2.000000e+00 | 1.5 |

I can work around this by opening the store with `decode_cf=False` and `mask_and_scale=False` and then scaling this variable manually, though that is not ideal. The `decode_timedelta` option doesn't seem to have an effect on this data either.
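For reference, the manual workaround amounts to applying the CF unpacking rule by hand: compute `packed * scale_factor + add_offset`, then replace the fill value with NaN. The sketch below does this with NumPy on raw `int16` values. `unpack_cf` is a hypothetical helper, and the inputs assume the packing parameters from the encoding at the end of this post (`_FillValue=-32768`, `scale_factor=1.0` for `dapr`, `0.01` for `mdpr`) with round-to-nearest packing.

```python
import numpy as np


def unpack_cf(packed, fill_value, scale_factor=1.0, add_offset=0.0, dtype=np.float32):
    """Hypothetical helper: apply CF scale/offset, then mask fill values as NaN."""
    packed = np.asarray(packed)
    # Compute in float64 first, then cast to the requested float dtype.
    unpacked = packed.astype(np.float64) * scale_factor + add_offset
    # Missing values become NaN, whatever the "units" attribute says.
    unpacked[packed == fill_value] = np.nan
    return unpacked.astype(dtype)


# Raw int16 values as they would be stored for dapr and mdpr (assumed).
dapr_raw = np.array([-32768, -32768, 2], dtype=np.int16)
mdpr_raw = np.array([-32768, -32768, 150], dtype=np.int16)

dapr = unpack_cf(dapr_raw, fill_value=-32768, scale_factor=1.0)
mdpr = unpack_cf(mdpr_raw, fill_value=-32768, scale_factor=0.01)
print(dapr)
print(mdpr)
```

This reproduces the '2023.8.0' table values; `mask_and_scale=False` in `xr.open_dataset` is what leaves the raw packed integers in place for a step like this.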

I understand the "days" keyword is in my units; however, the full unit is "days accumulated". Has xarray's behavior changed so that keywords such as "days" are matched anywhere in the units (e.g., as a substring)? Do you have any other suggestions? Thank you for the help.
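The question comes down to how a units string gets classified. The sketch below contrasts a substring test, which flags "days accumulated" as time-like, with a stricter rule that only accepts a bare time unit (timedelta) or a "<unit> since <reference>" string (datetime). Both functions and the `TIME_UNITS` tuple are hypothetical illustrations, not xarray's actual implementation.

```python
# Hypothetical list of time unit words (an assumption for illustration).
TIME_UNITS = ("days", "hours", "minutes", "seconds",
              "milliseconds", "microseconds", "nanoseconds")


def is_time_like_substring(units):
    """Loose rule: any time word appearing anywhere in the string."""
    return any(unit in units for unit in TIME_UNITS)


def classify_units_strict(units):
    """Stricter rule: return 'datetime', 'timedelta', or None."""
    text = units.strip()
    first, _, rest = text.partition(" ")
    # e.g. "days since 2000-01-01" -> datetime
    if first in TIME_UNITS and rest.startswith("since "):
        return "datetime"
    # Only the bare unit itself, e.g. "days" -> timedelta
    if text in TIME_UNITS:
        return "timedelta"
    return None


for units in ["days", "days since 2000-01-01", "days accumulated", "inches"]:
    print(f"{units!r}: substring={is_time_like_substring(units)}, "
          f"strict={classify_units_strict(units)}")
```

Under the loose rule, "days accumulated" would be decoded like a time variable; under the strict rule it stays an ordinary numeric variable.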

Code to create the debug.zarr for the tables above:

```python
import numpy as np
import pandas as pd
import xarray as xr
import zarr

# Create some multiday precipitation data (similar to https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
# mdpr is the amount of a multiday total (inches)
# dapr is the number of days each multiday total occurred over (days accumulated).
# In this example, 1.50 inches of rain fell over 2 days (2 observation periods), ending on 2000-01-03
# I use float32 to represent these, but pack these as int16 values in the zarr.
mdpr = np.array([np.nan, np.nan, 1.50], dtype=np.float32)
dapr = np.array([np.nan, np.nan, 2.0], dtype=np.float32)
time = pd.date_range('2000-01-01', periods=3)

# Create a dataset from these values
ds = xr.Dataset(
    data_vars=dict(
        mdpr=(['time'], mdpr),
        dapr=(['time'], dapr),
    ),
    coords=dict(time=time),
    attrs=dict(description='multiday precipitation data'),
)

# Specify encoding to pack these float32 values as int16
encoding = {
    'mdpr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 0.01,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
    'dapr': {
        'chunks': (3,),
        'compressor': zarr.Blosc(cname='zstd', clevel=3, shuffle=1),
        'filters': None,
        'missing_value': -32768,
        '_FillValue': -32768,
        'scale_factor': 1.0,
        'add_offset': 0.0,
        'dtype': np.int16,
    },
}

# Create attributes. The "units" for the dapr variable seems to be the issue:
# the "days" in "days accumulated"
ds.mdpr.attrs['units'] = 'inches'
ds.mdpr.attrs['description'] = 'multiday precip amount'

ds.dapr.attrs['units'] = 'days accumulated'
ds.dapr.attrs['description'] = 'number of days included in the multiday precipitation'

# Save to zarr
ds.to_zarr('debug.zarr', mode='w', encoding=encoding)
```
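The `encoding` above packs each float32 value into an int16 using the CF rule `packed = round((value - add_offset) / scale_factor)`, with NaN written as the fill value. The small NumPy sketch below shows what that arithmetic puts on disk for this example; `pack_cf` is a hypothetical helper, since xarray does this itself when writing.

```python
import numpy as np


def pack_cf(values, fill_value, scale_factor=1.0, add_offset=0.0, dtype=np.int16):
    """Hypothetical helper: scale, round, and replace NaN with the fill value."""
    values = np.asarray(values, dtype=np.float64)
    scaled = np.round((values - add_offset) / scale_factor)
    # NaN cannot be stored in an integer dtype, so write the fill value instead.
    scaled[np.isnan(values)] = fill_value
    info = np.iinfo(dtype)
    if scaled.min() < info.min or scaled.max() > info.max:
        raise ValueError("packed values do not fit in the target integer dtype")
    return scaled.astype(dtype)


mdpr_packed = pack_cf([np.nan, np.nan, 1.50], fill_value=-32768, scale_factor=0.01)
dapr_packed = pack_cf([np.nan, np.nan, 2.0], fill_value=-32768, scale_factor=1.0)
print(mdpr_packed.tolist())
print(dapr_packed.tolist())
```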

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8269/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
