issues: 2171912634

id: 2171912634
node_id: PR_kwDOAMm_X85o3Ify
number: 8809
title: Pass variable name to `encode_zarr_variable`
user: 39069044
state: closed
locked: 0
assignee:
milestone:
comments: 6
created_at: 2024-03-06T16:21:53Z
updated_at: 2024-04-03T14:26:49Z
closed_at: 2024-04-03T14:26:48Z
author_association: CONTRIBUTOR
active_lock_reason:
draft: 0
pull_request: pydata/xarray/pulls/8809
body:
  • [x] Closes https://github.com/xarray-contrib/xeofs/issues/148
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

https://github.com/pydata/xarray/pull/8672 mostly fixed serialization of a reset multiindex in the backends, but an additional niche case that turned up in xeofs still caused serialization to fail on the zarr backend.

The issue is that zarr is the only backend that uses a custom version of `encode_cf_variable`, called `encode_zarr_variable`, and the way it gets called does not pass the name of the variable through before `ensure_not_multiindex` runs.

As a minimal fix, this PR just passes `name` through as an additional argument to the general `encode_variable` function. See @benbovy's comment suggesting that we should instead unwrap the level coordinate in `reset_index` and clean up the checks in `ensure_not_multiindex`; I wasn't able to get that working easily.
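To make the mechanism concrete, here is a toy model in plain Python. It is only a sketch and not xarray's code: `ToyVariable`, `toy_ensure_not_multiindex`, `toy_encode_variable`, and `encodes_ok` are hypothetical names. It also assumes that, when no name is supplied, the check falls back to the variable's own dimension name. Under that assumption, a level coordinate such as `dim1` on the `feature` dimension is mistaken for the multiindex unless its real name is passed in.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a variable; not an xarray class."""

    dims: Tuple[str, ...]
    wraps_multiindex: bool  # data is still backed by the pandas MultiIndex
    is_index_variable: bool = False


def toy_ensure_not_multiindex(var: ToyVariable, name: Optional[str] = None) -> None:
    """Reject only the multiindex dimension coordinate itself (assumed behaviour)."""
    if not var.wraps_multiindex:
        return
    if name is None and var.is_index_variable:
        # Assumed fallback: with no name supplied, guess it from the dimension.
        name = var.dims[0]
    if var.dims == (name,):
        raise NotImplementedError(f"variable {name!r} is a MultiIndex")


def toy_encode_variable(var: ToyVariable, name: Optional[str] = None) -> ToyVariable:
    """Toy encoder: run the multiindex check, then return the variable unchanged."""
    toy_ensure_not_multiindex(var, name=name)
    return var


def encodes_ok(var: ToyVariable, name: Optional[str] = None) -> bool:
    """Return True if the toy encoder accepts the variable."""
    try:
        toy_encode_variable(var, name=name)
        return True
    except NotImplementedError:
        return False


# Level coordinate "dim1" living on the "feature" dimension after a reset.
dim1 = ToyVariable(dims=("feature",), wraps_multiindex=True, is_index_variable=True)

without_name = encodes_ok(dim1)  # name dropped on the way to the check
with_name = encodes_ok(dim1, name="dim1")  # name passed through
print(without_name, with_name)
```

In this toy, dropping the name makes the check guess `feature` and reject the variable, while passing `name="dim1"` lets it through. That is the kind of difference this PR aims for on the zarr path.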

The exact workflow this turned up in involves DataTree and looks like this:

```python
import numpy as np
import xarray as xr
from datatree import DataTree

# ND DataArray that gets stacked along a multiindex
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])

# Extract just the stacked coordinates for saving in a dataset
ds = xr.Dataset(data_vars={"feature": da.feature})

# Reset the multiindex, which should make things serializable
ds = ds.reset_index("feature")
dt1 = DataTree()
dt2 = DataTree(name="feature", data=ds)
dt1["foo"] = dt2

# Somehow in this step, dt1.foo.feature.dim1.variable becomes an IndexVariable again
print(type(dt1.foo.feature.dim1.variable))

# Works
dt1.to_netcdf("test.nc", mode="w")

# Fails
dt1.to_zarr("test.zarr", mode="w")
```

But the failure can be reproduced in xarray alone with the test added here.

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8809/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
state_reason:
repo: 13221727
type: pull
