
issue_comments: 644803374


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1077#issuecomment-644803374 https://api.github.com/repos/pydata/xarray/issues/1077 644803374 MDEyOklzc3VlQ29tbWVudDY0NDgwMzM3NA== 2448579 2020-06-16T14:31:23Z 2020-06-16T14:31:23Z MEMBER

I may be missing something but @fujiisoup's concern is addressed by the scheme in the CF conventions.

> In your encoded, how can we tell the MultiIndex is `[('a', 1), ('b', 1), ('a', 2), ('b', 2)]` or `[('a', 1), ('a', 2), ('b', 1), ('b', 2)]`?

The ordering information is stored as 1D (flattened) indexes into the ND array spanned by the index levels. These are constructed with `np.ravel_multi_index` in the `encode_multiindex` function:

encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
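To make this concrete for the two orderings in the question, here is a minimal sketch using NumPy only. It assumes that `np.unique(..., return_inverse=True)` is an acceptable stand-in for the sorted levels and integer codes that pandas stores on a MultiIndex; the helper names `encode` and `decode` are hypothetical and are not part of the code in this comment.

```python
import numpy as np


def encode(tuples):
    """Return ((lat_levels, lon_levels), flat_indexes) for (lat, lon) tuples."""
    lat, lon = zip(*tuples)
    # Assumption: sorted unique values play the role of MultiIndex levels,
    # and the inverse indexes play the role of MultiIndex codes.
    lat_levels, lat_codes = np.unique(lat, return_inverse=True)
    lon_levels, lon_codes = np.unique(lon, return_inverse=True)
    shape = (len(lat_levels), len(lon_levels))
    flat = np.ravel_multi_index((lat_codes, lon_codes), shape)
    return (lat_levels, lon_levels), flat


def decode(levels, flat):
    """Rebuild the list of tuples, in the original order, from flat indexes."""
    shape = tuple(len(level) for level in levels)
    codes = np.unravel_index(flat, shape)
    columns = [level[code].tolist() for level, code in zip(levels, codes)]
    return list(zip(*columns))


order_1 = [("a", 1), ("b", 1), ("a", 2), ("b", 2)]
order_2 = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]

levels_1, flat_1 = encode(order_1)
levels_2, flat_2 = encode(order_2)

print(flat_1)  # [0 2 1 3]
print(flat_2)  # [0 1 2 3]
```

Both orderings share the same levels but produce different flat indexes, so decoding can recover the order of the entries, not just the set of combinations.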

For example, see the dimension coordinate `landpoint` in the encoded form:

```
ds3
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) MultiIndex
  - lat        (landpoint) object 'a' 'b' 'b' 'a'
  - lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.2699 -1.228 0.4632 0.2287

encode_multiindex(ds3, "landpoint")
<xarray.Dataset>
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 'a' 'b'
  * lon        (lon) int64 1 2
  * landpoint  (landpoint) int64 0 3 2 1
Data variables:
    landsoilt  (landpoint) float64 -0.2699 -1.228 0.4632 0.2287
```
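The encoded `landpoint` values `0 3 2 1` can be checked by hand. `np.ravel_multi_index` uses row-major (C) order by default, so for two levels the flat index is `lat_code * n_lon + lon_code`. The sketch below redoes that arithmetic in plain Python; the names `encode_by_hand` and `decode_by_hand` are hypothetical and exist only for illustration.

```python
# Levels and entries taken from the ds3 example.
lat_levels = ["a", "b"]
lon_levels = [1, 2]
points = [("a", 1), ("b", 2), ("b", 1), ("a", 2)]


def encode_by_hand(points):
    """Row-major flat index: lat_code * n_lon + lon_code."""
    n_lon = len(lon_levels)
    return [
        lat_levels.index(lat) * n_lon + lon_levels.index(lon)
        for lat, lon in points
    ]


def decode_by_hand(flat):
    """Invert the flat index with divmod, then look up the level values."""
    n_lon = len(lon_levels)
    decoded = []
    for index in flat:
        lat_code, lon_code = divmod(index, n_lon)
        decoded.append((lat_levels[lat_code], lon_levels[lon_code]))
    return decoded


landpoint = encode_by_hand(points)
print(landpoint)                  # [0, 3, 2, 1]
print(decode_by_hand(landpoint))  # [('a', 1), ('b', 2), ('b', 1), ('a', 2)]
```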

Here is a cleaned-up version of the code for easy testing:

```python
import numpy as np
import pandas as pd
import xarray as xr


def encode_multiindex(ds, idxname):
    # Drop the MultiIndex and store each level as its own dimension coordinate.
    encoded = ds.reset_index(idxname)
    coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
    for coord in coords:
        encoded[coord] = coords[coord].values
    shape = [encoded.sizes[coord] for coord in coords]
    # Flatten the per-level integer codes into 1D indexes; this keeps the ordering.
    encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
    # CF "compress" attribute: the dimensions that were compressed into idxname.
    encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)
    return encoded


def decode_to_multiindex(encoded, idxname):
    names = encoded[idxname].attrs["compress"].split(" ")
    shape = [encoded.sizes[dim] for dim in names]
    # Recover the per-level codes from the flat indexes.
    indices = np.unravel_index(encoded[idxname].values, shape)
    arrays = [encoded[dim].values[index] for dim, index in zip(names, indices)]
    mindex = pd.MultiIndex.from_arrays(arrays, names=names)

    decoded = xr.Dataset({}, {idxname: mindex})
    for varname in encoded.data_vars:
        if idxname in encoded[varname].dims:
            decoded[varname] = (idxname, encoded[varname].values)
    return decoded


# Full product of the levels.
ds1 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# Sparse: only 4 of the 16 possible combinations are present.
ds2 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_arrays(
            [["a", "b", "c", "d"], [1, 2, 4, 10]], names=("lat", "lon")
        )
    },
)

# Full product, but in a non-lexicographic order.
ds3 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_arrays(
            [["a", "b", "b", "a"], [1, 2, 1, 2]], names=("lat", "lon")
        )
    },
)

# Round trip: decoding the encoded dataset should give back the original.
idxname = "landpoint"
for dataset in [ds1, ds2, ds3]:
    xr.testing.assert_identical(
        decode_to_multiindex(encode_multiindex(dataset, idxname), idxname), dataset
    )
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187069161