issue_comments: 644442679

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1077#issuecomment-644442679 https://api.github.com/repos/pydata/xarray/issues/1077 644442679 MDEyOklzc3VlQ29tbWVudDY0NDQ0MjY3OQ== 2448579 2020-06-15T23:29:11Z 2020-06-15T23:38:30Z MEMBER

This seems to be possible following http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#compression-by-gathering
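For context, the idea behind compression by gathering can be sketched with plain NumPy: keep only the wanted points, plus a list of their positions in the flattened (C-order) grid, so the full grid can be rebuilt later. The array contents and the names `full`, `landpoint`, `compressed`, and `restored` are hypothetical, and the zero-based indices are simply what NumPy produces, so treat this as an illustration rather than a statement of what the convention mandates.

```python
import numpy as np

# Hypothetical 2 x 3 (lat, lon) field; NaN marks points we do not want to store.
full = np.array([[np.nan, 1.5, 2.5],
                 [3.5, np.nan, np.nan]])

# Gather: flat (C-order) positions of the wanted points, plus their values.
landpoint = np.flatnonzero(~np.isnan(full))
compressed = full.ravel()[landpoint]

# Scatter: rebuild the full grid from the index list and the compressed values.
restored = np.full(full.size, np.nan)
restored[landpoint] = compressed
restored = restored.reshape(full.shape)
```

A MultiIndex maps onto this naturally: its levels play the role of the full axes, and its codes identify which grid cells are present.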

Here is a quick proof of concept:

```python
import numpy as np
import pandas as pd
import xarray as xr

# example 1
ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# example 2
# ds = xr.Dataset(
#     {"landsoilt": ("landpoint", np.random.randn(4))},
#     {
#         "landpoint": pd.MultiIndex.from_arrays(
#             [["a", "b", "c", "d"], [1, 2, 4, 10]], names=("lat", "lon")
#         )
#     },
# )

# encode step
# detect using isinstance(index, pd.MultiIndex)
idxname = "landpoint"
encoded = ds.reset_index(idxname)
# map each level name to its unique level values
coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
for coord in coords:
    encoded[coord] = coords[coord].values
shape = [encoded.sizes[coord] for coord in coords]
# flatten the per-level codes into a single index into the (lat, lon) grid
encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)

# decode step
# detect using "compress" in var.attrs
idxname = "landpoint"
names = encoded[idxname].attrs["compress"].split(" ")
shape = [encoded.sizes[dim] for dim in names]
indices = np.unravel_index(encoded[idxname].values, shape)
arrays = [encoded[dim].values[index] for dim, index in zip(names, indices)]
# pass names so the rebuilt index keeps the original level names
mindex = pd.MultiIndex.from_arrays(arrays, names=names)

decoded = xr.Dataset({}, {idxname: mindex})
decoded["landsoilt"] = (idxname, encoded["landsoilt"].values)

xr.testing.assert_identical(decoded, ds)
```

`encoded` can be serialized using our existing code:

    <xarray.Dataset>
    Dimensions:    (landpoint: 4, lat: 2, lon: 2)
    Coordinates:
      * lat        (lat) object 'a' 'b'
      * lon        (lon) int64 1 2
      * landpoint  (landpoint) int64 0 1 2 3
    Data variables:
        landsoilt  (landpoint) float64 -1.668 -1.003 1.084 1.963
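The encode and decode steps rest on `np.ravel_multi_index` and `np.unravel_index`. As a rough sketch of that round trip without xarray, the two helpers below (`encode_multiindex` and `decode_multiindex`, hypothetical names that are not part of xarray or pandas) work on a bare `pd.MultiIndex`, using the non-product index from example 2. It assumes the index has no missing values, since pandas marks those with a code of -1, which cannot be raveled.

```python
import numpy as np
import pandas as pd


def encode_multiindex(index):
    """Return (flat indices, level values, level names) for a MultiIndex."""
    levels = [level.to_numpy() for level in index.levels]
    shape = [len(level) for level in levels]
    # Cast codes to a platform integer type before flattening them.
    codes = tuple(np.asarray(code, dtype=np.intp) for code in index.codes)
    flat = np.ravel_multi_index(codes, shape)
    return flat, levels, list(index.names)


def decode_multiindex(flat, levels, names):
    """Rebuild the MultiIndex from flat indices into the level grid."""
    shape = [len(level) for level in levels]
    codes = np.unravel_index(flat, shape)
    arrays = [level[code] for level, code in zip(levels, codes)]
    return pd.MultiIndex.from_arrays(arrays, names=names)


original = pd.MultiIndex.from_arrays(
    [["a", "b", "c", "d"], [1, 2, 4, 10]], names=("lat", "lon")
)
flat, levels, names = encode_multiindex(original)
roundtrip = decode_multiindex(flat, levels, names)
```

Here the four points sit on the diagonal of a 4 x 4 grid, so only four flat indices are stored rather than sixteen cells.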

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187069161