

issue_comments


5 rows where issue = 187069161 and user = 2448579 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1270514913 https://github.com/pydata/xarray/issues/1077#issuecomment-1270514913 https://api.github.com/repos/pydata/xarray/issues/1077 IC_kwDOAMm_X85LuoTh dcherian 2448579 2022-10-06T18:31:51Z 2022-10-06T18:31:51Z MEMBER

Thanks @lucianopaz! I fixed some errors when I added it to cf-xarray. It would be good to see if that version works for you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
1101505074 https://github.com/pydata/xarray/issues/1077#issuecomment-1101505074 https://api.github.com/repos/pydata/xarray/issues/1077 IC_kwDOAMm_X85Bp6Iy dcherian 2448579 2022-04-18T15:36:19Z 2022-04-18T15:36:19Z MEMBER

I added the "compression by gathering" scheme to cf-xarray:

1. https://cf-xarray.readthedocs.io/en/latest/generated/cf_xarray.encode_multi_index_as_compress.html
2. https://cf-xarray.readthedocs.io/en/latest/generated/cf_xarray.decode_compress_to_multi_index.html

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
645416425 https://github.com/pydata/xarray/issues/1077#issuecomment-645416425 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDY0NTQxNjQyNQ== dcherian 2448579 2020-06-17T14:40:19Z 2020-06-17T14:40:19Z MEMBER

@shoyer I now understand your earlier comment.

I agree that it should work with both sparse and MultiIndex, but as written there's no way to decide whether the data should be decoded to a sparse array or to a MultiIndexed dense array.

Following your comment in https://github.com/pydata/xarray/issues/3213#issuecomment-521533999

> Fortunately, there does seem to be a CF convention that would be a good fit for sparse data in COO format, namely the indexed ragged array representation (example, note the instance_dimension attribute). That's probably the right thing to use for sparse arrays in xarray.

How about using this "compression by gathering" idea for MultiIndexed dense arrays and "indexed ragged arrays" for sparse arrays? I do not know the internals of sparse or the details of the CF conventions well enough to have a strong opinion on which representation to prefer for sparse.COO arrays.

PS: CF convention for "indexed ragged arrays" is here: http://cfconventions.org/cf-conventions/cf-conventions.html#_indexed_ragged_array_representation
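For a concrete picture of that representation, here is a minimal numpy-only sketch of the indexed ragged array idea (variable names are illustrative, not taken from the CF text): each element stores an index into the instance dimension, which is what the `instance_dimension` attribute points at.

```python
import numpy as np

# Hypothetical ragged data: three instances (e.g. profiles) of unequal length.
profiles = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]

# Indexed ragged representation: a flat value array plus a per-element
# instance index (the variable an instance_dimension attribute would name).
values = np.concatenate(profiles)
instance_index = np.concatenate(
    [np.full(len(p), i) for i, p in enumerate(profiles)]
)

# Any instance can be recovered by selecting on the index variable.
recovered = [values[instance_index == i] for i in range(len(profiles))]
```

This is the same flat-plus-index layout a sparse COO array would need: values plus integer coordinates.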

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
644803374 https://github.com/pydata/xarray/issues/1077#issuecomment-644803374 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDY0NDgwMzM3NA== dcherian 2448579 2020-06-16T14:31:23Z 2020-06-16T14:31:23Z MEMBER

I may be missing something but @fujiisoup's concern is addressed by the scheme in the CF conventions.

> In your encoded, how can we tell the MultiIndex is [('a', 1), ('b', 1), ('a', 2), ('b', 2)] or [('a', 1), ('a', 2), ('b', 1), ('b', 2)]?

The information about ordering is stored as 1D indices into an ND array, constructed using `np.ravel_multi_index` in the `encode_multiindex` function:

```python
encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
```
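A quick pandas/numpy check of this step: the two orderings from the question above produce different raveled index arrays, so the ordering survives encoding.

```python
import numpy as np
import pandas as pd

shape = (2, 2)  # (number of lat levels, number of lon levels)

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("b", 1), ("a", 2), ("b", 2)])
mi2 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])

# Same levels, different element order -> different 1D index arrays.
idx1 = np.ravel_multi_index(mi1.codes, shape)  # [0, 2, 1, 3]
idx2 = np.ravel_multi_index(mi2.codes, shape)  # [0, 1, 2, 3]
```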

For example, see the dimension coordinate `landpoint` in the encoded form:

```
ds3
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) MultiIndex
  - lat        (landpoint) object 'a' 'b' 'b' 'a'
  - lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.2699 -1.228 0.4632 0.2287

encode_multiindex(ds3, "landpoint")
<xarray.Dataset>
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 'a' 'b'
  * lon        (lon) int64 1 2
  * landpoint  (landpoint) int64 0 3 2 1
Data variables:
    landsoilt  (landpoint) float64 -0.2699 -1.228 0.4632 0.2287
```

Here is a cleaned-up version of the code for easy testing:

```python
import numpy as np
import pandas as pd
import xarray as xr


def encode_multiindex(ds, idxname):
    encoded = ds.reset_index(idxname)
    coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
    for coord in coords:
        encoded[coord] = coords[coord].values
    shape = [encoded.sizes[coord] for coord in coords]
    encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
    encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)
    return encoded


def decode_to_multiindex(encoded, idxname):
    names = encoded[idxname].attrs["compress"].split(" ")
    shape = [encoded.sizes[dim] for dim in names]
    indices = np.unravel_index(encoded[idxname].values, shape)
    arrays = [encoded[dim].values[index] for dim, index in zip(names, indices)]
    mindex = pd.MultiIndex.from_arrays(arrays)

    decoded = xr.Dataset({}, {idxname: mindex})
    for varname in encoded.data_vars:
        if idxname in encoded[varname].dims:
            decoded[varname] = (idxname, encoded[varname].values)
    return decoded


ds1 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

ds2 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_arrays(
            [["a", "b", "c", "d"], [1, 2, 4, 10]], names=("lat", "lon")
        )
    },
)

ds3 = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_arrays(
            [["a", "b", "b", "a"], [1, 2, 1, 2]], names=("lat", "lon")
        )
    },
)

idxname = "landpoint"
for dataset in [ds1, ds2, ds3]:
    xr.testing.assert_identical(
        decode_to_multiindex(encode_multiindex(dataset, idxname), idxname), dataset
    )
```
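The ravel/unravel core of this scheme can be exercised without xarray at all. A stripped-down sketch of the same roundtrip (the helper names `encode_codes` and `decode_codes` are mine, not from the code above), using only pandas and numpy:

```python
import numpy as np
import pandas as pd


def encode_codes(mindex):
    """Compress a MultiIndex to 1D integer indices plus its levels."""
    shape = [len(level) for level in mindex.levels]
    return np.ravel_multi_index(mindex.codes, shape), list(mindex.levels)


def decode_codes(indices, levels):
    """Invert encode_codes: unravel the indices and rebuild the MultiIndex."""
    shape = [len(level) for level in levels]
    codes = np.unravel_index(indices, shape)
    arrays = [np.asarray(level)[code] for level, code in zip(levels, codes)]
    return pd.MultiIndex.from_arrays(arrays)


# Same index as ds3 above; note the encoded indices come out as 0 3 2 1.
mindex = pd.MultiIndex.from_arrays(
    [["a", "b", "b", "a"], [1, 2, 1, 2]], names=("lat", "lon")
)
indices, levels = encode_codes(mindex)
roundtripped = decode_codes(indices, levels)
```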

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
644442679 https://github.com/pydata/xarray/issues/1077#issuecomment-644442679 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDY0NDQ0MjY3OQ== dcherian 2448579 2020-06-15T23:29:11Z 2020-06-15T23:38:30Z MEMBER

This seems to be possible following http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#compression-by-gathering

Here is a quick proof of concept:

```python
import numpy as np
import pandas as pd
import xarray as xr

# example 1
ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4))},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# example 2 (swap in for a non-product MultiIndex)
# ds = xr.Dataset(
#     {"landsoilt": ("landpoint", np.random.randn(4))},
#     {
#         "landpoint": pd.MultiIndex.from_arrays(
#             [["a", "b", "c", "d"], [1, 2, 4, 10]], names=("lat", "lon")
#         )
#     },
# )

# encode step
# detect using isinstance(index, pd.MultiIndex)
idxname = "landpoint"
encoded = ds.reset_index(idxname)
coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
for coord in coords:
    encoded[coord] = coords[coord].values
shape = [encoded.sizes[coord] for coord in coords]
encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)

# decode step
# detect using "compress" in var.attrs
idxname = "landpoint"
names = encoded[idxname].attrs["compress"].split(" ")
shape = [encoded.sizes[dim] for dim in names]
indices = np.unravel_index(encoded.landpoint.values, shape)
arrays = [encoded[dim].values[index] for dim, index in zip(names, indices)]
mindex = pd.MultiIndex.from_arrays(arrays)

decoded = xr.Dataset({}, {idxname: mindex})
decoded["landsoilt"] = (idxname, encoded["landsoilt"].values)

xr.testing.assert_identical(decoded, ds)
```

`encoded` can be serialized using our existing code:

```
<xarray.Dataset>
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 'a' 'b'
  * lon        (lon) int64 1 2
  * landpoint  (landpoint) int64 0 1 2 3
Data variables:
    landsoilt  (landpoint) float64 -1.668 -1.003 1.084 1.963
```
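As an aside on why `landpoint` comes out as `0 1 2 3` here: `pd.MultiIndex.from_product` enumerates level combinations in row-major order, so the raveled codes are already sorted. A small standalone check:

```python
import numpy as np
import pandas as pd

# from_product yields (a,1), (a,2), (b,1), (b,2): row-major over the levels,
# so raveling its codes over a (2, 2) grid gives consecutive integers.
mindex = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon"))
indices = np.ravel_multi_index(mindex.codes, (2, 2))  # [0, 1, 2, 3]
```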

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 168.808ms · About: xarray-datasette