issue_comments: 442636798


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1603#issuecomment-442636798 https://api.github.com/repos/pydata/xarray/issues/1603 442636798 MDEyOklzc3VlQ29tbWVudDQ0MjYzNjc5OA== 5635139 2018-11-28T22:54:26Z 2018-11-28T22:54:26Z MEMBER

Potentially this is too much 'stepping back' now that we're at the implementation stage. My perception is that @shoyer is leading this without much support, so, weighing the value of some additional viewpoints, here are some questions:

Is a MultiIndex a feature of the schema or the implementation?

I had thought of an MI being an implementation detail in code, rather than in the data schema. We use it as a container for all the indexes along a dimension, rather than representing any properties about the data it contains.

One exception to that would be if we wanted multiple groups of indexes along the same dimension, for example:

```
Coordinates:
  * xa         (x) MultiIndex[level_a_1, level_a_2]
  * level_a_1  (x) object 'a' 'a' 'b' 'b'
  * level_a_2  (x) int64 1 2 1 2
  * xb         (x) MultiIndex[level_b_1, level_b_2]
  * level_b_1  (x) object 'a' 'a' 'b' 'b'
  * level_b_2  (x) int64 1 2 1 2
```

But is that common / required?

MultiIndex as an implementation detail

If it's an implementation detail, is there a benefit to investing in allowing both separate indexes and MIs? While it may not be possible to do pointwise indexing with the current implementation of MI, am I mistaken that it's not an API issue, assuming we pass in index names? e.g.:

```python
[ins] In [22]: da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
          ...:                   coords=dict(x=list('abc'),
          ...:                               y=pd.MultiIndex.from_product([list('ab'), [1, 2]])))

[ins] In [23]: da
Out[23]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x          (x) <U1 'a' 'b' 'c'
  * y          (y) MultiIndex
  - y_level_0  (y) object 'a' 'a' 'b' 'b'
  - y_level_1  (y) int64 1 2 1 2

[ins] In [26]: da.sel(x=xr.DataArray(['a', 'c'], dims=['z']),
          ...:        y_level_0=xr.DataArray(['a', 'b'], dims=['z']),
          ...:        y_level_1=xr.DataArray([1, 1], dims=['z']))

Out[26]: # hypothetical
<xarray.DataArray (z: 2)>
array([ 0, 10])
Dimensions without coordinates: z
```
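
To make the hypothetical `sel` call above concrete, here is a minimal sketch of the same pointwise lookup done by hand with plain pandas and NumPy. It assumes that "pointwise selection over levels" means resolving each `(y_level_0, y_level_1)` pair to one position along `y`; the variable names are mine, and this is an illustration of the semantics rather than how xarray does or would implement it.

```python
import numpy as np
import pandas as pd

# Same data and indexes as in the session above.
values = np.arange(12).reshape((3, 4))
x_index = pd.Index(list("abc"), name="x")
y_index = pd.MultiIndex.from_product(
    [list("ab"), [1, 2]], names=["y_level_0", "y_level_1"]
)

# Requested points, one per position along the new dimension "z".
x_labels = ["a", "c"]
y_level_0 = ["a", "b"]
y_level_1 = [1, 1]

# Resolve labels to integer positions. For the MultiIndex, each point is
# looked up as a (level_0, level_1) tuple.
x_pos = x_index.get_indexer(x_labels)
y_pos = y_index.get_indexer(list(zip(y_level_0, y_level_1)))

# get_indexer marks labels that are not present with -1.
if (x_pos < 0).any() or (y_pos < 0).any():
    raise KeyError("some requested labels are not in the index")

# Pointwise (not outer-product) selection: one value per requested point.
result = values[x_pos, y_pos]
print(result)
```

Nothing here depends on how the levels are stored, which is the sense in which this looks like an implementation question rather than an API one.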

If that's the case, could we instead force all indexes along a dimension to be in a MI, tolerate the short-term constraints of the current MI implementation, and where needed build out additional features?

That would (ideally) leave us decoupled from MIs - if we built a better in-memory data structure, we could transition. The contract would be around the cases above.
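
As a rough sketch of what such a contract could look like: the `DimensionIndex` protocol and both backends below are hypothetical names invented for this illustration, not existing xarray or pandas APIs. The only assumption is that callers pass labels by level name and get positions back, so the backing structure can be swapped.

```python
from typing import Hashable, Protocol, Sequence

import numpy as np
import pandas as pd


class DimensionIndex(Protocol):
    """Hypothetical contract: map labels, keyed by level name, to positions."""

    def positions(self, **level_labels: Sequence[Hashable]) -> np.ndarray: ...


class MultiIndexBacked:
    """Backend that stores all levels of a dimension in a pandas MultiIndex."""

    def __init__(self, index: pd.MultiIndex) -> None:
        self._index = index

    def positions(self, **level_labels):
        # Order the label columns the way the MultiIndex orders its levels.
        columns = [level_labels[name] for name in self._index.names]
        return self._index.get_indexer(list(zip(*columns)))


class DictBacked:
    """Stand-in for some other in-memory structure: a plain dict lookup."""

    def __init__(self, names, rows) -> None:
        self._names = list(names)
        self._lookup = {tuple(row): i for i, row in enumerate(rows)}

    def positions(self, **level_labels):
        columns = [level_labels[name] for name in self._names]
        return np.array([self._lookup[key] for key in zip(*columns)])


mi = pd.MultiIndex.from_product(
    [list("ab"), [1, 2]], names=["y_level_0", "y_level_1"]
)
backends = [MultiIndexBacked(mi), DictBacked(mi.names, mi.tolist())]

# Both backends answer the same query through the same interface.
answers = [b.positions(y_level_0=["a", "b"], y_level_1=[1, 1]) for b in backends]
print(answers)
```

If callers only ever rely on something like `positions`, the MI could later be replaced without changing user-facing behaviour.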

--

...and as mentioned above, these are intended as questions rather than high-confidence views.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  262642978