issue_comments

7 rows where issue = 262642978 and user = 5635139 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
557590898 https://github.com/pydata/xarray/issues/1603#issuecomment-557590898 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDU1NzU5MDg5OA== max-sixty 5635139 2019-11-22T16:04:22Z 2019-11-22T16:04:22Z MEMBER

I'll make an example of this when I find some free time, along with a contrasting one in Pandas. :)

👍

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
443239040 https://github.com/pydata/xarray/issues/1603#issuecomment-443239040 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0MzIzOTA0MA== max-sixty 5635139 2018-11-30T15:29:15Z 2018-11-30T15:29:15Z MEMBER

How should dimension names interact with index names, i.e. the "Mapping indexes into pandas" part of @shoyer's comment?

I'd suggest that option (3) should be invalid, and that da[dim_name] should return all the indexes on that dimension

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442906486 https://github.com/pydata/xarray/issues/1603#issuecomment-442906486 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0MjkwNjQ4Ng== max-sixty 5635139 2018-11-29T16:46:52Z 2018-11-29T16:46:52Z MEMBER

And broadening out further:

> Default behavior: all 1-dimensional coordinates each have their own, single index (pandas.Index), unless explicitly stated.

This is basically how I think of indexes - as a performant lookup data structure, rather than a feature of the schema. An RDBMS is a good analogy there.
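The RDBMS analogy can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the `coords` table, its columns, and the `idx_coords_label` index are hypothetical and not part of xarray or of this thread. Adding the index leaves the schema and the query result untouched; only the lookup strategy reported by the query planner changes.

```python
import sqlite3

# Hypothetical table standing in for a 1-dimensional coordinate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coords (label TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO coords VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)],
)

query = "SELECT value FROM coords WHERE label = ? ORDER BY value"


def plan(sql, params):
    """Return SQLite's query plan as one string (last column holds the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the lookup is answered by scanning the table.
rows_before = conn.execute(query, ("b",)).fetchall()
plan_before = plan(query, ("b",))

# The index is a lookup structure layered on top of the unchanged schema.
conn.execute("CREATE INDEX idx_coords_label ON coords (label)")

rows_after = conn.execute(query, ("b",)).fetchall()
plan_after = plan(query, ("b",))

print(plan_before)
print(plan_after)
```

The same rows come back in both cases, which is the sense in which an index can be treated as an access-path detail rather than a property of the data.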

Now, maybe there's enough overlap between the data access and the data schema that we should let them couple - e.g. would you want to be able to run .sel on any coord, even 2D? While it's possible in concept, it could guide users to inefficient operations.

We probably don't need to answer this question to proceed, but I'd be interested whether others see indexes as a property of the schema / I'm missing something.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442902327 https://github.com/pydata/xarray/issues/1603#issuecomment-442902327 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0MjkwMjMyNw== max-sixty 5635139 2018-11-29T16:36:20Z 2018-11-29T16:36:20Z MEMBER

I broadly agree with @benbovy's proposal.

One question I think is worth being clear on: what additional contracts do multiple indexes on a dimension have over individual indexes?

e.g. re:

    Coordinates:
      * level_1  (x) object 'a' 'a' 'b' 'b'
      * level_2  (x) int64 1 2 1 2
    Multi-indexes:
        pandas.MultiIndex [level_1, level_2]

Am I right in thinking the Multi-indexes section is only a helpful note to users, rather than conveying anything about how data is accessed?

@fujiisoup poses a good case of this question:

> ds.sel(multi=list_of_pairs) can probably be replaced by ds.sel(x=..., y=...), but how about reindex along MultiIndex?

(and separately, I think we can do much of this before adding the ability to set custom indexes, which would be cool but further from where we are, I think)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442725856 https://github.com/pydata/xarray/issues/1603#issuecomment-442725856 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0MjcyNTg1Ng== max-sixty 5635139 2018-11-29T06:52:49Z 2018-11-29T06:52:49Z MEMBER

> Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN).

💯- that very much resonates! And it leaves the implementation flexible if we want to iterate.

I'll try to think of some dissenting cases to the proposal / helpful responses to the above.
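The sparse-array reading of a MultiIndex in the quoted proposal can be illustrated with pandas alone. This is a minimal sketch, not xarray's implementation: the level names are borrowed from the thread and the values are made up. One of the four possible level combinations is left out of the MultiIndex, and unstacking turns that omission into an explicit NaN.

```python
import numpy as np
import pandas as pd

# Three of the four possible (level_1, level_2) combinations are present;
# ('b', 2) is missing by omission.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["level_1", "level_2"]
)
stacked = pd.Series([10.0, 20.0, 30.0], index=index)

# Unstacking produces the dense 2-D form, where the missing element
# becomes an explicit NaN.
dense = stacked.unstack("level_2")

# Both forms hold the same number of real values.
n_present = int(dense.notna().sum().sum())

print(dense)
print(n_present, len(stacked))
```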

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442636798 https://github.com/pydata/xarray/issues/1603#issuecomment-442636798 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0MjYzNjc5OA== max-sixty 5635139 2018-11-28T22:54:26Z 2018-11-28T22:54:26Z MEMBER

Potentially this is too much 'stepping back' now that we're at the implementation stage. My perception is that @shoyer is leading this without much support, so in the interest of adding some additional viewpoints, here are some questions:

Is a MultiIndex a feature of the schema or the implementation?

I had thought of an MI being an implementation detail in code, rather than in the data schema. We use it as a container for all the indexes along a dimension, rather than representing any properties about the data it contains.

One exception to that would be if we wanted multiple groups of indexes along the same dimension, for example:

```
Coordinates:
  * xa         (x) MultiIndex[level_a_1, level_a_2]
  * level_a_1  (x) object 'a' 'a' 'b' 'b'
  * level_a_2  (x) int64 1 2 1 2
  * xb         (x) MultiIndex[level_b_1, level_b_2]
  * level_b_1  (x) object 'a' 'a' 'b' 'b'
  * level_b_2  (x) int64 1 2 1 2
```

But is that common / required?

MultiIndex as an implementation detail

If it's an implementation detail, is there a benefit to investing in allowing both separate indexes and MIs? While it may not be possible to do pointwise indexing with the current implementation of MI, am I right that this is not an API issue, assuming we pass in index names? e.g.:

```python
# Assumes: import numpy as np, pandas as pd, xarray as xr
[ins] In [22]: da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'], coords=dict(x=list('abc'), y=pd.MultiIndex.from_product([list('ab'),[1,2]])))

[ins] In [23]: da
Out[23]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x          (x) <U1 'a' 'b' 'c'
  * y          (y) MultiIndex
  - y_level_0  (y) object 'a' 'a' 'b' 'b'
  - y_level_1  (y) int64 1 2 1 2

[ins] In [26]: da.sel(x=xr.DataArray(['a','c'],dims=['z']),
                      y_level_0=xr.DataArray(['a','b'],dims=['z']),
                      y_level_1=xr.DataArray([1,1],dims=['z']))

Out[80]: # hypothetical
<xarray.DataArray (z: 2)>
array([ 0, 10])
Dimensions without coordinates: z
```
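Since the `da.sel(...)` call above is hypothetical, the same pointwise lookup can be emulated with plain pandas indexes and NumPy integer indexing. This is only a sketch of the intended semantics, not a proposal for how xarray should implement it; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same data and labels as the DataArray in the session above.
data = np.arange(12).reshape((3, 4))
x_index = pd.Index(list("abc"), name="x")
y_index = pd.MultiIndex.from_product(
    [list("ab"), [1, 2]], names=["y_level_0", "y_level_1"]
)

# One label per point along the new dimension z.
x_labels = ["a", "c"]
y_level_0 = ["a", "b"]
y_level_1 = [1, 1]

# Translate labels to integer positions, one position per point.
x_pos = x_index.get_indexer(x_labels)
y_target = pd.MultiIndex.from_arrays([y_level_0, y_level_1])
y_pos = y_index.get_indexer(y_target)

# Paired integer arrays select elements pointwise rather than as an outer product.
result = data[x_pos, y_pos]
print(result)
```

Passing the level values by name and pairing them up is all the lookup needs, which is why the limitation looks like an implementation matter rather than an API one.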

If that's the case, could we instead force all indexes along a dimension to be in a MI, tolerate the short-term constraints of the current MI implementation, and where needed build out additional features?

That would (ideally) leave us uncoupled from MIs - if we built a better in-memory data structure, we could transition. The contract would be around the cases above.

--

...and as mentioned above, these are intended as questions rather than high-confidence views.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
380323532 https://github.com/pydata/xarray/issues/1603#issuecomment-380323532 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDM4MDMyMzUzMg== max-sixty 5635139 2018-04-11T04:28:53Z 2018-04-11T04:28:53Z MEMBER

Overall, I agree with the proposed conclusion, and I appreciate the level of thoughtfulness and clarity. I'm happy to help with some of the implementation if we can split this up.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
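
The filter shown at the top of this page (issue = 262642978 and user = 5635139, sorted by updated_at descending) can be reproduced against this schema with Python's built-in sqlite3 module. The sketch below assumes an in-memory database; three rows reuse ids and timestamps from this page, and the fourth row (id 1, user 1) is hypothetical, added only to show that the filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# (id, user, updated_at, issue); the last row is hypothetical.
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, issue) VALUES (?, ?, ?, ?)",
    [
        (380323532, 5635139, "2018-04-11T04:28:53Z", 262642978),
        (557590898, 5635139, "2019-11-22T16:04:22Z", 262642978),
        (443239040, 5635139, "2018-11-30T15:29:15Z", 262642978),
        (1, 1, "2020-01-01T00:00:00Z", 262642978),
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments "
        "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
        (262642978, 5635139),
    )
]
print(ids)
```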
Powered by Datasette · Queries took 85.947ms · About: xarray-datasette