issue_comments

10 rows where author_association = "MEMBER" and issue = 294241734 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
825176507 https://github.com/pydata/xarray/issues/1887#issuecomment-825176507 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNTE3NjUwNw== max-sixty 5635139 2021-04-22T20:50:29Z 2021-04-22T21:06:47Z MEMBER

stack(new_dim=["a", "b"], dropna=True)

This could be useful (potentially we can open a different issue). While someone can call .dropna, that coerces to floats (or some type that supports missing) and can allocate more than is needed. Potentially this can be considered along with issues around sparse, e.g. https://github.com/pydata/xarray/issues/3245, https://github.com/pydata/xarray/issues/4143
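A minimal sketch of the cost described above, reusing the `da` and `key` arrays that appear elsewhere in this thread. The `dropna=True` argument to `stack` is only a suggestion at this point, so the sketch relies on the existing `where`, `stack`, and `dropna` methods; the dimension name `z` is an arbitrary choice.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=list("ab"))
key = da % 3 == 0

# where() keeps the full 3x4 shape and fills unselected cells with NaN,
# which forces the integer data to be promoted to a float dtype.
masked = da.where(key)

# Stacking and then dropping NaN leaves only the selected values,
# but they remain floats and the full-size intermediate was allocated.
selected = masked.stack(z=("a", "b")).dropna("z")

print(da.dtype, masked.dtype, selected.dtype)
print(masked.size, selected.size)
```

The intermediate `masked` array holds all 12 cells even though only 4 are wanted, which is the extra allocation the comment refers to.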

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824503658 https://github.com/pydata/xarray/issues/1887#issuecomment-824503658 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDUwMzY1OA== max-sixty 5635139 2021-04-22T03:04:41Z 2021-04-22T03:04:51Z MEMBER

I'm still working through this. Using this to jot down my notes, no need to respond.

One property that seems to be lacking is that if key changes from n-1 to n dimensions, the behavior changes (also outlined here):

```python
In [171]: a
Out[171]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [172]: mask
Out[172]: array([ True, False,  True])

In [173]: a[mask]
Out[173]:
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
```

...as expected, but now let's make a 2D mask...

```python
In [174]: full_mask = np.broadcast_to(mask[:, np.newaxis], (3,4))

In [175]: full_mask
Out[175]:
array([[ True,  True,  True,  True],
       [False, False, False, False],
       [ True,  True,  True,  True]])

In [176]: a[full_mask]
Out[176]: array([ 0,  1,  2,  3,  8,  9, 10, 11])  # flattened!
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824461333 https://github.com/pydata/xarray/issues/1887#issuecomment-824461333 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDQ2MTMzMw== shoyer 1217238 2021-04-22T01:02:32Z 2021-04-22T01:02:32Z MEMBER

> Current proposal ("stack"), of da[key] and with a dimension of key's name (and probably no multiindex):
>
> ```python
> In [86]: da.values[key.values]
> Out[86]: array([0, 3, 6, 9]) # But the xarray version
> ```

The part about this new proposal that is most annoying is that the key needs a name, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like da[key.rename('key_name')] much of the time to make this work.
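To make the naming requirement concrete, here is a rough sketch of how the "stack" behavior could be emulated with existing methods. The helper `stack_select` is hypothetical (it is not part of xarray), and unlike the proposal it will likely create a MultiIndex on the new dimension because it goes through a plain `stack()`.

```python
import numpy as np
import xarray as xr


def stack_select(da, key):
    """Hypothetical helper emulating the proposed da[key] for an N-D boolean key."""
    if key.name is None:
        # The new dimension is named after the key, so an unnamed key cannot work.
        raise ValueError("key needs a name to label the new dimension")
    # Stack in key's own dimension order so flat positions line up with key.data.
    stacked = da.stack({key.name: key.dims})
    return stacked.isel({key.name: np.flatnonzero(key.data)})


da = xr.DataArray(np.arange(12).reshape(3, 4), dims=list("ab"))
key = da % 3 == 0  # the comparison result carries no name here

result = stack_select(da, key.rename("key_name"))
print(result.dims, result.values)
```

Calling `stack_select(da, key)` without the `rename` raises the `ValueError`, which is the friction the comment describes.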

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824460304 https://github.com/pydata/xarray/issues/1887#issuecomment-824460304 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDQ2MDMwNA== shoyer 1217238 2021-04-22T00:59:25Z 2021-04-22T00:59:25Z MEMBER

> OK great. To confirm, this is what it would look like:

Yes, this looks right to me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824454992 https://github.com/pydata/xarray/issues/1887#issuecomment-824454992 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDQ1NDk5Mg== max-sixty 5635139 2021-04-22T00:40:49Z 2021-04-22T00:40:49Z MEMBER

> I'm not quite sure this is true -- it's the difference between needing to call stack() vs unstack().

This was a tiny point so it's fine to discard. I had meant that producing the where result via the stack result requires a stack and unstack. But producing the stack result via a where result requires only one stack — the where result is very cheap.
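As an illustration of the direction that needs an unstack, the sketch below builds a "stack"-style result with existing methods and then recovers the "where"-style result from it. The dimension name `z` is an arbitrary choice, and this particular key happens to touch every row and column, so the unstacked array regains the full 3x4 shape; that would not hold for every key.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=list("ab"))
key = da % 3 == 0

# "stack" result: a 1-D selection of the positions where key is True.
stack_result = da.stack(z=("a", "b")).isel(z=np.flatnonzero(key.values))

# Recovering the "where" result from it requires an unstack, which
# fills the positions that were not selected with NaN.
where_via_stack = stack_result.unstack("z").transpose("a", "b")

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(where_via_stack.values, da.where(key).values)
```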

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824452843 https://github.com/pydata/xarray/issues/1887#issuecomment-824452843 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDQ1Mjg0Mw== max-sixty 5635139 2021-04-22T00:33:29Z 2021-04-22T00:35:28Z MEMBER

OK great. To confirm, this is what it would look like:

Context:

```python
In [81]: da = xr.DataArray(np.arange(12).reshape(3,4), dims=list('ab'))

In [82]: da
Out[82]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Dimensions without coordinates: a, b

In [84]: key = da % 3 == 0

In [83]: key
Out[83]:
<xarray.DataArray (a: 3, b: 4)>
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
Dimensions without coordinates: a, b
```

Currently:

```python
In [85]: da[key]

IndexError                                Traceback (most recent call last)
<ipython-input-85-7fd83c907cb6> in <module>
----> 1 da[key]
...
~/.asdf/installs/python/3.8.8/lib/python3.8/site-packages/xarray/core/variable.py in _validate_indexers(self, key)
    697                     )
    698                 if k.ndim > 1:
--> 699                     raise IndexError(
    700                         "{}-dimensional boolean indexing is "
    701                         "not supported. ".format(k.ndim)

IndexError: 2-dimensional boolean indexing is not supported.
```

Current proposal ("stack"), of da[key] and with a dimension of key's name (and probably no multiindex):

```python
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9]) # But the xarray version
```

Previous suggestion ("where"), for the result of da[key]:

```python
In [87]: da.where(key)
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0., nan, nan,  3.],
       [nan, nan,  6., nan],
       [nan,  9., nan, nan]])
Dimensions without coordinates: a, b
```

(small follow up I'll put in another message, for clarity)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824329772 https://github.com/pydata/xarray/issues/1887#issuecomment-824329772 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDMyOTc3Mg== shoyer 1217238 2021-04-21T20:16:10Z 2021-04-21T20:16:10Z MEMBER

> I've been trying to conceptualize why I think the where equivalence (the original proposal) is better than the stack proposal (the latter).

Here are two reasons why I like the stack version:

  1. It's more NumPy like -- boolean indexing in NumPy returns a flat array in the same way
  2. It doesn't need dtype promotion to handle possibly missing values, so it will have more predictable semantics.

As a side note: one nice feature of using isel() for stacking is that it does not create a MultiIndex, which can be expensive. But there's no reason why we necessarily need to do that for stack(). I'll open a new issue to discuss adding an optional parameter.

  • I'm not sure how the setitem would work; da[key] = value?

To match the semantics of NumPy, value would need to have matching dims/coords to those of da[key]. In other words, it would also need to be stacked.
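For reference, this is how the NumPy semantics being matched behave for plain arrays: the assigned value must be a scalar or line up with the flattened selection. The sketch uses only NumPy and says nothing about how xarray would implement setitem.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a % 3 == 0  # four True entries

# The right-hand side has one element per True entry, in flattened order.
a[mask] = np.array([-1, -2, -3, -4])
print(a)

# A value whose length does not match the selection is rejected.
try:
    a[mask] = np.array([1, 2, 3])
except ValueError as err:
    print("rejected:", err)
```

In the xarray case, the analogous requirement would be that `value` already has the stacked dimension of `da[key]`.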

  • If someone wants the stack result, it's less work to do original -> where result -> stack result relative to original -> stack result -> where result; which suggests they're more composable?

I'm not quite sure this is true -- it's the difference between needing to call stack() vs unstack().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
824299104 https://github.com/pydata/xarray/issues/1887#issuecomment-824299104 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyNDI5OTEwNA== max-sixty 5635139 2021-04-21T19:21:46Z 2021-04-21T19:21:46Z MEMBER

I've been trying to conceptualize why I think the where equivalence (the original proposal) is better than the stack proposal (the latter). I think it's mostly:

  • It's simpler
  • I'm not sure how the setitem would work; da[key] = value?
  • If someone wants the stack result, it's less work to do original -> where result -> stack result relative to original -> stack result -> where result; which suggests they're more composable?

But I don't do much pointwise indexing — and so maybe we do want to prioritize that

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
823673654 https://github.com/pydata/xarray/issues/1887#issuecomment-823673654 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgyMzY3MzY1NA== shoyer 1217238 2021-04-20T23:50:34Z 2021-04-20T23:50:34Z MEMBER

It's worth noting that there is at least one other way boolean indexing could work:

  • ds[key] could work like ds.stack({key.name: key.dims}).isel({key.name: np.flatnonzero(key.data)}), except without creating a MultiIndex. Arguably this might be more useful and also more consistent with NumPy itself. It's also more similar to the operation @Hoeze wants in https://github.com/pydata/xarray/issues/5179.

We can't support both with the same syntax, so we have to make a choice here :).

See also the discussion about what drop_duplicates/unique should do over in https://github.com/pydata/xarray/pull/5089.
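The consistency with NumPy mentioned above can be checked on plain arrays: indexing with an N-dimensional boolean mask gives the same values as flattening the array and selecting the positions returned by `np.flatnonzero`. This is a NumPy-only sketch of the equivalence, not the proposed xarray implementation.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a % 3 == 0

# Positions of the True entries in the flattened (row-major) mask.
flat_positions = np.flatnonzero(mask)

# Selecting those positions from the flattened array...
via_flat = a.reshape(-1)[flat_positions]
# ...matches NumPy's own N-dimensional boolean indexing.
via_mask = a[mask]

assert np.array_equal(via_flat, via_mask)
print(flat_positions, via_mask)
```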

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734
803491524 https://github.com/pydata/xarray/issues/1887#issuecomment-803491524 https://api.github.com/repos/pydata/xarray/issues/1887 MDEyOklzc3VlQ29tbWVudDgwMzQ5MTUyNA== max-sixty 5635139 2021-03-21T00:38:23Z 2021-03-21T00:38:23Z MEMBER

I've added the "good first issue" label — at least the first two bullets of the proposal would be relatively simple to implement, given they're mostly syntactic sugar.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Boolean indexing with multi-dimensional key arrays 294241734

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);