issue_comments

3 rows where author_association = "NONE" and issue = 294241734 sorted by updated_at descending

id: 824782830
html_url: https://github.com/pydata/xarray/issues/1887#issuecomment-824782830
issue_url: https://api.github.com/repos/pydata/xarray/issues/1887
node_id: MDEyOklzc3VlQ29tbWVudDgyNDc4MjgzMA==
user: Hoeze (1200058)
created_at: 2021-04-22T12:08:45Z
updated_at: 2021-04-22T12:11:55Z
author_association: NONE
body:

Current proposal ("stack"), of `da[key]` and with a dimension of key's name (and probably no multiindex):

```python
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9])  # But the xarray version
```

The part about this new proposal that is most annoying is that the key needs a name, which we can use to name the new dimension. That's not too hard to do, but it is a little annoying -- in practice you would have to write something like `da[key.rename('key_name')]` much of the time to make this work.

IMO, the perfect solution would be masking support, i.e. `da[key]` would return the same array with an additional variable `da.mask == key`:

```python
In [87]: da[key]
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[   0, <NA>, <NA>,    3],
       [<NA>, <NA>,    6, <NA>],
       [<NA>,    9, <NA>, <NA>]])
dtype: int
Dimensions without coordinates: a, b
```

Then we could have something like `da[key].stack(new_dim=["a", "b"], dropna=True)`:

```python
In [87]: da[key].stack(new_dim=["a", "b"], dropna=True)
Out[87]:
<xarray.DataArray (new_dim: 4)>
array([0, 3, 6, 9])
coords{
    "a" (new_dim): [0, 0, 1, 2],
    "b" (new_dim): [0, 3, 2, 1],
}
Dimensions without coordinates: new_dim
```

Here, `dropna=True` would avoid creating the cross-product of `a` and `b`.

Also, that would avoid all those unnecessary float casts for free.
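For reference, a rough approximation of this behaviour is already possible with today's xarray API by chaining `where`, `stack`, and `dropna`. The setup below (the DataArray `da` and the boolean `key`) is invented for illustration and is not from the comment; it also shows the float cast that the proposed masking support would avoid.

```python
import numpy as np
import xarray as xr

# Assumed setup: a 3x4 DataArray and a boolean key selecting four elements,
# mirroring the values in the example output above.
da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("a", "b"))
mask = np.zeros((3, 4), dtype=bool)
mask[[0, 0, 1, 2], [0, 3, 2, 1]] = True
key = xr.DataArray(mask, dims=("a", "b"))

# where() masks unselected cells with NaN (forcing a float cast),
# stack() builds the combined dimension, dropna() removes the NaNs.
flat = da.where(key).stack(new_dim=("a", "b")).dropna("new_dim")
print(flat.values)  # [0. 3. 6. 9.] -- note the float dtype
```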

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Boolean indexing with multi-dimensional key arrays (294241734)
id: 744463486
html_url: https://github.com/pydata/xarray/issues/1887#issuecomment-744463486
issue_url: https://api.github.com/repos/pydata/xarray/issues/1887
node_id: MDEyOklzc3VlQ29tbWVudDc0NDQ2MzQ4Ng==
user: shaprann (43274047)
created_at: 2020-12-14T14:07:32Z
updated_at: 2020-12-14T15:47:18Z
author_association: NONE
body:

Just wanted to confirm that boolean indexing is indeed highly relevant, especially for assigning values instead of just selecting them. Here is a use case I encounter very often:

I'm working with very sparse data (e.g. a satellite image of some islands surrounded by water), and I want to modify it using `some_vectorized_function()`. Of course I could use `some_vectorized_function()` to process the whole image, but boolean masking allows me to save a lot of computation.

Here is how I would achieve this in numpy:

```python
import numpy as np
import some_vectorized_function

image = np.array(  # image.shape == (3, 7, 7)
    [[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 454, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 343, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],

     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 454, 565, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 667, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 878, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],

     [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 565, 676, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 323, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 545, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]]
)
image = np.moveaxis(image, 0, -1)  # image.shape == (7, 7, 3)

# "image" is a standard RGB image
# with shape == (height, width, channel),
# but only 4 pixels contain relevant data!

mask = np.all(image > 0, axis=-1)
# mask.shape == (7, 7)
# mask.dtype == bool
# mask.sum() == 4

image[mask] = some_vectorized_function(image[mask])
# len(image[mask]) == 4
# image[mask].shape == (4, 3)
```

The most important fact here is that `image[mask]` is just a list of 4 pixels, which I can process and then assign back to their original places. And as you can see, this boolean masking also plays very nicely with broadcasting, which allows me to mask a 3D array with a 2D mask.

Unfortunately, nothing like this is currently possible with xarray. If implemented, it would enable some crazy speedups for operations like spatial interpolation, where we don't want to interpolate the whole image, but only some pixels that we care about.
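A minimal sketch of the workaround this typically forces today: drop to the underlying numpy array, do the masked assignment there, and wrap the result again. The DataArray, the threshold, and `some_vectorized_function` below are made-up stand-ins for illustration, not part of the comment.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the commenter's vectorized function.
def some_vectorized_function(pixels):
    return pixels * 2.0

# Assumed example data: a (7, 7, 3) image with dims (y, x, band).
da = xr.DataArray(np.random.rand(7, 7, 3), dims=("y", "x", "band"))
mask = (da > 0.5).all(dim="band")  # 2-D boolean mask over (y, x)

# xarray does not support da[mask] with a multi-dimensional boolean key,
# so the masked assignment has to happen at the numpy level.
values = da.values
values[mask.values] = some_vectorized_function(values[mask.values])
da = da.copy(data=values)
```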

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Boolean indexing with multi-dimensional key arrays (294241734)
id: 544693024
html_url: https://github.com/pydata/xarray/issues/1887#issuecomment-544693024
issue_url: https://api.github.com/repos/pydata/xarray/issues/1887
node_id: MDEyOklzc3VlQ29tbWVudDU0NDY5MzAyNA==
user: Hoeze (1200058)
created_at: 2019-10-21T20:27:14Z
updated_at: 2019-10-21T20:27:14Z
author_association: NONE
body:

Now that https://github.com/pydata/xarray/issues/3206 has been implemented, maybe fancy boolean indexing (`da[boolean_mask]`) could return a sparse array as well.
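One possible shape of that suggestion, sketched with today's building blocks (the referenced issue added support for wrapping `sparse` arrays in xarray); the DataArray and mask below are invented for illustration:

```python
import numpy as np
import sparse
import xarray as xr

# Assumed example: keep only the masked values, backed by a sparse.COO
# array whose fill value (0) stands in for the unselected cells.
da = xr.DataArray(np.arange(1, 13).reshape(3, 4), dims=("a", "b"))
boolean_mask = (da % 3) == 0  # selects 3, 6, 9, 12

dense = np.where(boolean_mask.values, da.values, 0)
result = xr.DataArray(sparse.COO.from_numpy(dense), dims=da.dims)
print(result.data.nnz)  # only the selected (non-fill) values are stored
```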

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Boolean indexing with multi-dimensional key arrays (294241734)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
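For completeness, a sketch of how the filtered view above ("author_association = NONE and issue = 294241734, sorted by updated_at descending") could be reproduced directly against the underlying SQLite table; the database file name `github.db` is an assumption.

```python
import sqlite3

# Assumed database file; Datasette serves this same SQLite table.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association
    FROM issue_comments
    WHERE author_association = 'NONE' AND issue = ?
    ORDER BY updated_at DESC
    """,
    (294241734,),
).fetchall()
for row in rows:
    print(row)
```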