issue_comments


5 rows where issue = 295838143 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
370944391 https://github.com/pydata/xarray/pull/1899#issuecomment-370944391 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDk0NDM5MQ== shoyer 1217238 2018-03-06T22:01:04Z 2018-03-06T22:01:04Z MEMBER

OK, in it goes. Thanks @fujiisoup !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364625429 https://github.com/pydata/xarray/pull/1899#issuecomment-364625429 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDYyNTQyOQ== shoyer 1217238 2018-02-10T04:33:44Z 2018-02-10T04:33:44Z MEMBER

> In case we want to get three diagonal elements (1, 1), (2, 2), (3, 3) from a 1000x1000 array, what we want is array[[1, 2, 3], [1, 2, 3]]. It can be decomposed to array[1:4, 1:4][[0, 1, 2], [0, 1, 2]]. We only need to load the 3 x 3 part of the 1000 x 1000 array, then take its diagonal elements.

OK, this is pretty clever.

There are some obvious failure cases, e.g., if they want to pull out indices array[[1, -1], [1, -1]], in which case the entire array needs to be sliced. I wonder if we should try to detect these with some heuristics, e.g., if the size of the result is much (maybe 10x or 100x) smaller than the size of the sliced arrays.

Also, we would want to avoid separating basic/vectorized for backends that support efficient vectorized indexing (scipy and zarr).
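
To make the idea concrete, here is a minimal NumPy sketch of the bounding-box decomposition and the size heuristic discussed above; decompose_outer is an illustrative helper, not xarray's actual indexing code:

```python
import numpy as np

def decompose_outer(key, shape):
    """Split a pointwise index into a bounding-box slice plus a smaller
    index relative to that slice (illustrative only, not xarray API)."""
    slices, inner = [], []
    for k, n in zip(key, shape):
        k = np.asarray(k) % n                  # normalize negative indices
        start, stop = int(k.min()), int(k.max()) + 1
        slices.append(slice(start, stop))
        inner.append(k - start)
    return tuple(slices), tuple(inner)

array = np.arange(1000 * 1000).reshape(1000, 1000)

# Good case: only the 3 x 3 bounding box needs to be read from disk.
key = (np.array([1, 2, 3]), np.array([1, 2, 3]))
outer, inner = decompose_outer(key, array.shape)
np.testing.assert_array_equal(array[key], array[outer][inner])

# Failure case from above: [1, -1] makes the bounding box span almost the
# whole array, so a heuristic could compare result size to sliced size.
bad = (np.array([1, -1]), np.array([1, -1]))
outer, inner = decompose_outer(bad, array.shape)
sliced_size = np.prod([s.stop - s.start for s in outer])
result_size = np.broadcast(*bad).size
print(sliced_size // result_size)   # roughly 500,000x bigger than the result
```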

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364583951 https://github.com/pydata/xarray/pull/1899#issuecomment-364583951 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDU4Mzk1MQ== shoyer 1217238 2018-02-09T22:10:43Z 2018-02-09T22:10:43Z MEMBER

I think the design choice here really comes down to whether we want to enable VectorizedIndexing on arbitrary data on disk or not:

Is it better to:

1. Always allow vectorized indexing by means of (lazily) loading all indexed data into memory as a single chunk. This could potentially be very expensive for IO or memory in hard-to-predict ways.
2. Or only allow vectorized indexing if a backend supports it directly. This ensures that when vectorized indexing works, it works efficiently. Vectorized indexing is still possible, but you have to explicitly write .compute()/.load().

I think I slightly prefer option (2) but I can see the merits in either decision.
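
As a rough sketch of what option (2) means at the lazy-array layer (the class and attribute names below are made up for illustration and are not xarray's actual indexing classes):

```python
import numpy as np

class LazilyIndexedArray:
    """Toy stand-in for a lazy on-disk array wrapper (illustrative only)."""

    def __init__(self, backend_array, supports_vectorized=False):
        self.backend_array = backend_array
        self.supports_vectorized = supports_vectorized

    def vindex(self, key):
        if self.supports_vectorized:
            # Backends with native fancy indexing (e.g. zarr, scipy) can
            # push the vectorized key straight down to storage.
            return self.backend_array[key]
        # Option (1) would silently load everything indexed as one chunk:
        #     return np.asarray(self.backend_array)[key]
        # Option (2) refuses instead, so the cost stays explicit for the user:
        raise NotImplementedError(
            "vectorized indexing is not supported by this backend; "
            "call .load()/.compute() first"
        )
```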

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364573996 https://github.com/pydata/xarray/pull/1899#issuecomment-364573996 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDU3Mzk5Ng== shoyer 1217238 2018-02-09T21:30:40Z 2018-02-09T21:30:40Z MEMBER

Reason 2 is the primary one. We want to load the minimum amount of data possible into memory, mostly because pulling data from disk is slow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364529325 https://github.com/pydata/xarray/pull/1899#issuecomment-364529325 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDUyOTMyNQ== shoyer 1217238 2018-02-09T19:07:39Z 2018-02-09T19:07:39Z MEMBER

I figured out how to consolidate two vectorized indexers, as long as they don't include any slice objects:

```python
import numpy as np

def index_vectorized_indexer(old_indexer, applied_indexer):
    return tuple(o[applied_indexer] for o in np.broadcast_arrays(*old_indexer))

for x, old, applied in [
    (np.arange(10), (np.arange(2, 7),), (np.array([3, 2, 1]),)),
    (np.arange(10), (np.arange(6).reshape(2, 3),), (np.arange(2), np.arange(1, 3))),
    (-np.arange(1, 21).reshape(4, 5),
     (np.arange(3)[:, None], np.arange(4)[None, :]),
     (np.arange(3), np.arange(3))),
]:
    new_key = index_vectorized_indexer(old, applied)
    np.testing.assert_array_equal(x[old][applied], x[new_key])
```

We could probably make this work with VectorizedIndexer if we converted the slice objects to arrays. I think we might even already have some code to do that conversion somewhere. So another option would be to convert BasicIndexer and OuterIndexer -> VectorizedIndexer if necessary and then use this path.
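
A hedged sketch of that conversion, with a made-up helper name (not the actual xarray code path): turn each slice in an outer-style key into an index array, after which the index_vectorized_indexer consolidation above applies:

```python
import numpy as np

def slices_to_arrays(key, shape):
    """Convert an outer-style key of slices/1D arrays into broadcastable
    index arrays (illustrative helper, not xarray API)."""
    arrays = []
    for axis, (k, n) in enumerate(zip(key, shape)):
        if isinstance(k, slice):
            k = np.arange(*k.indices(n))
        k = np.asarray(k)
        shape_i = [1] * len(key)
        shape_i[axis] = -1
        arrays.append(k.reshape(shape_i))   # outer-index style broadcasting
    return tuple(arrays)

x = np.arange(100).reshape(10, 10)
key = (slice(2, 7), np.array([0, 3, 5]))
np.testing.assert_array_equal(x[key], x[slices_to_arrays(key, x.shape)])
```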

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
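
For reference, a sketch of reproducing this page's row selection directly against a local SQLite copy of the table; the database filename github.db is an assumption, and only the schema above and the filters in the page description are taken from the page:

```python
import sqlite3

# github.db is a guessed filename for a local copy of this Datasette database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, body
    FROM issue_comments
    WHERE issue = 295838143 AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user_id, created_at, updated_at, body in rows:
    print(comment_id, updated_at, body[:60])
```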