issue_comments

19 rows where author_association = "MEMBER" and issue = 295838143 sorted by updated_at descending

user 3

  • fujiisoup 12
  • shoyer 5
  • jhamman 2

issue 1

  • Vectorized lazy indexing · 19 ✖

author_association 1

  • MEMBER · 19 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
374422762 https://github.com/pydata/xarray/pull/1899#issuecomment-374422762 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3NDQyMjc2Mg== fujiisoup 6815844 2018-03-19T23:40:52Z 2018-03-19T23:40:52Z MEMBER

Yes, LazilyIndexedArray was renamed to LazilyOuterIndexedArray and LazilyVectorizedIndexedArray was newly added. These two backend arrays are selected depending on what kind of indexer is used.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
370970309 https://github.com/pydata/xarray/pull/1899#issuecomment-370970309 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDk3MDMwOQ== fujiisoup 6815844 2018-03-06T23:45:13Z 2018-03-06T23:45:13Z MEMBER

Thanks, @WeatherGod , for your feedback. This is finally merged!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
370944391 https://github.com/pydata/xarray/pull/1899#issuecomment-370944391 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDk0NDM5MQ== shoyer 1217238 2018-03-06T22:01:04Z 2018-03-06T22:01:04Z MEMBER

OK, in it goes. Thanks @fujiisoup !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
370125916 https://github.com/pydata/xarray/pull/1899#issuecomment-370125916 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDEyNTkxNg== fujiisoup 6815844 2018-03-03T07:11:24Z 2018-03-03T07:11:24Z MEMBER

All done :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
368385680 https://github.com/pydata/xarray/pull/1899#issuecomment-368385680 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2ODM4NTY4MA== fujiisoup 6815844 2018-02-26T04:16:03Z 2018-02-26T04:16:03Z MEMBER

I think it's ready :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
368383877 https://github.com/pydata/xarray/pull/1899#issuecomment-368383877 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2ODM4Mzg3Nw== jhamman 2443309 2018-02-26T04:00:24Z 2018-02-26T04:00:24Z MEMBER

@fujiisoup - is this ready for a final review? I see you have all the tests passing 💯 !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366618866 https://github.com/pydata/xarray/pull/1899#issuecomment-366618866 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjYxODg2Ng== fujiisoup 6815844 2018-02-19T08:30:01Z 2018-02-19T08:30:01Z MEMBER

It looks like some backends do not support negative-step slices. I'm going to wrap this up, maybe this weekend.
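One way to work around a backend that rejects negative-step slices can be sketched as follows. This is a NumPy-only illustration, not the code from this PR: `positive_step_slice` is a hypothetical helper that rewrites the slice with a positive step and reports whether the loaded result has to be reversed in memory.

```python
import numpy as np


def positive_step_slice(key, size):
    """Return (slice with positive step, needs_reverse) selecting the same
    elements as `key` on an axis of length `size`. Hypothetical helper."""
    start, stop, step = key.indices(size)
    if step > 0:
        return slice(start, stop, step), False
    n = len(range(start, stop, step))
    if n == 0:
        # Empty selection: nothing to read, nothing to reverse.
        return slice(0, 0, 1), False
    last = start + (n - 1) * step  # smallest index that is selected
    return slice(last, start + 1, -step), True


x = np.arange(10)
for key in [slice(None, None, -1), slice(8, 2, -3), slice(2, 8, 2), slice(1, 5, -1)]:
    pos, needs_reverse = positive_step_slice(key, x.size)
    loaded = x[pos]  # what a backend limited to positive steps would read
    result = loaded[::-1] if needs_reverse else loaded
    np.testing.assert_array_equal(result, x[key])
```

The reversal happens only on the already-loaded subset, so the amount of data read from the backend is unchanged.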

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366377467 https://github.com/pydata/xarray/pull/1899#issuecomment-366377467 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NzQ2Nw== fujiisoup 6815844 2018-02-16T22:30:32Z 2018-02-16T22:30:32Z MEMBER

@WeatherGod, Thanks for testing. Can you share more detail? With your example, what does wind_inds look like? Can you share the shape and dimension names?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366373577 https://github.com/pydata/xarray/pull/1899#issuecomment-366373577 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3MzU3Nw== fujiisoup 6815844 2018-02-16T22:12:44Z 2018-02-16T22:16:13Z MEMBER

Can you share how you tested this? The test I added says it is still in memory after vectorized indexing.

edit: Is wind_inds a 1-D array? If so, both should trigger OuterIndexing. But in both cases it should be indexed lazily...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365727175 https://github.com/pydata/xarray/pull/1899#issuecomment-365727175 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcyNzE3NQ== jhamman 2443309 2018-02-14T19:59:36Z 2018-02-14T19:59:36Z MEMBER

@WeatherGod - you are right, all the pynio tests are being skipped on travis. I'll open a separate issue for that. Yikes!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364755370 https://github.com/pydata/xarray/pull/1899#issuecomment-364755370 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDc1NTM3MA== fujiisoup 6815844 2018-02-11T14:25:40Z 2018-02-11T19:49:04Z MEMBER

Based on the suggestion, I implemented the lazy vectorized indexing with index-consolidation.

Now, every backend is virtually compatible with all the indexer types, i.e. basic, outer, and vectorized indexers.

It sometimes consumes a large amount of memory if the indexer cannot be decomposed efficiently, but it is always better than loading the full slice. The drawback is that it is hard to predict how much data will be loaded.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364625973 https://github.com/pydata/xarray/pull/1899#issuecomment-364625973 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDYyNTk3Mw== fujiisoup 6815844 2018-02-10T04:47:04Z 2018-02-10T04:47:04Z MEMBER

> There are some obvious fail cases, e.g., if they want to pull out indices array[[1, -1], [1, -1]], in which case the entire array needs to be sliced.

If the backend supports orthogonal indexing (not only basic indexing), we can do array[[1, -1]][:, [1, -1]], load the 2x2 array, then apply the vectorized indexing [[0, 1], [0, 1]].

But if we want a full diagonal, we need a full slice anyway...

> Also, we would want to avoid separating basic/vectorized for backends that support efficient vectorized indexing (scipy and zarr).

OK. Agreed. We may need a flag that can be accessed from the array wrapper.
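The orthogonal-then-vectorized decomposition mentioned above can be checked with plain NumPy. This is only an illustration on an in-memory array standing in for a backend; the variable names are made up for the example.

```python
import numpy as np

array = np.arange(1000 * 1000).reshape(1000, 1000)

# Target: the two points (1, 1) and (-1, -1), i.e. vectorized indexing.
expected = array[[1, -1], [1, -1]]

# Step 1: orthogonal (outer) indexing, the part a backend would perform.
# Only a 2 x 2 block needs to be loaded.
block = array[[1, -1]][:, [1, -1]]

# Step 2: vectorized indexing on the small in-memory block.
result = block[[0, 1], [0, 1]]

np.testing.assert_array_equal(result, expected)
```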

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364625429 https://github.com/pydata/xarray/pull/1899#issuecomment-364625429 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDYyNTQyOQ== shoyer 1217238 2018-02-10T04:33:44Z 2018-02-10T04:33:44Z MEMBER

> in case we want to get three diagonal elements (1, 1), (2, 2), (3, 3) from a 1000x1000 array. What we want is array[[1, 2, 3], [1, 2, 3]]. It can be decomposed to array[1: 4, 1:4][[0, 1, 2], [0, 1, 2]]. We only need to load 3 x 3 part of the 1000 x 1000 array, then take its diagonal elements.

OK, this is pretty clever.

There are some obvious fail cases, e.g., if they want to pull out indices array[[1, -1], [1, -1]], in which case the entire array needs to be sliced. I wonder if we should try to detect these with some heuristics, e.g., if the size of the result is much (maybe 10x or 100x) smaller than the size of sliced arrays.

Also, we would want to avoid separating basic/vectorized for backends that support efficient vectorized indexing (scipy and zarr).
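A rough sketch of such a heuristic, assuming the decomposition uses one bounding slice per axis: compare the number of elements the slices would load with the number of elements actually requested. `bounding_slice_ratio` is a hypothetical name and the cutoff is left to the caller; nothing here is taken from xarray itself.

```python
import numpy as np


def bounding_slice_ratio(key, shape):
    """Elements loaded via per-axis bounding slices divided by elements
    requested. Assumes `key` is a tuple of non-empty integer arrays."""
    key = np.broadcast_arrays(*key)
    loaded = 1
    for k, size in zip(key, shape):
        k = np.where(k < 0, k + size, k)  # normalize negative indices
        loaded *= int(k.max() - k.min() + 1)
    return loaded / key[0].size


shape = (1000, 1000)
good = (np.array([1, 2, 3]), np.array([1, 2, 3]))
bad = (np.array([1, -1]), np.array([1, -1]))

print(bounding_slice_ratio(good, shape))  # 3.0: a 3 x 3 block for 3 points
print(bounding_slice_ratio(bad, shape))   # 499000.5: nearly the whole array
```

A caller could then warn or fall back to another strategy when the ratio exceeds something like the 10x or 100x mentioned above.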

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364616100 https://github.com/pydata/xarray/pull/1899#issuecomment-364616100 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDYxNjEwMA== fujiisoup 6815844 2018-02-10T01:47:54Z 2018-02-10T01:47:54Z MEMBER

I am inclined toward option 1, as there is some benefit even for backends without vectorized-indexing support. For example, suppose we want to get three diagonal elements (1, 1), (2, 2), (3, 3) from a 1000x1000 array. What we want is array[[1, 2, 3], [1, 2, 3]]. It can be decomposed into array[1:4, 1:4][[0, 1, 2], [0, 1, 2]]. We only need to load a 3 x 3 part of the 1000 x 1000 array, then take its diagonal elements.

A drawback is that it is difficult for users to predict how much memory will be needed.
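The diagonal example can be written out as a small NumPy sketch. `decompose_with_slices` is a hypothetical helper for illustration only, and it assumes non-negative, non-empty index arrays.

```python
import numpy as np


def decompose_with_slices(key):
    """Split a vectorized key into (bounding slices, in-memory vectorized key).
    Assumes non-negative, non-empty integer index arrays."""
    slices = tuple(slice(int(k.min()), int(k.max()) + 1) for k in key)
    inner = tuple(k - k.min() for k in key)
    return slices, inner


array = np.arange(1000 * 1000).reshape(1000, 1000)
key = (np.array([1, 2, 3]), np.array([1, 2, 3]))

slices, inner = decompose_with_slices(key)
block = array[slices]  # basic indexing only: array[1:4, 1:4], a 3 x 3 block

np.testing.assert_array_equal(block[inner], array[key])
```

Only `array[slices]` would touch the backend; the final vectorized step runs on the small block already in memory.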

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364583951 https://github.com/pydata/xarray/pull/1899#issuecomment-364583951 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDU4Mzk1MQ== shoyer 1217238 2018-02-09T22:10:43Z 2018-02-09T22:10:43Z MEMBER

I think the design choice here really comes down to whether we want to enable VectorizedIndexing on arbitrary data on disk or not:

Is it better to:

1. Always allow vectorized indexing by means of (lazily) loading all indexed data into memory as a single chunk. This could potentially be very expensive for IO or memory in hard-to-predict ways.
2. Or to only allow vectorized indexing if a backend supports it directly. This ensures that when vectorized indexing works, it works efficiently. Vectorized indexing is still possible, but you have to explicitly write .compute()/.load().

I think I slightly prefer option (2) but I can see the merits in either decision.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364573996 https://github.com/pydata/xarray/pull/1899#issuecomment-364573996 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDU3Mzk5Ng== shoyer 1217238 2018-02-09T21:30:40Z 2018-02-09T21:30:40Z MEMBER

Reason 2 is the primary one. We want to load the minimum amount of data possible into memory, mostly because pulling data from disk is slow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364573328 https://github.com/pydata/xarray/pull/1899#issuecomment-364573328 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDU3MzMyOA== fujiisoup 6815844 2018-02-09T21:28:26Z 2018-02-09T21:28:26Z MEMBER

Thanks, @shoyer. Do you think it is possible to consolidate transpose as well? We need it to keep our logic in Variable._broadcast_indexing.

I am wondering what computational cost we want to avoid with lazy indexing:

1. Is the indexing itself expensive, so we want to minimize the number of indexing operations?
2. Or is the original data too large to fit into memory, so we want to load the smallest possible subset of the original array via lazy indexing?

If reason 2 is the common case, I think it is not a good idea to consolidate all the lazy indexing as a VectorizedIndexer, since most of the backends do not support vectorized indexing, which means we would need to load the whole array into memory before any indexing operation. (But it would still be valuable to consolidate all the indexers after the first vectorized indexer, since we can decompose any VectorizedIndexer into a pair of successive outer and smaller vectorized indexers.)

And I am also wondering, as pointed out in #1725, whether what I am doing now was already implemented in dask.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364529325 https://github.com/pydata/xarray/pull/1899#issuecomment-364529325 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDUyOTMyNQ== shoyer 1217238 2018-02-09T19:07:39Z 2018-02-09T19:07:39Z MEMBER

I figured out how to consolidate two vectorized indexers, as long as they don't include any slice objects:

```python
import numpy as np


def index_vectorized_indexer(old_indexer, applied_indexer):
    # Broadcast the old index arrays to a common shape, then index each one
    # with the newly applied key.
    return tuple(o[applied_indexer] for o in np.broadcast_arrays(*old_indexer))


for x, old, applied in [
    (np.arange(10), (np.arange(2, 7),), (np.array([3, 2, 1]),)),
    (np.arange(10), (np.arange(6).reshape(2, 3),), (np.arange(2), np.arange(1, 3))),
    (-np.arange(1, 21).reshape(4, 5),
     (np.arange(3)[:, None], np.arange(4)[None, :]),
     (np.arange(3), np.arange(3))),
]:
    new_key = index_vectorized_indexer(old, applied)
    np.testing.assert_array_equal(x[old][applied], x[new_key])
```

We could probably make this work with VectorizedIndexer if we converted the slice objects to arrays. I think we might even already have some code to do that conversion somewhere. So another option would be to convert BasicIndexer and OuterIndexer -> VectorizedIndexer if necessary and then use this path.
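A minimal sketch of that conversion, assuming the outer key holds only slices and 1-D integer arrays: each slice is expanded with `np.arange`, and `np.ix_` reshapes the arrays so they broadcast like an outer product. `outer_to_vectorized` is a hypothetical name, and `index_vectorized_indexer` is repeated from the snippet above so the example runs on its own.

```python
import numpy as np


def index_vectorized_indexer(old_indexer, applied_indexer):
    # Same function as in the comment above.
    return tuple(o[applied_indexer] for o in np.broadcast_arrays(*old_indexer))


def outer_to_vectorized(key, shape):
    """Convert an outer key (slices and 1-D integer arrays) into index arrays
    that broadcast against each other. Hypothetical helper."""
    arrays = [
        np.arange(*k.indices(size)) if isinstance(k, slice) else np.asarray(k)
        for k, size in zip(key, shape)
    ]
    return np.ix_(*arrays)


x = np.arange(20).reshape(4, 5)
outer_key = (slice(1, 4), np.array([0, 2, 4]))
outer_result = x[1:4][:, [0, 2, 4]]

# The converted key selects the same 3 x 3 block as the outer key.
vec_key = outer_to_vectorized(outer_key, x.shape)
np.testing.assert_array_equal(x[vec_key], outer_result)

# It can now be consolidated with a later vectorized key.
applied = (np.arange(3), np.arange(3))
new_key = index_vectorized_indexer(vec_key, applied)
np.testing.assert_array_equal(x[new_key], outer_result[applied])
```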

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
364442081 https://github.com/pydata/xarray/pull/1899#issuecomment-364442081 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NDQ0MjA4MQ== fujiisoup 6815844 2018-02-09T14:04:16Z 2018-02-09T14:04:16Z MEMBER

I noticed the lazy vectorized indexing can be (sometimes) optimized by decomposing the vectorized indexers into successive outer and vectorized indexers, so that the size of the array to be loaded into memory is minimized.
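As a NumPy-only sketch of the idea (not the implementation in this PR): take the unique indices along each axis as the outer key, and use the inverse mapping as the smaller vectorized key to apply in memory. `decompose_vectorized` is a hypothetical helper and assumes non-negative indices.

```python
import numpy as np


def decompose_vectorized(key):
    """Split a vectorized key into (outer key, in-memory vectorized key).
    Assumes non-negative integer index arrays. Hypothetical helper."""
    key = np.broadcast_arrays(*key)
    outer, inner = [], []
    for k in key:
        uniq, inverse = np.unique(k, return_inverse=True)
        outer.append(uniq)
        # reshape keeps this independent of NumPy's inverse-shape convention
        inner.append(inverse.reshape(k.shape))
    return tuple(outer), tuple(inner)


x = np.arange(100 * 100).reshape(100, 100)
key = (np.array([[5, 90], [5, 40]]), np.array([[7, 7], [63, 2]]))

outer, inner = decompose_vectorized(key)
block = x[np.ix_(*outer)]  # stands in for the backend's outer indexing

np.testing.assert_array_equal(block[inner], x[key])
```

Here only a 3 x 3 block is loaded for the four requested points, instead of the full 100 x 100 array.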

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);