home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

10 rows where author_association = "MEMBER" and issue = 278325492 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 2

  • shoyer 6
  • fujiisoup 4

issue 1

  • Indexing Variable objects with a mask · 10 ✖

author_association 1

  • MEMBER · 10 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
352330986 https://github.com/pydata/xarray/pull/1751#issuecomment-352330986 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MjMzMDk4Ng== shoyer 1217238 2017-12-18T05:39:18Z 2017-12-18T05:39:18Z MEMBER

I decided to merge in the current state rather than let this get stale. We can add the public API later....

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
351597414 https://github.com/pydata/xarray/pull/1751#issuecomment-351597414 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MTU5NzQxNA== fujiisoup 6815844 2017-12-14T03:23:05Z 2017-12-14T03:23:05Z MEMBER

This looks good to me.

I think it is ready to expose _getitem_with_mask to public. As you mention in #TODO, I also think isel is the best place.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
351594432 https://github.com/pydata/xarray/pull/1751#issuecomment-351594432 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MTU5NDQzMg== fujiisoup 6815844 2017-12-14T03:00:34Z 2017-12-14T03:00:34Z MEMBER

Sorry for my late review. I will see later today.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
351452403 https://github.com/pydata/xarray/pull/1751#issuecomment-351452403 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MTQ1MjQwMw== shoyer 1217238 2017-12-13T16:53:43Z 2017-12-13T16:53:43Z MEMBER

@fujiisoup could you kindly take another look?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
350148425 https://github.com/pydata/xarray/pull/1751#issuecomment-350148425 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MDE0ODQyNQ== shoyer 1217238 2017-12-08T01:47:51Z 2017-12-08T01:47:51Z MEMBER

I pushed some additional tests, which turned up the fact that dask's vectorized indexing does not support negative indices (fixed by https://github.com/dask/dask/pull/2967).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348870677 https://github.com/pydata/xarray/pull/1751#issuecomment-348870677 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODg3MDY3Nw== shoyer 1217238 2017-12-04T06:22:57Z 2017-12-04T06:22:57Z MEMBER

Okay, I'll come up with a few more tests to make sure this maintains 100% coverage... Let me know if you have any ideas for other edge cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348853749 https://github.com/pydata/xarray/pull/1751#issuecomment-348853749 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODg1Mzc0OQ== fujiisoup 6815844 2017-12-04T03:45:48Z 2017-12-04T03:45:48Z MEMBER

Could you add more test coverage of the first commit? It would help other developers to follow the logic.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348843824 https://github.com/pydata/xarray/pull/1751#issuecomment-348843824 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODg0MzgyNA== shoyer 1217238 2017-12-04T02:19:53Z 2017-12-04T02:19:53Z MEMBER

OK, I'm going to merge this (just the first commit), and leave the second part (actually changing reindex) to another PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348380756 https://github.com/pydata/xarray/pull/1751#issuecomment-348380756 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODM4MDc1Ng== shoyer 1217238 2017-12-01T02:07:21Z 2017-12-01T02:07:32Z MEMBER

I pushed another commit (mostly but not entirely working) to port reindex() to use getitem_with_mask rather than its current implementation. This allows for some nice simplification for the reindex_variables() code: my git change is 1 file changed, 33 insertions(+), 64 deletions(-).

To get a sense of how this effects performance, I made a small benchmarking script with our tutorial dataset: python import xarray import numpy as np ds_numpy = xarray.tutorial.load_dataset('air_temperature').load() ds_chunked = ds_numpy.chunk({'time': 100}) lat = np.linspace(ds_numpy.lat.min(), ds_numpy.lat.max(), num=100) lon = np.linspace(ds_numpy.lon.min(), ds_numpy.lon.max(), num=100) def do_reindex(ds): return ds.reindex(lat=lat, lon=lon, method='nearest', tolerance=0.5) %timeit do_reindex(ds_numpy) %timeit do_reindex(ds_chunked) result = do_reindex(ds_chunked) %timeit result.compute()

Our tutorial dataset is pretty small, but it can still give a flavor of how this scales. I chose new chunks intentionally with a small tolerance to create lots of empty chunks to mask: ``` In [2]: ds_numpy Out[2]: <xarray.Dataset> Dimensions: (lat: 25, lon: 53, time: 2920) Coordinates: * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 55.0 52.5 ... * lon (lon) float32 200.0 202.5 205.0 207.5 210.0 212.5 215.0 217.5 ... * time (time) datetime64[ns] 2013-01-01T00:02:06.757437440 ... Data variables: air (time, lat, lon) float64 241.2 242.5 243.5 244.0 244.1 243.9 ... Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

In [3]: do_reindex(ds_numpy) Out[3]: <xarray.Dataset> Dimensions: (lat: 100, lon: 100, time: 2920) Coordinates: * lat (lat) float64 15.0 15.61 16.21 16.82 17.42 18.03 18.64 19.24 ... * lon (lon) float64 200.0 201.3 202.6 203.9 205.3 206.6 207.9 209.2 ... * time (time) datetime64[ns] 2013-01-01T00:02:06.757437440 ... Data variables: air (time, lat, lon) float64 296.3 nan 296.8 nan 297.1 nan 297.0 ... Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly... ```

Here are the benchmarking results:

Before: NumPy: 201 ms ± 3.48 ms per loop Dask build graph: 303 ms ± 5.48 ms per loop Dask compute: 30.7 s ± 35.9 ms per loop After: NumPy: 546 ms ± 26.3 ms per loop Dask build graph: 6.9 ms ± 464 µs per loop Dask compute: 411 ms ± 17.9 ms per loop

So NumPy is somewhat slower (about 2.5x), but reindexing with dask is 75x faster! It even shows some ability to parallelize better than pure NumPy.

This is encouraging. We should try to close the performance gap with NumPy (it was cleverly optimized before to use minimal copies of the data), but the existing reindex code with dask when doing masking is so slow that it is almost unusable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348376506 https://github.com/pydata/xarray/pull/1751#issuecomment-348376506 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODM3NjUwNg== fujiisoup 6815844 2017-12-01T01:39:41Z 2017-12-01T01:39:41Z MEMBER

It is a great idea! I previously raised #1553 for multidimensional reindex, but am still thinking how could we implement it. _getitem_with_mask looks cleanest.

I will look inside the code later, but I feel this should be exposed to the public. In this case, what is the most consistent naming with the existing methods? isel_mask?

I would probably leave it as a Variable method since this is pretty low-level.

Agreed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 17.04ms · About: xarray-datasette