
issue_comments

6 rows where issue = 278325492 and user = 1217238 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
352330986 https://github.com/pydata/xarray/pull/1751#issuecomment-352330986 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MjMzMDk4Ng== shoyer 1217238 2017-12-18T05:39:18Z 2017-12-18T05:39:18Z MEMBER

I decided to merge in the current state rather than let this get stale. We can add the public API later....

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
351452403 https://github.com/pydata/xarray/pull/1751#issuecomment-351452403 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MTQ1MjQwMw== shoyer 1217238 2017-12-13T16:53:43Z 2017-12-13T16:53:43Z MEMBER

@fujiisoup could you kindly take another look?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
350148425 https://github.com/pydata/xarray/pull/1751#issuecomment-350148425 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM1MDE0ODQyNQ== shoyer 1217238 2017-12-08T01:47:51Z 2017-12-08T01:47:51Z MEMBER

I pushed some additional tests, which turned up the fact that dask's vectorized indexing does not support negative indices (fixed by https://github.com/dask/dask/pull/2967).
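For context, the vectorized-indexing behavior in question can be sketched in plain NumPy, whose fancy indexing accepts negative indices (counting from the end of each axis) — the semantics dask's equivalent indexing needed to match:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# NumPy's vectorized (fancy) indexing accepts negative indices,
# counting from the end of each axis; dask's pointwise indexing
# did not handle this case until dask/dask#2967.
rows = np.array([0, -1])   # -1 selects the last row
cols = np.array([1, 2])
print(x[rows, cols])       # -> [ 1 10]
```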

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348870677 https://github.com/pydata/xarray/pull/1751#issuecomment-348870677 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODg3MDY3Nw== shoyer 1217238 2017-12-04T06:22:57Z 2017-12-04T06:22:57Z MEMBER

Okay, I'll come up with a few more tests to make sure this maintains 100% coverage... Let me know if you have any ideas for other edge cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348843824 https://github.com/pydata/xarray/pull/1751#issuecomment-348843824 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODg0MzgyNA== shoyer 1217238 2017-12-04T02:19:53Z 2017-12-04T02:19:53Z MEMBER

OK, I'm going to merge this (just the first commit), and leave the second part (actually changing reindex) to another PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492
348380756 https://github.com/pydata/xarray/pull/1751#issuecomment-348380756 https://api.github.com/repos/pydata/xarray/issues/1751 MDEyOklzc3VlQ29tbWVudDM0ODM4MDc1Ng== shoyer 1217238 2017-12-01T02:07:21Z 2017-12-01T02:07:32Z MEMBER

I pushed another commit (mostly but not entirely working) to port reindex() to use getitem_with_mask rather than its current implementation. This allows for some nice simplification of the reindex_variables() code: my git diff shows 1 file changed, 33 insertions(+), 64 deletions(-).

To get a sense of how this affects performance, I made a small benchmarking script with our tutorial dataset:

```python
import xarray
import numpy as np

ds_numpy = xarray.tutorial.load_dataset('air_temperature').load()
ds_chunked = ds_numpy.chunk({'time': 100})

lat = np.linspace(ds_numpy.lat.min(), ds_numpy.lat.max(), num=100)
lon = np.linspace(ds_numpy.lon.min(), ds_numpy.lon.max(), num=100)

def do_reindex(ds):
    return ds.reindex(lat=lat, lon=lon, method='nearest', tolerance=0.5)

%timeit do_reindex(ds_numpy)
%timeit do_reindex(ds_chunked)
result = do_reindex(ds_chunked)
%timeit result.compute()
```

Our tutorial dataset is pretty small, but it can still give a flavor of how this scales. I chose new chunks intentionally with a small tolerance to create lots of empty chunks to mask:

```
In [2]: ds_numpy
Out[2]:
<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 55.0 52.5 ...
  * lon      (lon) float32 200.0 202.5 205.0 207.5 210.0 212.5 215.0 217.5 ...
  * time     (time) datetime64[ns] 2013-01-01T00:02:06.757437440 ...
Data variables:
    air      (time, lat, lon) float64 241.2 242.5 243.5 244.0 244.1 243.9 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

In [3]: do_reindex(ds_numpy)
Out[3]:
<xarray.Dataset>
Dimensions:  (lat: 100, lon: 100, time: 2920)
Coordinates:
  * lat      (lat) float64 15.0 15.61 16.21 16.82 17.42 18.03 18.64 19.24 ...
  * lon      (lon) float64 200.0 201.3 202.6 203.9 205.3 206.6 207.9 209.2 ...
  * time     (time) datetime64[ns] 2013-01-01T00:02:06.757437440 ...
Data variables:
    air      (time, lat, lon) float64 296.3 nan 296.8 nan 297.1 nan 297.0 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
```

Here are the benchmarking results:

```
Before:
  NumPy:             201 ms ± 3.48 ms per loop
  Dask build graph:  303 ms ± 5.48 ms per loop
  Dask compute:      30.7 s ± 35.9 ms per loop

After:
  NumPy:             546 ms ± 26.3 ms per loop
  Dask build graph:  6.9 ms ± 464 µs per loop
  Dask compute:      411 ms ± 17.9 ms per loop
```

So NumPy is somewhat slower (about 2.5x), but reindexing with dask is 75x faster! It even shows some ability to parallelize better than pure NumPy.

This is encouraging. We should still try to close the performance gap with NumPy (the old code was cleverly optimized to make minimal copies of the data), but the existing reindex code is so slow when masking with dask that it is almost unusable.
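The mask-based indexing discussed here can be sketched in plain NumPy. This is an illustrative toy, not xarray's actual implementation; it only assumes the convention that -1 entries in the indexer mark missing positions to be filled after a single vectorized gather:

```python
import numpy as np

def getitem_with_mask_sketch(values, indexer, fill_value=np.nan):
    """Toy sketch of mask-based indexing (hypothetical helper, not
    xarray's getitem_with_mask): entries equal to -1 in ``indexer``
    are treated as missing and replaced with ``fill_value``."""
    indexer = np.asarray(indexer)
    mask = indexer == -1
    # Gather with a safe placeholder index, then overwrite the misses.
    safe = np.where(mask, 0, indexer)
    out = values[safe].astype(float)
    out[mask] = fill_value
    return out

print(getitem_with_mask_sketch(np.array([10, 20, 30]), [0, -1, 2]))
# prints [10. nan 30.]
```

The appeal for dask is that this is one vectorized indexing operation plus one fill, rather than per-chunk bookkeeping, which is why the "Dask compute" timing above improves so dramatically.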

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Indexing Variable objects with a mask 278325492


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 208.088ms · About: xarray-datasette