issue_comments

1 row where issue = 406812274 sorted by updated_at descending

id: 460694818
html_url: https://github.com/pydata/xarray/issues/2745#issuecomment-460694818
issue_url: https://api.github.com/repos/pydata/xarray/issues/2745
node_id: MDEyOklzc3VlQ29tbWVudDQ2MDY5NDgxOA==
user: shoyer (1217238)
created_at: 2019-02-05T16:03:40Z
updated_at: 2019-02-05T16:03:40Z
author_association: MEMBER
issue: reindex doesn't preserve chunks (406812274)
body:

To understand what's going on here, it may be helpful to look at what's going on inside dask:

```
In [16]: x = np.arange(5)

In [17]: da = xr.DataArray(np.ones(5), coords=[('x', x)]).chunk(-1)

In [18]: da
Out[18]:
<xarray.DataArray (x: 5)>
dask.array<shape=(5,), dtype=float64, chunksize=(5,)>
Coordinates:
  * x        (x) int64 0 1 2 3 4

In [19]: da.reindex({'x': np.arange(20)})
Out[19]:
<xarray.DataArray (x: 20)>
dask.array<shape=(20,), dtype=float64, chunksize=(20,)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

In [20]: da.reindex({'x': np.arange(20)}).data.dask
Out[20]: <dask.sharedict.ShareDict at 0x3201d72e8>

In [21]: dict(da.reindex({'x': np.arange(20)}).data.dask)
Out[21]:
{('where-8e0018fae0773d202c09fde132189347', 0): (subgraph_callable,
  ('eq-8167293bb8136be2934a8bf111095d8f', 0),
  array(nan),
  ('getitem-0eab360ba0dee5a5c3fbded0fdfd70e3', 0)),
 ('eq-8167293bb8136be2934a8bf111095d8f', 0): (subgraph_callable,
  ('array-5ddc8bae2e6cf87c0bac846c6da4d27f', 0),
  -1),
 ('array-5ddc8bae2e6cf87c0bac846c6da4d27f', 0): array([ 0,  1,  2,  3,  4, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]),
 ('xarray-<this-array>-765734894ab8f05a57335ea1064da549', 0): (<function dask.array.core.getter(a, b, asarray=True, lock=None)>,
  'xarray-<this-array>-765734894ab8f05a57335ea1064da549',
  (slice(0, 5, None),)),
 'xarray-<this-array>-765734894ab8f05a57335ea1064da549': ImplicitToExplicitIndexingAdapter(array=NumpyIndexingAdapter(array=array([1., 1., 1., 1., 1.]))),
 ('getitem-0eab360ba0dee5a5c3fbded0fdfd70e3', 0): (<function _operator.getitem(a, b, /)>,
  ('xarray-<this-array>-765734894ab8f05a57335ea1064da549', 0),
  (array([0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]),))}
```

Xarray isn't controlling chunk sizes directly here: under the covers it turns reindex({'x': x2}) into an indexing operation like data[np.array([0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])], with the last element repeated -- in your case 100000 times. See xarray.Variable._getitem_with_mask for the implementation on the xarray side.
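As a rough illustration, here is the same trick in plain numpy (a hand-written sketch, not xarray's actual code; `data` and `indexer` are hypothetical stand-ins for the chunked array and the computed reindexing indices):

```python
import numpy as np

# Hypothetical stand-ins: `data` is the original array, `indexer` maps each
# position in the new index to a position in the old one, with -1 marking
# labels that have no match.
data = np.ones(5)
indexer = np.array([0, 1, 2, 3, 4] + [-1] * 15)

# Replace each -1 with a repeat of the last valid position, so the gather is
# a single contiguous indexing operation -- this is why the graph above
# indexes with [0, 1, 2, 3, 4, 4, 4, ...].
gathered = data[np.where(indexer == -1, len(data) - 1, indexer)]

# Then mask the placeholder positions with NaN, matching the `eq`/`where`
# tasks visible in the dask graph.
result = np.where(indexer == -1, np.nan, gathered)
# result: [1. 1. 1. 1. 1. nan nan ... nan]
```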

The alternative design would be to append an array of all NaNs along one axis, but on average I think the current implementation is faster and results in more contiguous chunks -- it's quite common for reindex() to intersperse missing values among existing ones, and alternating indexed/missing values can result in tiny chunks. Even with that design I think you would probably run into performance issues -- I don't think dask.array.full() uses a default chunk size.
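For contrast, a minimal dask sketch of that append-NaN alternative (illustrative only, not what xarray does; chunks are passed explicitly since, per the comment above, dask.array.full may not pick them for you):

```python
import numpy as np
import dask.array as da

data = da.ones(5, chunks=5)
n_missing = 15

# Chunks are given explicitly; a careless choice here could produce one huge
# padding chunk or many tiny ones.
padding = da.full(n_missing, np.nan, chunks=n_missing)

result = da.concatenate([data, padding])
print(result.chunks)  # ((5, 15),): the original data chunk plus one NaN chunk
```

This only stays simple when the missing labels form one contiguous block at the end; interspersed missing values would mean many small concatenations, which is exactly the tiny-chunks problem described above.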

We could also conceivably add heuristics to control chunking for this in xarray, but I'd rather do it upstream in dask.array if possible (xarray tries to avoid thinking about chunks).
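As a practical note (not from the original discussion), one user-side workaround, assuming the post-reindex chunking is the problem, is to rechunk explicitly after reindexing:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones(5), coords=[('x', np.arange(5))]).chunk(-1)

# Pick a chunk size suited to the workload (10 here is arbitrary) instead of
# relying on whatever chunking reindex happens to produce.
reindexed = da.reindex({'x': np.arange(20)}).chunk({'x': 10})
print(reindexed.chunks)  # ((10, 10),)
```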

reactions:

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
