issue_comments

2 rows where issue = 834972299 and user = 703554 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
802101178 https://github.com/pydata/xarray/issues/5054#issuecomment-802101178 https://api.github.com/repos/pydata/xarray/issues/5054 MDEyOklzc3VlQ29tbWVudDgwMjEwMTE3OA== alimanfoo 703554 2021-03-18T16:45:51Z 2021-03-18T16:58:44Z CONTRIBUTOR

FWIW my use case actually only needs indexing a single dimension, i.e., something equivalent to the numpy (or dask.array) compress function. This can be hacked for xarray datasets in a fairly straightforward way:

```python
import numpy as np
import xarray as xr


def _compress_dataarray(a, indexer, dim):
    data = a.data
    try:
        axis = a.dims.index(dim)
    except ValueError:
        # this array does not have the dimension, leave it unchanged
        v = data
    else:
        # rely on __array_function__ to handle dispatching to dask if
        # data is a dask array
        v = np.compress(indexer, a.data, axis=axis)
        if hasattr(v, 'compute_chunk_sizes'):
            # needed to know dim lengths
            v.compute_chunk_sizes()
    return v


def compress_dataset(ds, indexer, dim):
    # indexer may be given as the name of a variable in the dataset
    if isinstance(indexer, str):
        indexer = ds[indexer].data

    coords = dict()
    for k in ds.coords:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        coords[k] = (a.dims, v)

    data_vars = dict()
    for k in ds.data_vars:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        data_vars[k] = (a.dims, v)

    attrs = ds.attrs.copy()

    return xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)
```
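
For reference, here is a minimal sketch of the single-dimension semantics that the helpers above apply to each variable, shown on a plain numpy array. The array contents and the names `a`, `mask` and `compressed` are made up for illustration and are not taken from the issue.

```python
import numpy as np

# Made-up example data: a 3 x 4 array and a boolean indexer for axis 0
a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])

# Keep only the positions along axis 0 where mask is True
compressed = np.compress(mask, a, axis=0)

# For a boolean indexer this matches boolean indexing along that axis
assert np.array_equal(compressed, a[mask])
print(compressed.shape)  # (2, 4)
```

With a dask array in place of `a`, the resulting chunk sizes along the compressed axis are not known until computed, which is why the helper above calls `compute_chunk_sizes()`.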

Given the complexity of fancy indexing in general, I wonder if it's worth contemplating implementing a Dataset.compress() method as a first step.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299
802096873 https://github.com/pydata/xarray/issues/5054#issuecomment-802096873 https://api.github.com/repos/pydata/xarray/issues/5054 MDEyOklzc3VlQ29tbWVudDgwMjA5Njg3Mw== alimanfoo 703554 2021-03-18T16:39:59Z 2021-03-18T16:39:59Z CONTRIBUTOR

Thanks @dcherian.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);