issue_comments

1 row where issue = 517338735 and user = 10554254 sorted by updated_at descending

id: 549627590
html_url: https://github.com/pydata/xarray/issues/3484#issuecomment-549627590
issue_url: https://api.github.com/repos/pydata/xarray/issues/3484
node_id: MDEyOklzc3VlQ29tbWVudDU0OTYyNzU5MA==
user: friedrichknuth (10554254)
created_at: 2019-11-05T01:50:29Z
updated_at: 2020-02-12T02:51:51Z
author_association: NONE
body:

After reading through the issue tracker and PRs, it looks like sparse arrays can safely be wrapped with xarray, thanks to the work done in PR #3117, but built-in functions are still under development (e.g. PR #3542). As a user, here is what I am seeing when test-driving sparse:

Sparse gives me a smaller in-memory array:

```python
In [1]: import xarray as xr, sparse, sys, numpy as np, dask.array as da

In [2]: x = np.random.random((100, 100, 100))

In [3]: x[x < 0.9] = np.nan

In [4]: s = sparse.COO.from_numpy(x, fill_value=np.nan)

In [5]: sys.getsizeof(s)
Out[5]: 3189592

In [6]: sys.getsizeof(x)
Out[6]: 8000128
```

Which I can wrap with dask and xarray:

```python
In [7]: x = da.from_array(x)

In [8]: s = da.from_array(s)

In [9]: ds_dense = xr.DataArray(x).to_dataset(name='data_variable')

In [10]: ds_sparse = xr.DataArray(s).to_dataset(name='data_variable')

In [11]: ds_dense
Out[11]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=np.ndarray>

In [12]: ds_sparse
Out[12]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=sparse.COO>
```

However, computation on a sparse array takes longer than running compute on a dense array (which I think is expected...?):

```python
In [13]: %%time
    ...: ds_sparse.mean().compute()
CPU times: user 487 ms, sys: 22.9 ms, total: 510 ms
Wall time: 518 ms
Out[13]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501

In [14]: %%time
    ...: ds_dense.mean().compute()
CPU times: user 10.9 ms, sys: 3.91 ms, total: 14.8 ms
Wall time: 13.8 ms
Out[14]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501
```
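
As a rough sanity check on the memory figures reported in `Out[5]` and `Out[6]`, the sizes can be estimated by hand. This is only a back-of-the-envelope sketch: it assumes that the COO layout stores, for each kept element, one 8-byte value plus one 8-byte integer index per dimension, and that about 10% of the entries survive the `x < 0.9` mask. The helper names (`dense_nbytes`, `coo_nbytes`, `break_even_density`) are made up for illustration and are not part of sparse or xarray.

```python
from math import prod


def dense_nbytes(shape, itemsize=8):
    """Bytes needed to store every element of a dense array."""
    return prod(shape) * itemsize


def coo_nbytes(nnz, ndim, itemsize=8, index_itemsize=8):
    """Bytes for nnz stored elements: one value plus ndim indices each.

    Assumed layout, not taken from the sparse source code.
    """
    return nnz * (itemsize + ndim * index_itemsize)


def break_even_density(ndim, itemsize=8, index_itemsize=8):
    """Fraction of stored elements below which the COO estimate beats dense."""
    return itemsize / (itemsize + ndim * index_itemsize)


shape = (100, 100, 100)
nnz = 100_000  # assumption: roughly 10% of the 1,000,000 entries are kept

dense = dense_nbytes(shape)
coo = coo_nbytes(nnz, ndim=len(shape))
print(dense, coo, break_even_density(len(shape)))
```

Under these assumptions the estimates are 8,000,000 and 3,200,000 bytes, close to the 8000128 and 3189592 measured in the session, and the break-even point for a 3-D float64 array is a density of 25%. Above that density the coordinate overhead would make the COO representation larger than the dense one.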

And writing to netCDF, to take advantage of the smaller data size, doesn't work out of the box (yet):

```python
In [15]: ds_sparse.to_netcdf('ds_sparse.nc')
Out[15]:
...
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```

Additional discussion is happening at #3213.

@dcherian @shoyer Am I missing any built-in methods that are working and ready for public release? Happy to send in a PR if any of what is provided here should go into a basic example for the docs.

I am not using sparse arrays for my own research just yet, but when I get to that phase I can dig into this more and hopefully send in some useful PRs for improved documentation and fixes/features.

  Need documentation on sparse / cupy integration 517338735

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);