home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

25 rows where issue = 221858543 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 11

  • rabernat 6
  • shoyer 5
  • mrocklin 3
  • Hoeze 3
  • fjanoos 2
  • rgommers 1
  • rth 1
  • olgabot 1
  • dcherian 1
  • benbovy 1
  • lbybee 1

author_association 3

  • MEMBER 16
  • NONE 8
  • CONTRIBUTOR 1

issue 1

  • Sparse arrays · 25 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
526432439 https://github.com/pydata/xarray/issues/1375#issuecomment-526432439 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUyNjQzMjQzOQ== dcherian 2448579 2019-08-30T02:36:12Z 2019-08-30T02:36:12Z MEMBER

@fjanoos there isn't any formal documentation yet but you can look at test_sparse.py for examples. That file will also tell you what works and doesn't work currently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
526356476 https://github.com/pydata/xarray/issues/1375#issuecomment-526356476 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUyNjM1NjQ3Ng== fjanoos 923438 2019-08-29T20:52:10Z 2019-08-29T20:52:10Z NONE

@shoyer Is there documentation for using sparse arrays ? Could you point me to some example code ?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
520675205 https://github.com/pydata/xarray/issues/1375#issuecomment-520675205 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUyMDY3NTIwNQ== shoyer 1217238 2019-08-13T03:31:14Z 2019-08-13T03:31:14Z MEMBER

This is working now on the master branch!

Once we get a few more kinks worked out, it will be in the next release.

I've started another issue for discussing how xarray could integrate sparse arrays better into its API: https://github.com/pydata/xarray/issues/3213

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
513589352 https://github.com/pydata/xarray/issues/1375#issuecomment-513589352 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMzU4OTM1Mg== fjanoos 923438 2019-07-21T21:32:23Z 2019-07-21T21:32:23Z NONE

Wondering what the status on this is ? Is there a branch with this functionality implemented - would love to give it a spin !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
511209094 https://github.com/pydata/xarray/issues/1375#issuecomment-511209094 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMTIwOTA5NA== mrocklin 306380 2019-07-14T14:50:45Z 2019-07-14T14:50:45Z MEMBER

@nvictus has been working on this at #3117

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
511127437 https://github.com/pydata/xarray/issues/1375#issuecomment-511127437 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMTEyNzQzNw== rabernat 1197350 2019-07-13T14:45:17Z 2019-07-13T14:45:17Z MEMBER

I personally use the new sparse project for my day-to-day research. I am motivated on this, but I probably won't have time today to dive deep on this.

Maybe CuPy would be more exciting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
511121578 https://github.com/pydata/xarray/issues/1375#issuecomment-511121578 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMTEyMTU3OA== rgommers 98330 2019-07-13T13:18:34Z 2019-07-13T13:18:34Z NONE

I haven't talked to anyone at SciPy'19 yet who was interested in sparse arrays, but I'll keep an eye out today.

And yes, this is a fun issue to work on and would be really nice to have!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
510943157 https://github.com/pydata/xarray/issues/1375#issuecomment-510943157 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMDk0MzE1Nw== mrocklin 306380 2019-07-12T16:07:42Z 2019-07-12T16:07:42Z MEMBER

@rgommers might be able to recommend someone

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
510940851 https://github.com/pydata/xarray/issues/1375#issuecomment-510940851 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMDk0MDg1MQ== rabernat 1197350 2019-07-12T16:00:23Z 2019-07-12T16:00:23Z MEMBER

If someone who is good at numpy shows up at our sprint tomorrow, this could be a good issue try out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
504777412 https://github.com/pydata/xarray/issues/1375#issuecomment-504777412 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUwNDc3NzQxMg== shoyer 1217238 2019-06-23T18:54:33Z 2019-06-23T18:54:33Z MEMBER

It will need some experimentation, but I think things should be pretty close after NumPy 1.17 is released. Potentially it could be as easy as adjusting the rules xarray uses for casting in xarray.core.variable.as_compatible_data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
504620907 https://github.com/pydata/xarray/issues/1375#issuecomment-504620907 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUwNDYyMDkwNw== rabernat 1197350 2019-06-22T02:55:17Z 2019-06-22T02:55:17Z MEMBER

Given the recent improvements in numpy duck array typing, how close are we to being able to just wrap a pydata/sparse array in an xarray Dataset?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
403155235 https://github.com/pydata/xarray/issues/1375#issuecomment-403155235 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDQwMzE1NTIzNQ== shoyer 1217238 2018-07-06T21:49:27Z 2018-07-06T21:49:27Z MEMBER

Would it be an option to use dask's sparse support? http://dask.pydata.org/en/latest/array-sparse.html This way xarray could let dask do the main work.

In principle this would work, though I would prefer to support it directly in xarray, too.

I know that NetCDF4 has some conventions how to store sparse data, but do we have to implement our own conversion mechanisms for each sparse type?

Yes, we would need to implement a convention for handling sparse array data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
402699810 https://github.com/pydata/xarray/issues/1375#issuecomment-402699810 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDQwMjY5OTgxMA== Hoeze 1200058 2018-07-05T12:02:30Z 2018-07-05T12:02:30Z NONE

How should these sparse arrays get stored in NetCDF4? I know that NetCDF4 has some conventions how to store sparse data, but do we have to implement our own conversion mechanisms for each sparse type?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
402699290 https://github.com/pydata/xarray/issues/1375#issuecomment-402699290 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDQwMjY5OTI5MA== Hoeze 1200058 2018-07-05T12:00:15Z 2018-07-05T12:00:15Z NONE

Would it be an option to use dask's sparse support? http://dask.pydata.org/en/latest/array-sparse.html This way xarray could let dask do the main work.

Currently I load everything into a dask array by hand and pass this dask array to xarray. This works pretty good.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
395223735 https://github.com/pydata/xarray/issues/1375#issuecomment-395223735 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDM5NTIyMzczNQ== shoyer 1217238 2018-06-06T21:43:40Z 2018-06-06T21:43:40Z MEMBER

See also: https://github.com/pydata/xarray/issues/1938

The major challenge now is the dispatching mechanism, which hopefully http://www.numpy.org/neps/nep-0018-array-function-protocol.html will solve.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
395009307 https://github.com/pydata/xarray/issues/1375#issuecomment-395009307 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDM5NTAwOTMwNw== Hoeze 1200058 2018-06-06T09:39:43Z 2018-06-06T09:41:28Z NONE

I'd know a project which could make perfect use of xarray, if it would support sparse tensors: https://github.com/theislab/anndata

Currently I have to work with both xarray and anndata to store counts in sparse arrays separate from other depending data which is a little bit annoying :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
355383374 https://github.com/pydata/xarray/issues/1375#issuecomment-355383374 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDM1NTM4MzM3NA== lbybee 4998171 2018-01-04T19:59:28Z 2018-01-04T19:59:28Z NONE

I'm interested to see if there have been any developments on this. I currently have an application where I'm working with multiple dask arrays, some of which are sparse (text data). It'd be worth my time to move my project to xarray, so I'm be interested in contributing something here if there is a need.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
326824818 https://github.com/pydata/xarray/issues/1375#issuecomment-326824818 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDMyNjgyNDgxOA== rabernat 1197350 2017-09-03T19:07:54Z 2017-09-03T19:07:54Z MEMBER

Sparse Xarray DataArrays would be useful for the linear regridding operations discussed in JiaweiZhuang/xESMF#3.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
326803603 https://github.com/pydata/xarray/issues/1375#issuecomment-326803603 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDMyNjgwMzYwMw== rth 630936 2017-09-03T13:01:44Z 2017-09-03T13:01:44Z CONTRIBUTOR

do you have an application that we could use to drive this?

Other examples where labeled sparse arrays would be useful are, * one-hot encoding that are widely used in machine learning. * tokenizing textual data produces large sparse matrices where the column labels correspond to the vocabulary, while row labels correspond to document ids. Here is a minimal example using scikit-learn, ```py import os.path from sklearn.datasets import fetch_20newsgroups from sklearn.feature_extraction.text import CountVectorizer

 ds = fetch_20newsgroups()
 vect = CountVectorizer()
 X = vect.fit_transform(ds.data)
 print(X)  # Extracted tokens
 # Returns:
 # <11314x130107 sparse matrix of type '<class 'numpy.int64'>'
 #  with 1787565 stored elements in Compressed Sparse Row format>

 column_labels = vect.get_feature_names()
 print(np.asarray(column_labels))
 # Returns:
 # array(['00', '000', '0000', ..., 'íålittin', 'ñaustin', 'ýé'],   dtype='<U180')

 row_labels = [int(os.path.split(el)[1]) for el in ds.filenames]
 print(np.asarray(row_labels))
 # Returns:
 # array([102994,  51861,  51879, ...,  60695,  38319, 104440])
 ```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
311118338 https://github.com/pydata/xarray/issues/1375#issuecomment-311118338 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDMxMTExODMzOA== olgabot 806256 2017-06-26T16:55:08Z 2017-06-26T16:55:08Z NONE

In case you're still looking for an application, gene expression from single cells (see data/00_original/GSM162679$i_P14Retina_$j.digital_expression.txt.gz) is very sparse due to high gene dropout. The shape is expression.shape (49300, 24760) and it's mostly zeros or nans. A plain csv from this data was 2.5 gigs, which gzipped to 300 megs.

Here is an example of using xarray to combine these files but my kernel keeps dying when I do ds.to_netcdf() :(

Hope this is a good example for sparse arrays!

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
294386024 https://github.com/pydata/xarray/issues/1375#issuecomment-294386024 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDI5NDM4NjAyNA== rabernat 1197350 2017-04-17T01:18:15Z 2017-04-17T01:18:25Z MEMBER

@rabernat do you have an application that we could use to drive this?

Nothing comes to mind immediately. My data are unfortunately quite dense! 😜

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
294381137 https://github.com/pydata/xarray/issues/1375#issuecomment-294381137 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDI5NDM4MTEzNw== mrocklin 306380 2017-04-16T23:50:26Z 2017-04-16T23:50:26Z MEMBER

Here is a brief attempt at a multi-dimensional sparse array: https://github.com/mrocklin/sparse

It depends on numpy and scipy.sparse and, with the exception of a bit of in-memory data movement and copies, should run at scipy speeds (though I haven't done any benchmarking).

@rabernat do you have an application that we could use to drive this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
294283051 https://github.com/pydata/xarray/issues/1375#issuecomment-294283051 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDI5NDI4MzA1MQ== benbovy 4160723 2017-04-15T09:42:20Z 2017-04-15T09:42:20Z MEMBER

Although I don't know much about SciDB, it seems to be another possible application for xarray.register_data_type.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
294270200 https://github.com/pydata/xarray/issues/1375#issuecomment-294270200 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDI5NDI3MDIwMA== rabernat 1197350 2017-04-15T03:56:27Z 2017-04-15T03:56:52Z MEMBER

👍 to the scipy.sparse array suggestion

[While we are discussing supporting other array types, we should keep gpu arrays on the radar]

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
294250748 https://github.com/pydata/xarray/issues/1375#issuecomment-294250748 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDI5NDI1MDc0OA== shoyer 1217238 2017-04-14T22:46:10Z 2017-04-14T22:47:01Z MEMBER

Yes, I would say this is in scope, as long as we can keep most of the data-type specific logic out of xarray's core (which seems doable).

Currently, we define most of our operations on duck arrays in https://github.com/pydata/xarray/blob/master/xarray/core/duck_array_ops.py

There are a few other hacks throughout the codebase, which can find by searching for "dask_array_type": https://github.com/pydata/xarray/search?p=1&q=dask_array_type&type=&utf8=%E2%9C%93

It's pretty crude, but basically this would need to be extended to implement many of these methods on for sparse arrays, too. Ideally we would define xarray's adapter logic into more cleanly separated submodules, perhaps using multiple dispatch. Even better, we would make this public API, so you can write something like xarray.register_data_type(MySparseArray) to register a type as valid for xarray's .data attribute.

It looks like __array_ufunc__ will actually finally land in NumPy 1.13, which might make this easier.

See also https://github.com/pydata/xarray/pull/1118

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 17.258ms · About: xarray-datasette