issue_comments
23 rows where author_association = "MEMBER" and issue = 479942077 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1534724554 | https://github.com/pydata/xarray/issues/3213#issuecomment-1534724554 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85begnK | rabernat 1197350 | 2023-05-04T12:51:59Z | 2023-05-04T12:51:59Z | MEMBER |
Existing sparse testing is here: https://github.com/pydata/xarray/blob/main/xarray/tests/test_sparse.py We would welcome enhancements to this! |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1534238962 | https://github.com/pydata/xarray/issues/3213#issuecomment-1534238962 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85bcqDy | hameerabbasi 2190658 | 2023-05-04T07:47:04Z | 2023-05-04T07:47:04Z | MEMBER | Speaking a bit to things like While that doesn't apply in the case of |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1534001190 | https://github.com/pydata/xarray/issues/3213#issuecomment-1534001190 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85bbwAm | rabernat 1197350 | 2023-05-04T02:36:57Z | 2023-05-04T02:36:57Z | MEMBER | Hi @jdbutler and welcome! We would welcome this sort of contribution eagerly. I would characterize our current support of sparse arrays as really just a proof of concept. When to use sparse and how to do it effectively is not well documented. Simply adding more documentation around the already-supported use cases would be a great place to start IMO. My own explorations of this are described in this Pangeo post. The use case is regridding. It touches on quite a few of the points you're interested in, in particular the integration with geodataframe. Along similar lines, @dcherian has been working on using opt_einsum together with sparse in https://github.com/pangeo-data/xESMF/issues/222#issuecomment-1524041837 and https://github.com/pydata/xarray/issues/7764. I'd also suggest catching up on what @martinfleis is doing with vector data cubes in xvec. (See also Pangeo post on this topic.) Of the three topics you enumerated, I'm most interested in the serialization one. However, I'd rather see serialization of sparse arrays prototyped in Zarr, as it's much more conducive to experimentation than NetCDF (which requires writing C to do anything custom). I would recommend exploring serialization from a sparse array in memory to a sparse format on disk via a custom codec. Zarr recently added support for a |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
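One way the custom-codec suggestion above could be prototyped is with numcodecs, the codec framework Zarr uses for filters and compressors. The following is a minimal sketch, not an existing xarray or Zarr feature: the class name, codec_id, and packed COO-style layout are assumptions, and values are limited to float64 for brevity.

```python
# Illustrative sketch only: a numcodecs codec that stores each dense chunk as
# packed COO-style (indices, values) buffers. The codec_id and layout are
# made up; values are assumed to be float64.
import numpy as np
import numcodecs
from numcodecs.abc import Codec


class COOSketchCodec(Codec):
    codec_id = "coo-sketch"  # hypothetical identifier

    def encode(self, buf):
        arr = np.ascontiguousarray(buf, dtype="float64")
        flat = arr.ravel()
        idx = np.flatnonzero(flat)
        header = np.array([arr.ndim, idx.size, *arr.shape], dtype="int64")
        # layout: [ndim, nnz, *shape] | flat indices of non-zeros | values
        return header.tobytes() + idx.astype("int64").tobytes() + flat[idx].tobytes()

    def decode(self, buf, out=None):
        ndim, nnz = (int(x) for x in np.frombuffer(buf, dtype="int64", count=2))
        shape = np.frombuffer(buf, dtype="int64", count=ndim, offset=16)
        off = (2 + ndim) * 8
        idx = np.frombuffer(buf, dtype="int64", count=nnz, offset=off)
        values = np.frombuffer(buf, dtype="float64", count=nnz, offset=off + nnz * 8)
        dense = np.zeros(int(np.prod(shape)), dtype="float64")
        dense[idx] = values
        dense = dense.reshape(tuple(int(s) for s in shape))
        if out is not None:
            out[...] = dense
            return out
        return dense


numcodecs.register_codec(COOSketchCodec)
```

In a real prototype such a codec would sit at the end of a Zarr filter chain; how to handle fill values and arbitrary dtypes generically is left open here.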
1014383681 | https://github.com/pydata/xarray/issues/3213#issuecomment-1014383681 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X848dkRB | hameerabbasi 2190658 | 2022-01-17T10:48:48Z | 2022-01-17T10:48:48Z | MEMBER | For As for the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
943517731 | https://github.com/pydata/xarray/issues/3213#issuecomment-943517731 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X844PPAj | keewis 14808389 | 2021-10-14T16:25:04Z | 2021-10-14T16:25:04Z | MEMBER | that's mostly an oversight, I think. However, to be really useful we'd need to get a Anyways, the docs you're looking for are working with numpy-like arrays, even though there's no explicit mention of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
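The duck-array wrapping described in those docs looks roughly like the sketch below, assuming the sparse package is installed; the array contents and dimension names are made up.

```python
# Wrapping a pydata/sparse array directly in a DataArray (duck-array support);
# the data and dimension names are made-up examples.
import numpy as np
import sparse
import xarray as xr

dense = np.eye(4)                    # mostly-zero example data
coo = sparse.COO.from_numpy(dense)   # convert to a sparse COO array

da = xr.DataArray(coo, dims=("x", "y"), name="diag")
print(type(da.data))   # a sparse.COO instance, not a dense ndarray
print(da.sum().data)   # reductions dispatch to the sparse backend
```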
615772303 | https://github.com/pydata/xarray/issues/3213#issuecomment-615772303 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYxNTc3MjMwMw== | hameerabbasi 2190658 | 2020-04-18T08:41:39Z | 2020-04-18T08:41:39Z | MEMBER | Hi. Yes, it’d be nice if we had a meta issue; I could then open separate issues for the sklearn implementations. Performance is not ideal, and I realise that. However, I’m working on a more generic solution to performance as I type. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
615501070 | https://github.com/pydata/xarray/issues/3213#issuecomment-615501070 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYxNTUwMTA3MA== | mrocklin 306380 | 2020-04-17T23:08:18Z | 2020-04-17T23:08:18Z | MEMBER | @amueller have you all connected with @hameerabbasi? I'm not surprised to hear that there are performance issues with pydata/sparse relative to scipy.sparse, but Hameer has historically been pretty open to working to resolve issues quickly. I'm not sure if there is already an ongoing conversation between the two groups, but I'd recommend replacing "we've chosen not to use pydata/sparse because it isn't feature complete enough for us" with "in order for us to use pydata/sparse we would need the following features". |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
615499609 | https://github.com/pydata/xarray/issues/3213#issuecomment-615499609 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYxNTQ5OTYwOQ== | shoyer 1217238 | 2020-04-17T23:01:15Z | 2020-04-17T23:01:15Z | MEMBER | Wrapping
(2) is the biggest challenge. I don't want to maintain that compatibility layer inside xarray, but if it existed we would be happy to try using it. pydata/sparse solves both of these problems, though again, it indeed only has quite limited data structures. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
592476821 | https://github.com/pydata/xarray/issues/3213#issuecomment-592476821 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU5MjQ3NjgyMQ== | crusaderky 6213168 | 2020-02-28T11:39:50Z | 2020-02-28T11:39:50Z | MEMBER | xr.apply_ufunc(sparse.COO, ds, dask='parallelized') |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
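A slightly expanded sketch of that one-liner, with a made-up dask-backed dataset; each dense dask chunk is converted to a sparse.COO block (output_dtypes is passed explicitly, which older xarray versions require for dask='parallelized').

```python
# Sketch expanding the apply_ufunc one-liner above; dataset contents,
# dimension names and chunk sizes are illustrative.
import numpy as np
import sparse
import xarray as xr

ds = xr.Dataset(
    {"precip": (("time", "site"), np.zeros((1_000, 500)))}
).chunk({"time": 100})

ds_sparse = xr.apply_ufunc(
    sparse.COO,            # applied block-wise to each dask chunk
    ds,
    dask="parallelized",
    output_dtypes=[ds["precip"].dtype],
)
print(ds_sparse["precip"].data)   # a dask array whose chunks are sparse.COO
```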
587564478 | https://github.com/pydata/xarray/issues/3213#issuecomment-587564478 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU4NzU2NDQ3OA== | crusaderky 6213168 | 2020-02-18T16:58:25Z | 2020-02-18T16:58:25Z | MEMBER | you just need to
Regards
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
585997533 | https://github.com/pydata/xarray/issues/3213#issuecomment-585997533 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU4NTk5NzUzMw== | crusaderky 6213168 | 2020-02-13T22:12:37Z | 2020-02-13T22:12:37Z | MEMBER | Hi fmfreeze, > Dask integration enables xarray to scale to big data, only as long as the data has no sparse character. Do you agree on that formulation or am I missing something fundamental? I don't agree. To my understanding xarray->dask->sparse works very well (save bugs), as long as your data density (the percentage of non-default points) is roughly constant across dask chunks. If it isn't, then you'll have some chunks that consume substantially more RAM and CPU to compute than others. This can be mitigated, if you know in advance where you are going to have more samples, by setting uneven dask chunk sizes. For example, if you have a one-dimensional array of 100k points and you know in advance that the density of non-default samples follows a gaussian or triangular distribution, then it may be wise to have very large chunks at the tails and then get them progressively smaller towards the center, e.g. (30k, 12k, 5k, 2k, 1k, 1k, 2k, 5k, 10k, 30k). Of course, there are use cases where you're going to have unpredictable hotspots; I'm afraid that in those the only thing you can do is size your chunks for the worst case and end up oversplitting everywhere else. Regards Guido
|
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
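The uneven-chunking advice above, written out as code; the chunk sizes are the illustrative ones from the comment (they total 98k, so the example array is sized to match).

```python
# Sketch of the uneven-chunking advice above: large chunks in the sparse tails,
# progressively smaller chunks where non-default values are densest.
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros(98_000), dims="x")
da = da.chunk({"x": (30_000, 12_000, 5_000, 2_000, 1_000, 1_000, 2_000, 5_000, 10_000, 30_000)})
print(da.chunks)   # ((30000, 12000, 5000, 2000, 1000, 1000, 2000, 5000, 10000, 30000),)
```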
551134122 | https://github.com/pydata/xarray/issues/3213#issuecomment-551134122 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU1MTEzNDEyMg== | dcherian 2448579 | 2019-11-07T15:40:12Z | 2019-11-07T15:40:12Z | MEMBER | the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
551125982 | https://github.com/pydata/xarray/issues/3213#issuecomment-551125982 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU1MTEyNTk4Mg== | dcherian 2448579 | 2019-11-07T15:23:56Z | 2019-11-07T15:23:56Z | MEMBER | @El-minadero a lot of that overhead may be fixed on master and more recent xarray versions. https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets has some tips on quickly concatenating / merging datasets. It depends on the datasets you are joining... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
527766483 | https://github.com/pydata/xarray/issues/3213#issuecomment-527766483 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNzc2NjQ4Mw== | crusaderky 6213168 | 2019-09-04T06:46:08Z | 2019-09-04T06:46:08Z | MEMBER | @p-d-moore what you say makes sense but it is well outside of the domain of xarray. What you're describing is basically a new sparse class, substantially more sophisticated than COO, and should be proposed in the sparse board, not here. After it's implemented in sparse, xarray will be able to wrap around it. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526748987 | https://github.com/pydata/xarray/issues/3213#issuecomment-526748987 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjc0ODk4Nw== | shoyer 1217238 | 2019-08-30T21:01:55Z | 2019-08-30T21:01:55Z | MEMBER | You will need to install NumPy 1.17 or set the env variable before importing NumPy.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526736529 | https://github.com/pydata/xarray/issues/3213#issuecomment-526736529 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjczNjUyOQ== | dcherian 2448579 | 2019-08-30T20:21:28Z | 2019-08-30T20:21:28Z | MEMBER |
Basically you need to install https://sparse.pydata.org/en/latest/ using either pip or conda. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526718101 | https://github.com/pydata/xarray/issues/3213#issuecomment-526718101 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjcxODEwMQ== | shoyer 1217238 | 2019-08-30T19:19:13Z | 2019-08-30T19:19:13Z | MEMBER | We have a new "sparse=True" option in xarray.Dataset.from_dataframe for exactly this use case. Pandas's to_xarray() method just calls this method, so it would make sense to forward keyword arguments, too.
|
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
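A minimal usage example of the sparse=True option mentioned above, with a made-up frame; the missing combinations of the MultiIndex levels become fill values in a sparse.COO instead of a dense NaN-padded array.

```python
# Example of Dataset.from_dataframe(..., sparse=True); the frame is made up.
import pandas as pd
import xarray as xr

df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, "a"), (5, "b"), (9, "c")], names=["time", "site"]
    ),
)

ds = xr.Dataset.from_dataframe(df, sparse=True)
print(ds["value"].data)   # a sparse.COO with 3 stored values out of the 3x3 product
```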
521691465 | https://github.com/pydata/xarray/issues/3213#issuecomment-521691465 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTY5MTQ2NQ== | shoyer 1217238 | 2019-08-15T15:50:42Z | 2019-08-15T15:50:42Z | MEMBER | Yes, it would be useful (eventually) to have lazy loading of sparse arrays from disk, like we currently do for dense arrays. This would indeed require knowing that the indices are sorted. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521533999 | https://github.com/pydata/xarray/issues/3213#issuecomment-521533999 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTUzMzk5OQ== | shoyer 1217238 | 2019-08-15T06:42:44Z | 2019-08-15T06:42:44Z | MEMBER | I like the indexed ragged array representation because it maps directly into sparse’s COO format. I’m sure other formats would be possible, but they would also likely be harder to implement. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521301555 | https://github.com/pydata/xarray/issues/3213#issuecomment-521301555 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTMwMTU1NQ== | shoyer 1217238 | 2019-08-14T15:42:58Z | 2019-08-14T15:42:58Z | MEMBER | netCDF has a pretty low-level base spec, with conventions left to higher level docs like CF conventions. Fortunately, there does seem to be a CF convention that would be a good fit for sparse data in COO format, namely the indexed ragged array representation (example, note the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
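To make the COO mapping concrete: in the indexed ragged array layout, each sample records which feature (e.g. profile) it belongs to, and together with a within-feature position that is exactly a COO coordinate pair. The variable names below are hypothetical and not taken from the CF spec.

```python
# Hypothetical illustration of why indexed ragged arrays map onto COO:
# per-sample feature indices + per-sample positions are COO coordinates.
import numpy as np
import sparse

profile_of_sample = np.array([0, 0, 1, 2, 2, 2])   # CF-style "instance" index
level_of_sample = np.array([0, 1, 0, 0, 1, 2])     # position within each profile
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

coo = sparse.COO(
    coords=np.stack([profile_of_sample, level_of_sample]),
    data=values,
    shape=(3, 3),                                  # (n_profiles, max_levels)
)
print(coo.todense())
```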
521224538 | https://github.com/pydata/xarray/issues/3213#issuecomment-521224538 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTIyNDUzOA== | crusaderky 6213168 | 2019-08-14T12:25:39Z | 2019-08-14T12:25:39Z | MEMBER | As for NetCDF, instead of a bespoke xarray-only convention, wouldn't it be much better to push a spec extension upstream? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521223609 | https://github.com/pydata/xarray/issues/3213#issuecomment-521223609 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTIyMzYwOQ== | crusaderky 6213168 | 2019-08-14T12:22:37Z | 2019-08-14T12:22:37Z | MEMBER | As already mentioned in #3206, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521221473 | https://github.com/pydata/xarray/issues/3213#issuecomment-521221473 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTIyMTQ3Mw== | crusaderky 6213168 | 2019-08-14T12:15:39Z | 2019-08-14T12:20:59Z | MEMBER | +1 for the introduction of to_sparse() / to_dense(), but let’s please avoid the mistakes that were made with chunk(). DataArray.chunk() is extremely frustrating when you have non-index coords and, 9 times out of 10, you only want to chunk the data and you have to go through the horrid
Possibly we could define them as

```python
class DataArray:
    def to_sparse(
        self,
        data: bool = True,
        coords: Union[Iterable[Hashable], bool] = False
    )

class Dataset:
    def to_sparse(
        self,
        data_vars: Union[Iterable[Hashable], bool] = True,
        coords: Union[Iterable[Hashable], bool] = False
    )
```

same for to_dense() and chunk() (the latter would require a DeprecationWarning for a few releases before switching the default for coords from True to False - only to be triggered in presence of dask-backed coords). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 |