issue_comments
30 rows where author_association = "NONE" and issue = 479942077 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1544952425 | https://github.com/pydata/xarray/issues/3213#issuecomment-1544952425 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85cFhpp | jbbutler 41593244 | 2023-05-12T01:01:21Z | 2023-05-12T01:01:21Z | NONE | Thank you all so much for the feedback and resources! I agree (1) testing the limits of xArray's API compatibility with sparse and (2) developing some documentation for what is/isn't supported are great places to start, so I'll get on that while I think about the other I/O issues (serialization, etc.) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1534695467 | https://github.com/pydata/xarray/issues/3213#issuecomment-1534695467 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85beZgr | khaeru 1634164 | 2023-05-04T12:31:22Z | 2023-05-04T12:31:22Z | NONE | That's a totally valid scope limitation for the sparse package, and I understand the motivation. I'm just saying that the principle of least astonishment is not being followed: the user cannot at the moment read either the xarray or sparse docs and know which portions of the xarray API will work when given sparse-backed data. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1534231523 | https://github.com/pydata/xarray/issues/3213#issuecomment-1534231523 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85bcoPj | khaeru 1634164 | 2023-05-04T07:40:26Z | 2023-05-04T07:40:26Z | NONE | @jbbutler please also see this comment et seq. https://github.com/pydata/sparse/issues/1#issuecomment-792342987 and related pydata/sparse#438. To add to @rabernat's point about sparse support being "not well documented", I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly tested. I expected to be able to use e.g. […]. IMHO, I/O to/from sparse-backed objects is less valuable if only a small subset of xarray functionality is available on those objects. Perhaps explicitly testing/confirming which parts of the API do/do not currently work with sparse would support the improvements to the docs that Ryan mentioned, and reveal the work remaining to provide full(er) support. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1533842816 | https://github.com/pydata/xarray/issues/3213#issuecomment-1533842816 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X85bbJWA | jbbutler 41593244 | 2023-05-03T22:40:32Z | 2023-05-03T22:40:32Z | NONE | Hi all! As part of a research project, I'm looking to contribute to xArray's sparse capabilities, with an emphasis on sparse support for use-cases in the geosciences. I'm wondering if anyone in the geosciences (or adjacent disciplines!) has encountered problems with xArray's current level of sparse support, and what kinds of improvements they'd like to see to address those issues. From playing around, it seems the current strategy of backing DataArrays with COO sparse arrays takes care of a lot of use cases, but I have the following ideas that may (or may not) be useful to implement further:
I'd appreciate any feedback on these ideas, as well as any other things that would be nice to have implemented! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1014462537 | https://github.com/pydata/xarray/issues/3213#issuecomment-1014462537 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X848d3hJ | Material-Scientist 40465719 | 2022-01-17T12:20:18Z | 2022-01-17T12:20:18Z | NONE | I know. But having sparse data I can treat as if it were dense allows me to unstack without running out of memory, and then ffill & downsample the data in chunks (screenshot omitted). It would be nice if xarray automatically converted the data from sparse back to dense for doing operations on the chunks, just like pandas does. The picture shows that I'm already using nbytes to determine the size. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
1013887301 | https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X848brFF | Material-Scientist 40465719 | 2022-01-16T14:35:29Z | 2022-01-16T14:40:13Z | NONE | I would prefer to retain the dense representation, but with tricks to keep the data of sparse type in memory. Look at the following example with pandas multiindex & sparse dtype:
The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory! And while you can import dataframes with the sparse=True keyword, the size seems to be displayed inaccurately (both are the same size?), and we cannot examine the data like we can with pandas multiindex + sparse dtype:
Besides, a lot of operations are not available on sparse xarray data variables (e.g. if I wanted to group by price level for ffill & downsampling): |
So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data. In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays. cc: @stavrospapadopoulos |
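The pandas screenshots referenced in this comment were not preserved in the export; below is a minimal sketch of the kind of comparison being described, with made-up (time, price) dimensions and data, so the exact numbers differ from the original:

```python
import numpy as np
import pandas as pd

# A long, mostly-empty (time, price) MultiIndex series, as in an order-book use case.
idx = pd.MultiIndex.from_product(
    [pd.date_range("2022-01-01", periods=1000, freq="min"), np.arange(5000)],
    names=["time", "price"],
)
s = pd.Series(np.nan, index=idx)
s.iloc[::10_000] = 1.0  # only 500 non-NaN values out of 5 million

dense = s.unstack("price")                                 # plain float64 frame
sparse_df = dense.astype(pd.SparseDtype("float", np.nan))  # same shape, sparse dtype

print(dense.memory_usage(deep=True).sum())      # ~40 MB of float64
print(sparse_df.memory_usage(deep=True).sum())  # roughly just the 500 stored values
```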
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
634558423 | https://github.com/pydata/xarray/issues/3213#issuecomment-634558423 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYzNDU1ODQyMw== | SimonHeybrock 12912489 | 2020-05-27T10:00:25Z | 2021-10-15T04:38:25Z | NONE | @pnsaevik If the approach we adopt in scipp could be ported to xarray, you would be able to do something like (assuming that the ragged array representation you have in mind is "list of lists"):
```python
data = my_load_netcdf(...)  # list of lists
# assume 'x' is the dimension of the nested lists
bin_edges = sc.Variable(dims=['x'], values=[0.1, 0.3, 0.5, 0.7, 0.9])
realigned = sc.realign(data, {'x': bin_edges})
filtered = realigned['x', 1:3].copy()
my_store_netcdf(filtered.unaligned, ...)
```
Basically, we have slicing for the "realigned" wrapper. It performs a filter operation when copied. Edit 2021: the above example is very outdated; we have cleaned up the mechanism, see https://scipp.github.io/user-guide/binned-data/binned-data.html. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
632536798 | https://github.com/pydata/xarray/issues/3213#issuecomment-632536798 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYzMjUzNjc5OA== | SimonHeybrock 12912489 | 2020-05-22T07:20:35Z | 2021-10-15T04:36:17Z | NONE | I am not familiar with the details of the various applications people in this discussion have, but here is an approach we are taking, trying to solve variations of the problem "data scattered in multi-dimensional space" or irregular time-series data. See https://scipp.github.io/user-guide/binned-data/binned-data.html for an illustrated description. The basic idea is to keep data in a linear representation and wrap it in a "realigned" wrapper. One reason for this development was to provide a pathway to use dask with our type of data (independent time series at a large number of points in space, with chunking along the "time-series", which is not a dimension since every time series has a different length). With the linked approach we could use dask to distribute the linear underlying representation, keeping the lightweight realigned wrapper on all workers. We are still in early experimentation with this (the dask part is not actually in development yet). It probably has performance issues if more than "millions" of points are realigned --- our case is millions of time series with thousands/millions of time points in each, but the two do not mix (not both are realigned, and if they are it is independently), so we do not run into the performance issue in most cases. In principle I could imagine this non-destructive realignment approach could be mapped to xarray, so it may be of interest to people here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
943518935 | https://github.com/pydata/xarray/issues/3213#issuecomment-943518935 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X844PPTX | scottgigante-immunai 84813314 | 2021-10-14T16:26:21Z | 2021-10-14T16:26:21Z | NONE | Thanks so much! Appreciate it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
943504365 | https://github.com/pydata/xarray/issues/3213#issuecomment-943504365 | https://api.github.com/repos/pydata/xarray/issues/3213 | IC_kwDOAMm_X844PLvt | scottgigante-immunai 84813314 | 2021-10-14T16:10:10Z | 2021-10-14T16:10:10Z | NONE | According to test_sparse.py it looks like xarray already supports sparse, even though the xarray docs don't mention this support. Can we expect […] |
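A minimal sketch (not from the thread) of what that support looks like in practice, assuming only that pydata/sparse is installed; the array and operations are illustrative:

```python
import numpy as np
import sparse
import xarray as xr

# xarray treats sparse.COO as a duck array: construction and many operations
# work without densifying.
coo = sparse.COO.from_numpy(np.eye(1000))
da = xr.DataArray(coo, dims=("x", "y"))

total = da.sum(dim="x")            # the reduction stays sparse-backed
print(type(total.data))            # sparse COO
print(total.data.todense().shape)  # densify explicitly when needed
```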
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
634551055 | https://github.com/pydata/xarray/issues/3213#issuecomment-634551055 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYzNDU1MTA1NQ== | pnsaevik 12728107 | 2020-05-27T09:44:55Z | 2020-05-27T09:44:55Z | NONE | Thanks for looking into sparse arrays for xarray. I have a use case I believe would be common:
At least I would love such functionality... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
615500990 | https://github.com/pydata/xarray/issues/3213#issuecomment-615500990 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYxNTUwMDk5MA== | amueller 449558 | 2020-04-17T23:07:57Z | 2020-04-17T23:07:57Z | NONE | @shoyer thanks! Mostly spitballing here, but it's interesting to know that 2) would be the bigger problem in your opinion; I had assumed 1) would be the main issue. That raises the question whether it's easier to wrap […] |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
615497160 | https://github.com/pydata/xarray/issues/3213#issuecomment-615497160 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDYxNTQ5NzE2MA== | amueller 449558 | 2020-04-17T22:51:09Z | 2020-04-17T22:51:09Z | NONE | Small comment from #3981: sklearn has just started running benchmarks, but it looks like pydata/sparse is not feature-complete enough for us to use. We might be interested in having scipy.sparse support in xarray. There are two problems with pydata/sparse for us as far as I can see (this is very preliminary): it only has COO, which is not good for us, and ideally we'd want to avoid memory copies whenever we want to use xarray, and I think going from scipy.sparse to pydata/sparse will involve memory copies, even if pydata/sparse adds other formats. |
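A small sketch of the conversion path under discussion; the shapes and density are arbitrary:

```python
import scipy.sparse
import sparse

# 2-D scipy COO matrix -> pydata/sparse COO array.
mat = scipy.sparse.random(10_000, 10_000, density=0.001, format="coo")
arr = sparse.COO.from_scipy_sparse(mat)

# from_scipy_sparse builds the coords/data buffers for the n-D COO layout,
# which is exactly the extra-copy concern raised in the comment above.
print(arr.nnz, mat.nnz)
```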
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
597825416 | https://github.com/pydata/xarray/issues/3213#issuecomment-597825416 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU5NzgyNTQxNg== | fmfreeze 18172466 | 2020-03-11T19:29:31Z | 2020-03-11T19:29:31Z | NONE | Concatenating multiple lazy, differently sized xr.DataArrays - each wrapping a sparse.COO by xr.apply_ufunc(sparse.COO, ds, dask='parallelized') as @crusaderky suggested - results again in an xr.DataArray, whose wrapped dask array chunks are mapped to numpy arrays:
But even when mapping the resulting, concatenated DataArray to sparse.COO afterwards, my main goal - scalable serialization of a lazy xarray - cannot be achieved. So one suggestion regarding @shoyer's original question: it would be great if sparse, but still lazy, DataArrays/Datasets could be serialized without the data overhead itself. Currently, that seems to work only for DataArrays which are merged/aligned with DataArrays of the same shape. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
591388766 | https://github.com/pydata/xarray/issues/3213#issuecomment-591388766 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU5MTM4ODc2Ng== | fmfreeze 18172466 | 2020-02-26T11:54:40Z | 2020-02-26T11:54:40Z | NONE | Thank you @crusaderky, unfortunately some obstacles appeared using your loading technique. As thousands of .h5 files are the data source for my use case, and they have various (and sometimes different) paths to datasets, using the xarray.open_mfdataset(...) function does not seem to be possible in a straightforward way. But:
1. I have a routine merging all .h5 datasets into corresponding dask arrays, implicitly wrapping dense numpy arrays.
2. I "manually" slice out a part of the huge lazy dask array and wrap it into an xarray.DataArray/Dataset.
3. Applying xr.apply_ufunc(sparse.COO, ds, dask='allowed') on that slice then results in a NotImplementedError: Format not supported for conversion. Supplied type is <class 'dask.array.core.Array'>, see help(sparse.as_coo) for supported formats.
(I am not sure if this is the right place to discuss, so I would be thankful for a response on SO in that case: https://stackoverflow.com/questions/60117268/how-to-make-use-of-xarrays-sparse-functionality-when-combining-differently-size) |
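The distinction between the two dask modes is the crux here; a sketch with a made-up dask-backed DataArray (the pattern @crusaderky suggested, referenced in the comments above):

```python
import dask.array
import sparse
import xarray as xr

# A dask-backed stand-in for one slice of the merged data (hypothetical values).
da = xr.DataArray(dask.array.zeros((1000, 1000), chunks=(500, 500)), dims=("x", "y"))

# dask="parallelized" applies sparse.COO chunk-by-chunk, so each call receives a
# plain numpy block; dask="allowed" passes the dask array itself, which
# sparse.COO cannot convert -- hence the NotImplementedError above.
da_sparse = xr.apply_ufunc(sparse.COO, da, dask="parallelized", output_dtypes=[da.dtype])
print(type(da_sparse.data))  # still a dask array; its computed chunks are sparse.COO
```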
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
587471646 | https://github.com/pydata/xarray/issues/3213#issuecomment-587471646 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU4NzQ3MTY0Ng== | fmfreeze 18172466 | 2020-02-18T13:56:09Z | 2020-02-18T13:56:51Z | NONE | Thank you @crusaderky for your input. I understand and agree with your statements for sparse data files. My approach is different, because within my (hdf5) data files on disk, I have no sparse datasets at all. But as I combine two differently sampled xarray datasets (initialized by h5py > dask > xarray) with xarray's built-in top-level function xarray.merge() (resp. xarray.combine_by_coords()), the resulting dataset is sparse. Generally that is nice behaviour, because the two differently sampled datasets get aligned along a coordinate/dimension, and the gaps are filled with NaNs. Nevertheless, those NaN "gaps" seem to need memory for every single NaN. That is what should be avoided. Maybe by implementing a redundant pointer to the same memory address for each NaN? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
585668294 | https://github.com/pydata/xarray/issues/3213#issuecomment-585668294 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU4NTY2ODI5NA== | fmfreeze 18172466 | 2020-02-13T10:55:15Z | 2020-02-13T10:55:15Z | NONE | Thank you all for making xarray and its tight development with dask so great! As @shoyer mentioned
I am wondering if creating a lazy & sparse xarray Dataset/DataArray is already possible, especially when creating the sparse part at runtime and loading only the data part. Assume two differently sampled - and lazy dask - DataArrays are merged/combined along a coordinate axis into a Dataset. Then the smaller (= less dense) DataVariable is filled with NaNs. As far as I have experienced, the current behaviour is that each NaN value requires memory. That issue might be formulated this way: dask integration enables xarray to scale to big data only as long as the data has no sparse character. Do you agree with that formulation, or am I missing something fundamental? A code example reproducing that issue is described here: https://stackoverflow.com/q/60117268/9657367 |
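A toy illustration of the behaviour described, with made-up coordinates:

```python
import numpy as np
import xarray as xr

# Two variables sampled on disjoint coordinate ranges.
a = xr.DataArray(np.ones(1000), dims="t", coords={"t": np.arange(0, 1000)}, name="a")
b = xr.DataArray(np.ones(1000), dims="t", coords={"t": np.arange(1000, 2000)}, name="b")

merged = xr.merge([a, b])       # outer join on "t" fills the gaps with NaN
print(merged["a"].shape)        # (2000,): half NaN, all of it materialized
print(merged["a"].data.nbytes)  # 16000 bytes, the NaN "gaps" included
```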
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
551132924 | https://github.com/pydata/xarray/issues/3213#issuecomment-551132924 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU1MTEzMjkyNA== | k-a-mendoza 4605410 | 2019-11-07T15:37:21Z | 2019-11-07T15:37:21Z | NONE | @dcherian These examples seem focused on merging from disk, whereas the use case I'm running into is joining data produced by computation in RAM. I'll try updating my xarray installation and see where that gets me. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
551090042 | https://github.com/pydata/xarray/issues/3213#issuecomment-551090042 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU1MTA5MDA0Mg== | k-a-mendoza 4605410 | 2019-11-07T13:57:46Z | 2019-11-07T13:57:46Z | NONE | @oliverhiggs I've also noticed a huge computational overhead when joining xarray datasets where the result would be sparse: something like a minute of computation time to join two 10 GB datasets, even when there are no overlapping indices. I'm not sure if a sparse representation would help, but it's possible we'd get a reduced memory footprint and a faster merge/concat time with this kind of support. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
550966385 | https://github.com/pydata/xarray/issues/3213#issuecomment-550966385 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU1MDk2NjM4NQ== | oliverhiggs 5311739 | 2019-11-07T07:57:17Z | 2019-11-07T07:57:17Z | NONE | Thanks for rolling out support for sparse arrays! I think it would be great to have a […] As an example, I have a use case where I want to concatenate (across a new dimension) a number of DataArrays with date indexes covering different date ranges. When I use […] |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
546058673 | https://github.com/pydata/xarray/issues/3213#issuecomment-546058673 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDU0NjA1ODY3Mw== | k-a-mendoza 4605410 | 2019-10-24T19:05:23Z | 2019-10-24T19:05:23Z | NONE | So how would one change an existing dataset or dataarray to use a sparse representation?
something like
|
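There was no built-in conversion method at the time; one possible pattern is sketched below, where `to_sparse` is a hypothetical helper, not an xarray API:

```python
import numpy as np
import sparse
import xarray as xr

def to_sparse(ds: xr.Dataset) -> xr.Dataset:
    """Rebuild each data variable with a sparse.COO-backed copy."""
    return ds.map(lambda da: da.copy(data=sparse.COO.from_numpy(da.values)))

ds = xr.Dataset({"u": ("x", np.array([1.0, 0.0, 0.0]))})
print(to_sparse(ds)["u"].data)  # a COO array with one stored value
```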
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
527771975 | https://github.com/pydata/xarray/issues/3213#issuecomment-527771975 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNzc3MTk3NQ== | p-d-moore 47371188 | 2019-09-04T07:05:37Z | 2019-09-04T07:05:37Z | NONE | Thanks @crusaderky, appreciated. Might as well suggest it there. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
527762609 | https://github.com/pydata/xarray/issues/3213#issuecomment-527762609 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNzc2MjYwOQ== | p-d-moore 47371188 | 2019-09-04T06:32:21Z | 2019-09-04T06:32:21Z | NONE | I would like to add a request for sparse xarrays: Support ffill and bfill operations along ordered dimensions (such as datetime coordinates) while maintaining the sparse level of data density. The challenge to overcome is that performing ffill operations on sparse data quickly creates data that is no longer "sparse" in practice and makes dealing with the data challenging. My suggested implementation (and the way I have previously done this in another programming environment) is to represent the data as rows of contiguous regions with a single (non-sparse) value rather than rows of single points. The contiguous dimensions could be defined as any dimensions that are "ordered" such as datetime coordinates. That is, the data then is represented as a list of values + coordinate ranges rather than a list of values + coordinates. The idea is that you can easily compute operations like ffill without changing the sparsity of the matrix, and thus support typical aggregating functions you might like to apply to the data before you collapse the data and convert to a non-sparse form (e.g. perform a lag difference of the most recent value with the most recent value 20 days ago, or do a cross-sectional mean on the data along a certain dimension, using the most recent data at each given point in time). These types of operations can be more useful when the data is "fuller" such as after a forward fill, but often not useful when the data is very sparsely populated (as the cross-sectional operations are unlikely to hit the sparse data among the different dimensions). Care must be taken to avoid "collisions" between sparse blocks of data, that is, avoiding that the list of sparse blocks accidentally overlap. The implementation can get tricky but I believe the goal to be worthwhile. I am happy to expand on the request if the idea is not well expressed. |
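A toy sketch of the proposed "values + coordinate ranges" representation for a single ordered dimension; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Run:
    start: int   # inclusive coordinate where the value begins
    stop: int    # exclusive coordinate where the value stops applying
    value: float

def ffill_runs(points: list[tuple[int, float]], domain_stop: int) -> list[Run]:
    """Turn sorted (coord, value) points into contiguous forward-filled runs."""
    ends = [coord for coord, _ in points[1:]] + [domain_stop]
    return [Run(c, e, v) for (c, v), e in zip(points, ends)]

print(ffill_runs([(2, 1.0), (5, 3.0)], domain_stop=10))
# [Run(start=2, stop=5, value=1.0), Run(start=5, stop=10, value=3.0)]
```

The point of the representation is that ffill only extends or appends runs, so the number of stored entries does not grow toward the size of the dense grid.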
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526747770 | https://github.com/pydata/xarray/issues/3213#issuecomment-526747770 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjc0Nzc3MA== | fjanoos 923438 | 2019-08-30T20:57:54Z | 2019-08-30T20:57:54Z | NONE | Thanks. That solved that error but introduced another one. Specifically - this is my dataframe
and this is the error that I get with
My numpy version is definitely above 1.16
I also set this
Furthermore, I don't get this error when I don't set |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526733257 | https://github.com/pydata/xarray/issues/3213#issuecomment-526733257 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjczMzI1Nw== | fjanoos 923438 | 2019-08-30T20:10:43Z | 2019-08-30T20:10:43Z | NONE | I cloned the master branch and installed it using 'python setup.py develop'. When I try to use the sparse data loading functionality as per
```
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-9-fce0ca6bc4c2> in <module>
----> 1 oo = xa.Dataset.from_dataframe( poly_df.iloc[:10000], sparse=True )

/mnt/local/xarray/xarray/core/dataset.py in from_dataframe(cls, dataframe, sparse)
   4040
   4041     if sparse:
-> 4042         obj._set_sparse_data_from_dataframe(dataframe, dims, shape)
   4043     else:
   4044         obj._set_numpy_data_from_dataframe(dataframe, dims, shape)

/mnt/local/xarray/xarray/core/dataset.py in _set_sparse_data_from_dataframe(self, dataframe, dims, shape)
   3936     self, dataframe: pd.DataFrame, dims: tuple, shape: Tuple[int, ...]
   3937 ) -> None:
-> 3938     from sparse import COO
   3939
   3940     idx = dataframe.index

ModuleNotFoundError: No module named 'sparse'
```
Any suggestions on what I need to do? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
526710709 | https://github.com/pydata/xarray/issues/3213#issuecomment-526710709 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNjcxMDcwOQ== | fjanoos 923438 | 2019-08-30T18:53:44Z | 2019-08-30T18:53:44Z | NONE | Would it be possible for pd.{Series, DataFrame}.to_xarray() to automatically create a sparse dataarray - or to have a flag in to_xarray which allows controlling for this? I have a very sparse dataframe, and every time I try to convert it to xarray I blow out my memory. Keeping it sparse but logically treating it as a DataArray would be fantastic. |
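For reference, the `sparse=True` keyword on `Dataset.from_dataframe` used in the comments above covers the DataFrame direction; a small sketch with made-up data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A diagonal, mostly-empty MultiIndex frame (requires the pydata/sparse package).
idx = pd.MultiIndex.from_arrays([np.arange(1000), np.arange(1000)], names=["x", "y"])
df = pd.DataFrame({"v": np.ones(1000)}, index=idx)

# A dense conversion would materialize the full 1000x1000 grid; sparse=True
# stores only the 1000 listed cells as a sparse.COO array.
ds = xr.Dataset.from_dataframe(df, sparse=True)
print(ds["v"].data.nnz)  # 1000
```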
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
524104485 | https://github.com/pydata/xarray/issues/3213#issuecomment-524104485 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyNDEwNDQ4NQ== | darothen 4992424 | 2019-08-22T22:39:21Z | 2019-08-22T22:39:21Z | NONE | Tagging @jeliashi for visibility/collaboration |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521596825 | https://github.com/pydata/xarray/issues/3213#issuecomment-521596825 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTU5NjgyNQ== | ivirshup 8238804 | 2019-08-15T10:34:30Z | 2019-08-15T10:34:30Z | NONE | That's fair. I just think it would be useful to have an assurance that indices are sorted when you read them. I don't see how to express this within the CF specs while still looking like a COO array, though. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
521530770 | https://github.com/pydata/xarray/issues/3213#issuecomment-521530770 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMTUzMDc3MA== | ivirshup 8238804 | 2019-08-15T06:28:24Z | 2019-08-15T07:33:04Z | NONE | Would it be feasible to use the contiguous ragged array spec or the gathering-based compression when the COO coordinates are sorted? I think this could be very helpful for read efficiency, though I'm not sure if random writes were a requirement here. |
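For readers unfamiliar with the CF construct referenced here, a schematic of the contiguous ragged array layout (variable names are illustrative):

```python
# CF-style contiguous ragged array: per-instance rows of different lengths are
# stored back-to-back, plus a count variable (CF attribute: sample_dimension).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # all samples, concatenated
row_size = [2, 1, 3]                      # samples belonging to each instance

# Row i spans values[offsets[i] : offsets[i] + row_size[i]].
offsets = [0, 2, 3]  # running sum of row_size
rows = [values[o:o + n] for o, n in zip(offsets, row_size)]
print(rows)  # [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
```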
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 | |
520741706 | https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 | https://api.github.com/repos/pydata/xarray/issues/3213 | MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg== | khaeru 1634164 | 2019-08-13T08:31:30Z | 2019-08-13T08:31:30Z | NONE | This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost always sparse, so first-class sparse support will be broadly useful. I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
How should xarray use/support sparse arrays? 479942077 |