
issue_comments


23 rows where author_association = "MEMBER" and issue = 479942077 sorted by updated_at descending


user 7

  • crusaderky 7
  • shoyer 6
  • hameerabbasi 3
  • dcherian 3
  • rabernat 2
  • mrocklin 1
  • keewis 1

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1534724554 https://github.com/pydata/xarray/issues/3213#issuecomment-1534724554 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85begnK rabernat 1197350 2023-05-04T12:51:59Z 2023-05-04T12:51:59Z MEMBER

I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly tested.

Existing sparse testing is here: https://github.com/pydata/xarray/blob/main/xarray/tests/test_sparse.py

We would welcome enhancements to this!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1534238962 https://github.com/pydata/xarray/issues/3213#issuecomment-1534238962 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85bcqDy hameerabbasi 2190658 2023-05-04T07:47:04Z 2023-05-04T07:47:04Z MEMBER

Speaking a bit to things like cumprod, it's hard to support those natively with sparse data structures in many cases (at least as things stand in the current Numba framework).

While that doesn't apply in the case of cumprod, PyData/Sparse also has a policy that if the best algorithm available is a dense one, we simply raise an error, and the user should densify explicitly to avoid filling all available RAM or getting obscure MemoryErrors.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
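The densify-explicitly policy described in the comment above can be sketched in plain NumPy (guarded_densify and max_bytes are illustrative names, not part of the PyData/Sparse API):

```python
import numpy as np

def guarded_densify(coords, data, shape, max_bytes=2**20):
    """Refuse to materialize a dense array beyond a size budget,
    loosely mimicking the PyData/Sparse policy of raising instead
    of densifying implicitly."""
    dense_bytes = int(np.prod(shape)) * data.itemsize
    if dense_bytes > max_bytes:
        raise ValueError(
            f"densifying would allocate {dense_bytes} bytes; "
            "densify explicitly if you really want this"
        )
    out = np.zeros(shape, dtype=data.dtype)
    out[tuple(coords)] = data  # scatter COO-style (ndim, nnz) coordinates
    return out
```

The point of the guard is that the user opts in to the memory cost, rather than hitting an obscure MemoryError mid-computation.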
1534001190 https://github.com/pydata/xarray/issues/3213#issuecomment-1534001190 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85bbwAm rabernat 1197350 2023-05-04T02:36:57Z 2023-05-04T02:36:57Z MEMBER

Hi @jdbutler and welcome! We would welcome this sort of contribution eagerly.

I would characterize our current support of sparse arrays as really just a proof of concept. When to use sparse and how to do it effectively is not well documented. Simply adding more documentation around the already-supported use cases would be a great place to start IMO.

My own explorations of this are described in this Pangeo post. The use case is regridding. It touches on quite a few of the points you're interested in, in particular the integration with geodataframe. Along similar lines, @dcherian has been working on using opt_einsum together with sparse in https://github.com/pangeo-data/xESMF/issues/222#issuecomment-1524041837 and https://github.com/pydata/xarray/issues/7764.

I'd also suggest catching up on what @martinfleis is doing with vector data cubes in xvec. (See also Pangeo post on this topic.)

Of the three topics you enumerated, I'm most interested in the serialization one. However, I'd rather see serialization of sparse arrays prototyped in Zarr, as it's much more conducive to experimentation than NetCDF (which requires writing C to do anything custom). I would recommend exploring serialization from a sparse array in memory to a sparse format on disk via a custom codec. Zarr recently added support for a meta_array parameter that determines what array type is materialized by the codec pipeline (see https://github.com/zarr-developers/zarr-python/pull/1131). The use case there was loading data direct to GPU. In a way sparse is similar--it's an array container that is not numpy or dask.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1014383681 https://github.com/pydata/xarray/issues/3213#issuecomment-1014383681 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X848dkRB hameerabbasi 2190658 2022-01-17T10:48:48Z 2022-01-17T10:48:48Z MEMBER

For ffill specifically, you would get a dense array out anyway, so there's no point to keeping it sparse, unless one did something like run-length-encoding or similar.

As for the size issue, PyData/Sparse provides the nbytes attribute which could be helpful in determining size.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
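What nbytes captures for a COO array can be approximated in plain NumPy: each nonzero stores its value plus one integer coordinate per dimension (a rough sketch, not sparse's exact accounting):

```python
import numpy as np

# Rough COO memory estimate for a sparse identity matrix.
x = np.eye(1000)                       # dense: 1000 * 1000 * 8 bytes
nnz = np.count_nonzero(x)
# Each nonzero: one float64 value + one int64 coordinate per dimension.
coo_bytes = nnz * (x.itemsize + x.ndim * np.dtype(np.int64).itemsize)
dense_bytes = x.nbytes                 # what ffill's dense output would cost
```

Here the COO form needs about 24 kB against 8 MB dense, which is the kind of comparison nbytes makes possible.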
943517731 https://github.com/pydata/xarray/issues/3213#issuecomment-943517731 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X844PPAj keewis 14808389 2021-10-14T16:25:04Z 2021-10-14T16:25:04Z MEMBER

that's mostly an oversight, I think. However, to be really useful we'd need to get a sparse-xarray library which makes working with sparse and xarray easier (going from dense to sparse or the reverse still requires something like da.copy(data=sparse.COO.from_numpy(da.data)), which is not user-friendly).

Anyways, the docs you're looking for are working with numpy-like arrays, even though there's no explicit mention of sparse there, either.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
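The dense-to-sparse conversion friction mentioned above boils down to what sparse.COO.from_numpy computes; a minimal pure-NumPy sketch of that step (dense_to_coo is a hypothetical helper):

```python
import numpy as np

def dense_to_coo(arr):
    """Minimal dense -> COO conversion: keep only the nonzero values
    and their integer coordinates, roughly what
    sparse.COO.from_numpy(arr) stores internally."""
    mask = arr != 0
    coords = np.argwhere(mask).T        # shape (ndim, nnz)
    data = arr[mask]                    # row-major order
    return coords, data

coords, data = dense_to_coo(np.diag([1.0, 2.0]))
```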
615772303 https://github.com/pydata/xarray/issues/3213#issuecomment-615772303 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYxNTc3MjMwMw== hameerabbasi 2190658 2020-04-18T08:41:39Z 2020-04-18T08:41:39Z MEMBER

Hi. Yes, it’d be nice if we had a meta-issue; I could then open separate issues for the sklearn implementations.

Performance is not ideal, and I realise that. However, I’m working on a more generic solution to performance as I type.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
615501070 https://github.com/pydata/xarray/issues/3213#issuecomment-615501070 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYxNTUwMTA3MA== mrocklin 306380 2020-04-17T23:08:18Z 2020-04-17T23:08:18Z MEMBER

@amueller have you all connected with @hameerabbasi ? I'm not surprised to hear that there are performance issues with pydata/sparse relative to scipy.sparse, but Hameer has historically been pretty open to working to resolve issues quickly. I'm not sure if there is already an ongoing conversation between the two groups, but I'd recommend replacing "we've chosen not to use pydata/sparse because it isn't feature complete enough for us" with "in order for us to use pydata/sparse we would need the following features".

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
615499609 https://github.com/pydata/xarray/issues/3213#issuecomment-615499609 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYxNTQ5OTYwOQ== shoyer 1217238 2020-04-17T23:01:15Z 2020-04-17T23:01:15Z MEMBER

Wrapping scipy.sparse in xarray would present two challenges:

  1. It only supports 2D arrays, which feels awkward for a library focused on N-dimensional data.
  2. There is no existing "duck array" compatibility layer (i.e., __array_function__) that makes scipy.sparse matrices work like NumPy arrays (in fact, they actually are designed to mimic the deprecated np.matrix).

(2) is the biggest challenge. I don't want to maintain that compatibility layer inside xarray, but if it existed we would be happy to try using it.

pydata/sparse solves both these problems, though again indeed it only has quite limited data structures.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
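The duck-array layer shoyer refers to is NumPy's __array_function__ protocol (NEP 18). A toy class showing the dispatch that scipy.sparse matrices lack (Duck is purely illustrative):

```python
import numpy as np

class Duck:
    """Minimal duck array: implements __array_function__ so NumPy
    functions dispatch to it instead of coercing to ndarray."""
    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.concatenate:
            # args[0] is the sequence passed to np.concatenate.
            return Duck(np.concatenate([a.value for a in args[0]]))
        return NotImplemented

a, b = Duck([1, 2]), Duck([3, 4])
c = np.concatenate([a, b])  # dispatches to Duck.__array_function__
```

pydata/sparse implements this protocol across its classes, which is why xarray can wrap it without a bespoke compatibility layer.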
592476821 https://github.com/pydata/xarray/issues/3213#issuecomment-592476821 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDU5MjQ3NjgyMQ== crusaderky 6213168 2020-02-28T11:39:50Z 2020-02-28T11:39:50Z MEMBER

xr.apply_ufunc(sparse.COO, ds, dask='parallelized')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
587564478 https://github.com/pydata/xarray/issues/3213#issuecomment-587564478 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDU4NzU2NDQ3OA== crusaderky 6213168 2020-02-18T16:58:25Z 2020-02-18T16:58:25Z MEMBER

you just need to

  1. load up your NetCDF files with xarray.open_mfdataset. This will give you
     • an xarray.Dataset,
     • that wraps around one dask.array.Array per variable,
     • that wraps around one numpy.ndarray (DENSE array) per dask chunk.
  2. convert to sparse with xarray.apply_ufunc(sparse.COO, ds). This will give you
     • an xarray.Dataset,
     • that wraps around one dask.array.Array per variable,
     • that wraps around one sparse.COO (SPARSE array) per dask chunk.
  3. use xarray.merge or whatever to align and merge
  4. you may want to rechunk at this point to obtain fewer, larger chunks. You can estimate your chunk size in bytes if you know your data density (read my previous email).
  5. Do whatever other calculations you want. All operations will produce in output the same data type as point 2.
  6. To go back to dense, invoke xarray.apply_ufunc(lambda x: x.todense(), ds) to go back to the format as in (1). This step is only necessary if you have something that won't accept/recognize sparse arrays directly in input; namely, writing to a NetCDF dataset. If your data has not been reduced enough, you may need to rechunk into smaller chunks first in order to fit into your RAM constraints.

Regards

On Tue, 18 Feb 2020 at 13:56, fmfreeze notifications@github.com wrote:

Thank you @crusaderky https://github.com/crusaderky for your input.

I understand and agree with your statements for sparse data files. My approach is different, because within my (hdf5) data files on disc, I have no sparse datasets at all.

But as I combine two differently sampled xarray datasets (initialized by h5py > dask > xarray) with xarray's built-in top-level function "xarray.merge()" (resp. xarray.combine_by_coords()), the resulting dataset is sparse.

Generally that is nice behaviour, because two differently sampled datasets get aligned along a coordinate/dimension, and the gaps are filled by NaNs.

Nevertheless, those NaN "gaps" seem to need memory for every single NaN. That is what should be avoided. Maybe by implementing a redundant pointer to the same memory address for each NaN?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/issues/3213?email_source=notifications&email_token=ABPM4MFWF22BFFYDHV6BS2DRDPSHXA5CNFSM4ILGYGP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMCBWHQ#issuecomment-587471646, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPM4MHIUWDYX6ZFKRRBIJLRDPSHXANCNFSM4ILGYGPQ .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
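The chunk-size-from-density estimate mentioned in the rechunking step above can be written out directly (coo_chunk_bytes is an illustrative helper, assuming COO storage with one int64 coordinate per dimension):

```python
import numpy as np

def coo_chunk_bytes(chunk_shape, density, dtype=np.float64, index_dtype=np.int64):
    """Estimate the in-memory size of one sparse (COO) dask chunk:
    nnz values, each carrying its payload plus one coordinate per dim."""
    n = int(np.prod(chunk_shape))
    nnz = int(n * density)
    per_nnz = np.dtype(dtype).itemsize + len(chunk_shape) * np.dtype(index_dtype).itemsize
    return nnz * per_nnz

coo_chunk_bytes((1000, 1000), 0.01)  # a 1%-dense 2D chunk
```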
585997533 https://github.com/pydata/xarray/issues/3213#issuecomment-585997533 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDU4NTk5NzUzMw== crusaderky 6213168 2020-02-13T22:12:37Z 2020-02-13T22:12:37Z MEMBER

Hi fmfreeze,

> Dask integration enables xarray to scale to big data, only as long as the data has no sparse character. Do you agree on that formulation or am I missing something fundamental?

I don't agree. To my understanding xarray->dask->sparse works very well (save bugs), as long as your data density (the percentage of non-default points) is roughly constant across dask chunks. If it isn't, then you'll have some chunks that consume substantially more RAM and CPU to compute than others. This can be mitigated, if you know in advance where you are going to have more samples, by setting uneven dask chunk sizes. For example, if you have a one-dimensional array of 100k points and you know in advance that the density of non-default samples follows a gaussian or triangular distribution, then it may be wise to have very large chunks at the tails and then get them progressively smaller towards the center, e.g. (30k, 12k, 5k, 2k, 1k, 1k, 2k, 5k, 10k, 30k). Of course, there are use cases where you're going to have unpredictable hotspots; I'm afraid that in those the only thing you can do is size your chunks for the worst case and end up oversplitting everywhere else.

Regards Guido

On Thu, 13 Feb 2020 at 10:55, fmfreeze notifications@github.com wrote:

Thank you all for making xarray and its tight development with dask so great!

As @shoyer https://github.com/shoyer mentioned

Yes, it would be useful (eventually) to have lazy loading of sparse arrays from disk, like we currently do for dense arrays. This would indeed require knowing that the indices are sorted.

I am wondering, if creating a lazy & sparse xarray Dataset/DataArray is already possible? Especially when creating the sparse part at runtime, and loading only the data part: Assume two differently sampled - and lazy dask - DataArrays are merged/combined along a coordinate axis into a Dataset. Then the smaller (= less dense) DataVariable is filled with NaNs. As far as I experienced the current behaviour is, that each NaN value requires memory.

That issue might be formulated this way: Dask integration enables xarray to scale to big data, only as long as the data has no sparse character. Do you agree on that formulation or am I missing something fundamental?

A code example reproducing that issue is described here: https://stackoverflow.com/q/60117268/9657367


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
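The uneven-chunking idea above can be sketched with plain NumPy: choose chunk boundaries so each chunk holds roughly the same number of non-default samples, given a known density profile (equal_nnz_chunks is a hypothetical helper):

```python
import numpy as np

def equal_nnz_chunks(density, n_chunks):
    """Pick chunk sizes along one axis so each chunk covers roughly
    the same share of the cumulative non-default-sample density."""
    cdf = np.cumsum(density, dtype=float)
    cdf /= cdf[-1]
    targets = np.linspace(0, 1, n_chunks + 1)[1:-1]
    bounds = np.searchsorted(cdf, targets)
    return np.diff(np.concatenate([[0], bounds, [len(density)]]))

# Triangular density peaked at the centre -> large chunks at the tails,
# progressively smaller chunks towards the middle, as described above.
density = np.concatenate([np.linspace(0, 1, 50), np.linspace(1, 0, 50)])
sizes = equal_nnz_chunks(density, 5)
```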
551134122 https://github.com/pydata/xarray/issues/3213#issuecomment-551134122 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDU1MTEzNDEyMg== dcherian 2448579 2019-11-07T15:40:12Z 2019-11-07T15:40:12Z MEMBER

the coords, data_vars, join, compat kwargs in that example are passed down to concat and merge, as appropriate. We do need more documentation on that...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
551125982 https://github.com/pydata/xarray/issues/3213#issuecomment-551125982 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDU1MTEyNTk4Mg== dcherian 2448579 2019-11-07T15:23:56Z 2019-11-07T15:23:56Z MEMBER

@El-minadero a lot of that overhead may be fixed on master and more recent xarray versions. https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets has some tips on quickly concatenating / merging datasets. It depends on the datasets you are joining...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
527766483 https://github.com/pydata/xarray/issues/3213#issuecomment-527766483 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyNzc2NjQ4Mw== crusaderky 6213168 2019-09-04T06:46:08Z 2019-09-04T06:46:08Z MEMBER

@p-d-moore what you say makes sense but it is well outside the domain of xarray. What you're describing is basically a new sparse class, substantially more sophisticated than COO, and should be proposed on the sparse board, not here. After it's implemented in sparse, xarray will be able to wrap around it.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
526748987 https://github.com/pydata/xarray/issues/3213#issuecomment-526748987 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyNjc0ODk4Nw== shoyer 1217238 2019-08-30T21:01:55Z 2019-08-30T21:01:55Z MEMBER

You will need to install NumPy 1.17 or set the env variable before importing NumPy.

On Fri, Aug 30, 2019 at 1:57 PM firdaus janoos notifications@github.com wrote:

Thanks.

That solved that error but introduced another one.

Specifically - this is my dataframe [image: image] https://user-images.githubusercontent.com/923438/64050831-2d061280-cb47-11e9-915b-01fe42eadefe.png

and this is the error that I get with sparse=True

[image: image] https://user-images.githubusercontent.com/923438/64049668-91bf6e00-cb43-11e9-921f-1a044f3446a9.png [image: image] https://user-images.githubusercontent.com/923438/64050631-a94c2600-cb46-11e9-8653-9820b445bc86.png

My numpy version is definitely about 1.16 [image: image] https://user-images.githubusercontent.com/923438/64050648-b701ab80-cb46-11e9-8dac-aaf2bf9e260d.png

I also set this os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"]='1' just in case

Furthermore, I don't get this error when I don't set sparse=True ( I just get OOM errors but that's another matter) ...


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
526736529 https://github.com/pydata/xarray/issues/3213#issuecomment-526736529 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyNjczNjUyOQ== dcherian 2448579 2019-08-30T20:21:28Z 2019-08-30T20:21:28Z MEMBER

conda install -c conda-forge sparse

Basically you need to install https://sparse.pydata.org/en/latest/ using either pip or conda.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
526718101 https://github.com/pydata/xarray/issues/3213#issuecomment-526718101 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyNjcxODEwMQ== shoyer 1217238 2019-08-30T19:19:13Z 2019-08-30T19:19:13Z MEMBER

We have a new "sparse=True" option in xarray.Dataset.from_dataframe for exactly this use case. Pandas's to_xarray() method just calls this method, so it would make sense to forward keyword arguments, too.

On Fri, Aug 30, 2019 at 11:53 AM firdaus janoos notifications@github.com wrote:

Would it be possible that pd.{Series, DataFrame}.to_xarray() automatically creates a sparse dataarray - or we have a flag in to_xarray which allows controlling for this. I have a very sparse dataframe and everytime I try to convert it to xarray I blow out my memory. Keeping it sparse but logically as a DataArray would be fantastic


{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
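What sparse=True buys in from_dataframe can be sketched without pandas or xarray: keep only the observed values and their integer coordinates instead of materializing a dense NaN-filled grid (all names below are illustrative):

```python
import numpy as np

# Three observations of a 2D variable, indexed by (row label, col label).
rows = np.array(["a", "a", "c"])
cols = np.array([0, 2, 1])
vals = np.array([1.0, 2.0, 3.0])

# Factorize each index level to integer positions, as from_dataframe
# does conceptually when building the output coordinates.
row_labels, row_idx = np.unique(rows, return_inverse=True)
col_labels, col_idx = np.unique(cols, return_inverse=True)
coords = np.stack([row_idx, col_idx])   # COO coordinates, shape (2, nnz)
shape = (len(row_labels), len(col_labels))
# Dense output would need prod(shape) cells, mostly NaN; COO stores nnz=3.
```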
521691465 https://github.com/pydata/xarray/issues/3213#issuecomment-521691465 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTY5MTQ2NQ== shoyer 1217238 2019-08-15T15:50:42Z 2019-08-15T15:50:42Z MEMBER

Yes, it would be useful (eventually) to have lazy loading of sparse arrays from disk, like we currently do for dense arrays. This would indeed require knowing that the indices are sorted.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
521533999 https://github.com/pydata/xarray/issues/3213#issuecomment-521533999 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTUzMzk5OQ== shoyer 1217238 2019-08-15T06:42:44Z 2019-08-15T06:42:44Z MEMBER

I like the indexed ragged array representation because it maps directly into sparse’s COO format. I’m sure other formats would be possible, but they would also likely be harder to implement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
521301555 https://github.com/pydata/xarray/issues/3213#issuecomment-521301555 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTMwMTU1NQ== shoyer 1217238 2019-08-14T15:42:58Z 2019-08-14T15:42:58Z MEMBER

netCDF has a pretty low-level base spec, with conventions left to higher level docs like CF conventions. Fortunately, there does seem to be a CF convention that would be a good fit for sparse data in COO format, namely the indexed ragged array representation (example, note the instance_dimension attribute). That's probably the right thing to use for sparse arrays in xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
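A minimal sketch of the indexed ragged array idea, using a flat index variable in place of the CF instance_dimension machinery (illustrative, not the CF encoding itself):

```python
import numpy as np

# A mostly-empty 2D array with two nonzero samples.
x = np.zeros((3, 4))
x[0, 1] = 5.0
x[2, 3] = 7.0

# Encode: one flat variable of values, one index variable mapping each
# value back to its position. Sorted indices are what lazy loading needs.
flat_index = np.flatnonzero(x)
values = x.ravel()[flat_index]

# Decode: scatter the values back into a dense array.
y = np.zeros(x.size)
y[flat_index] = values
y = y.reshape(x.shape)
```

This round-trips losslessly and maps directly onto COO, which is why the representation is attractive for serialization.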
521224538 https://github.com/pydata/xarray/issues/3213#issuecomment-521224538 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTIyNDUzOA== crusaderky 6213168 2019-08-14T12:25:39Z 2019-08-14T12:25:39Z MEMBER

As for NetCDF, instead of a bespoke xarray-only convention, wouldn't it be much better to push a spec extension upstream?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
521223609 https://github.com/pydata/xarray/issues/3213#issuecomment-521223609 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTIyMzYwOQ== crusaderky 6213168 2019-08-14T12:22:37Z 2019-08-14T12:22:37Z MEMBER

As already mentioned in #3206, unstack(sparse=True) would be extremely useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
521221473 https://github.com/pydata/xarray/issues/3213#issuecomment-521221473 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMTIyMTQ3Mw== crusaderky 6213168 2019-08-14T12:15:39Z 2019-08-14T12:20:59Z MEMBER

+1 for the introduction of to_sparse() / to_dense(), but let's please avoid the mistakes that were done with chunk(). DataArray.chunk() is extremely frustrating when you have non-index coords and, 9 times out of 10, you only want to chunk the data and you have to go through the horrid

```python
a = DataArray(a.data.chunk(), dims=a.dims, coords=a.coords, attrs=a.attrs, name=a.name)
```

Exactly the same issue would apply to to_sparse().

Possibly we could define them as

```python
class DataArray:
    def to_sparse(
        self,
        data: bool = True,
        coords: Union[Iterable[Hashable], bool] = False,
    ): ...

class Dataset:
    def to_sparse(
        self,
        data_vars: Union[Iterable[Hashable], bool] = True,
        coords: Union[Iterable[Hashable], bool] = False,
    ): ...
```

same for to_dense() and chunk() (the latter would require a DeprecationWarning for a few releases before switching the default for coords from True to False, only to be triggered in presence of dask-backed coords).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 17.775ms · About: xarray-datasette