html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1482#issuecomment-1457965323,https://api.github.com/repos/pydata/xarray/issues/1482,1457965323,IC_kwDOAMm_X85W5skL,40465719,2023-03-07T10:58:12Z,2023-03-07T10:58:12Z,NONE,"> As I am not aware of implementation details I am not sure there is a useful link, but maybe progress in #3213 supporting sparse arrays can solve also the jagged array issue.
>
> Long time ago [I asked there a question](https://github.com/pydata/xarray/issues/3213#issuecomment-585668294) about how xarray supports sparse arrays. But what I actually meant were ""Jagged Arrays"". I just was not aware of that term and stumbled over it some days ago the very first time.
I also recently came across [awkward](https://www.youtube.com/watch?v=pvrRFsFqdYs)/jagged/ragged arrays, and that's exactly how I would like to operate on multi-dimensional (2D in the [referenced case](https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301)) sparse data:

Instead of allocating memory for NaNs, empty slots are simply not materialized, by using the `pd.SparseDtype(""float"", np.nan)` dtype.
You basically create a dense duck array from sparse dtypes, as the [Pandas sparse user guide](https://pandas.pydata.org/docs/user_guide/sparse.html) shows:
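A minimal sketch of that pattern (names and sizes here are illustrative, not from the original screenshot):

```python
import numpy as np
import pandas as pd

# A mostly-NaN array stored with a sparse dtype: it still looks and
# behaves like a dense Series, but the NaN slots are not materialized.
arr = np.full(1_000_000, np.nan)
arr[[0, 999_999]] = [1.0, 2.0]

dense = pd.Series(arr)
sparse = dense.astype(pd.SparseDtype('float', np.nan))

print(dense.memory_usage())   # full float64 buffer
print(sparse.memory_usage())  # only the two stored values plus indices
print(sparse.shape, sparse.dtype, sparse.values.ndim)
```

The sparse Series keeps the same `shape`, `dtype`-like interface, and `ndim` as its dense counterpart while storing only the non-fill values.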

So, all the shape, dtype, and ndim requirements are satisfied, and xarray could implement this as a duck array.
And while you can already wrap sparse duck arrays with `xr.Variable`, I'm not sure if the wrapper maintains the dtype:

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/3213#issuecomment-1014462537,https://api.github.com/repos/pydata/xarray/issues/3213,1014462537,IC_kwDOAMm_X848d3hJ,40465719,2022-01-17T12:20:18Z,2022-01-17T12:20:18Z,NONE,"I know. But having sparse data that I can treat as if it were dense allows me to unstack without running out of memory, and then ffill & downsample the data in chunks:

It would be nice if xarray automatically converted the data from sparse back to dense for operations on the chunks, just like pandas does.
The picture shows that I'm already using nbytes to determine the size.
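A rough sketch of that per-chunk densify-then-operate workflow in plain pandas (sizes and column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Sparse frame: mostly NaN, stored without materializing the gaps.
df = pd.DataFrame(np.nan, index=range(100_000), columns=['a', 'b'])
df.iloc[::1000] = 1.0
sdf = df.astype(pd.SparseDtype('float', np.nan))

# Densify one column-chunk at a time, ffill and downsample, then move on;
# only the chunk currently being processed occupies dense memory.
out = {}
for col in sdf.columns:
    dense_chunk = sdf[col].sparse.to_dense()
    out[col] = dense_chunk.ffill().iloc[::10]
result = pd.DataFrame(out)
```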
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301,https://api.github.com/repos/pydata/xarray/issues/3213,1013887301,IC_kwDOAMm_X848brFF,40465719,2022-01-16T14:35:29Z,2022-01-16T14:40:13Z,NONE,"I would prefer to retain the dense representation, but with tricks that keep the data sparse in memory.
Look at the following example with pandas multiindex & sparse dtype:
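A reconstruction of that kind of example (the exact data from the original screenshot is lost, so the index levels and sizes below are illustrative):

```python
import numpy as np
import pandas as pd

# Two long, mostly non-overlapping index levels: unstacking produces a
# huge, mostly-empty 2-D frame.
n = 1_000
idx = pd.MultiIndex.from_arrays([np.arange(n), np.arange(n)],
                                names=['time', 'price'])
s = pd.Series(np.random.rand(n), index=idx)

dense = s.unstack()                                      # n x n, mostly NaN
sparse = s.astype(pd.SparseDtype('float', np.nan)).unstack()

print(dense.memory_usage(deep=True).sum())   # full n*n float64 buffer
print(sparse.memory_usage(deep=True).sum())  # orders of magnitude smaller
```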

The dense data uses ~40 MB of memory, while the dense representation with sparse dtypes uses only ~0.5 kB of memory!
And while you can import dataframes with the `sparse=True` keyword, the reported size seems inaccurate (both show the same size?), and we cannot examine the data the way we can with a pandas multiindex + sparse dtype:

Besides, a lot of operations are not available on sparse xarray data variables (e.g. grouping by price level for ffill & downsampling):

So, it would be nice if xarray adopted pandas’ approach of unstacking sparse data.
In the end, you could extract all the non-NaN values and write them to a sparse storage format, such as TileDB sparse arrays.
cc: @stavrospapadopoulos","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077