html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1482#issuecomment-1457965323,https://api.github.com/repos/pydata/xarray/issues/1482,1457965323,IC_kwDOAMm_X85W5skL,40465719,2023-03-07T10:58:12Z,2023-03-07T10:58:12Z,NONE,"> As I am not aware of implementation details I am not sure there is a useful link, but maybe progress in #3213 supporting sparse arrays can solve also the jagged array issue.
>
> Long time ago [I asked there a question](https://github.com/pydata/xarray/issues/3213#issuecomment-585668294) about how xarray supports sparse arrays. But what I actually meant were ""Jagged Arrays"". I just was not aware of that term and stumbled over it some days ago the very first time.
I also recently came across [awkward](https://www.youtube.com/watch?v=pvrRFsFqdYs)/jagged/ragged arrays, and that's exactly how I would like to operate on multi-dimensional (2D in the [referenced case](https://github.com/pydata/xarray/issues/3213#issuecomment-1013887301)) sparse data:

Instead of allocating memory for NaNs, the empty slots are simply not materialized when using the `pd.SparseDtype(""float"", np.nan)` dtype.
You basically get an array with an ordinary dense shape whose dtype is sparse, as the [Pandas sparse user guide](https://pandas.pydata.org/docs/user_guide/sparse.html) shows:
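A minimal sketch in that spirit, assuming numpy and pandas are installed. The small array is made-up data: the second row is shorter than the first and is padded with NaN, then every column is cast to a sparse dtype so the padding is not stored.

```python
import numpy as np
import pandas as pd

# Made-up data: the first row has 3 values, the second only 1,
# so the second row is padded with NaN to a common width of 3.
dense = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, np.nan],
])

# Cast every column to a sparse dtype with NaN as the fill value:
# the frame keeps its (2, 3) shape, but NaN slots are not stored.
df = pd.DataFrame(dense).astype(pd.SparseDtype('float', np.nan))

col = df[1].array         # column 1 as a pd.arrays.SparseArray
print(df.shape)           # (2, 3)
print(df.dtypes.iloc[0])  # the sparse dtype of column 0
print(col.sp_values)      # only the stored (non-NaN) values of column 1
print(df.sparse.density)  # fraction of entries that are actually stored
```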

So all the shape, dtype, and ndim requirements of a duck array are satisfied, and xarray could support this as a duck array.
And while you can already wrap sparse duck arrays with `xr.Variable`, I'm not sure whether the wrapper preserves the dtype:
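A quick way to check this, assuming numpy, pandas and xarray are installed: wrap a pandas `SparseArray` in `xr.Variable` and compare the dtypes before and after. Whether the sparse dtype survives or the data is converted to a plain dense array may depend on the installed xarray version, so the snippet prints the result instead of assuming it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A 1-D sparse array: NaN is the fill value, so only 1.0 and 3.0 are stored.
sparse = pd.arrays.SparseArray(
    [1.0, np.nan, np.nan, 3.0], dtype=pd.SparseDtype('float', np.nan)
)

# Wrap it in an xarray Variable with a single dimension named 'x'.
v = xr.Variable(('x',), sparse)

print(sparse.dtype)  # the pandas sparse dtype
print(v.dtype)       # may be plain float64 if xarray densified the data
print(type(v.data))  # shows which array type xarray actually keeps
```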

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-1014487633,https://api.github.com/repos/pydata/xarray/issues/1482,1014487633,IC_kwDOAMm_X848d9pR,18172466,2022-01-17T12:51:45Z,2022-01-17T12:51:45Z,NONE,"As I am not familiar with the implementation details, I am not sure there is a useful link, but maybe progress in #3213 on supporting sparse arrays could also solve the jagged array issue.
A long time ago, [I asked a question there](https://github.com/pydata/xarray/issues/3213#issuecomment-585668294) about how xarray supports sparse arrays.
But what I actually meant was ""jagged arrays"". I just was not aware of that term and only came across it for the first time a few days ago.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-320891992,https://api.github.com/repos/pydata/xarray/issues/1482,320891992,MDEyOklzc3VlQ29tbWVudDMyMDg5MTk5Mg==,585279,2017-08-08T08:44:36Z,2017-08-08T08:44:36Z,NONE,"> then what advantage is there (aside from convenience) of dumping them in some giant array with forced dimensions/shape per slice?
I was mostly thinking of using xarray as a basic data format for reusable code. If I build ML pipelines out of reusable components, I have to pass data between them. Initially the data might be in jagged arrays, and then, after various preprocessing steps before training the model, I can get it into a more suitable format where all images are the same size, which makes training easier. I hoped I could use the same format everywhere I need to pass data around.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-316376598,https://api.github.com/repos/pydata/xarray/issues/1482,316376598,MDEyOklzc3VlQ29tbWVudDMxNjM3NjU5OA==,4992424,2017-07-19T12:54:30Z,2017-07-19T12:54:30Z,NONE,"@mitar it depends on your data/application, right? But that information would also be helpful in figuring out alternative pathways. If you're always going to process the images individually or sequentially, then what advantage is there (aside from convenience) of dumping them in some giant array with forced dimensions/shape per slice?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-316372189,https://api.github.com/repos/pydata/xarray/issues/1482,316372189,MDEyOklzc3VlQ29tbWVudDMxNjM3MjE4OQ==,585279,2017-07-19T12:37:43Z,2017-07-19T12:37:43Z,NONE,"Hm, padding might use a lot of extra space, no?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-316371416,https://api.github.com/repos/pydata/xarray/issues/1482,316371416,MDEyOklzc3VlQ29tbWVudDMxNjM3MTQxNg==,4992424,2017-07-19T12:34:32Z,2017-07-19T12:34:32Z,NONE,"The problem is that these sorts of arrays break the [common data model](http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/) on top of which xarray (and NetCDF) is built.
> If I understand correctly, I could batch all images of the same size into its own dimension? That might be also acceptable.
Yes, if you can pre-process all the images and align them on some common set of dimensions (maybe just **xi** and **yi**, denoting the integer index in the x and y directions), and pad the unused space in each image with NaNs, then you could concatenate everything into a `Dataset`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948
https://github.com/pydata/xarray/issues/1482#issuecomment-316323858,https://api.github.com/repos/pydata/xarray/issues/1482,316323858,MDEyOklzc3VlQ29tbWVudDMxNjMyMzg1OA==,585279,2017-07-19T09:15:00Z,2017-07-19T09:15:00Z,NONE,"> If you want to store them all in a Dataset, you'll have to give a different dimension name for each new dimension, which can be clumsy.
But I cannot combine multiple dimensions into the same Variable, can I? So if I have a dataset of multiple variables, it seems that each variable has to have uniform dimensions for all its values? Maybe I am misunderstanding the concept of dimensions.
> What kind of ""support"" exactly were you thinking of?
Maybe examples of how to create such a jagged dataset? For example, how to have a variable that stores 2D images of different sizes.
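A minimal sketch of what this could look like, assuming numpy and xarray are installed; the image list and all variable and dimension names (`images_2x3`, `y_2x3`, and so on) are hypothetical. Because each xarray variable has a single rectangular shape, the images are grouped by size and each group becomes its own variable with its own dimension names, which is the clumsiness mentioned in the quoted reply:

```python
import numpy as np
import xarray as xr

# Hypothetical input: three grayscale images, two of them 2x3 and one 4x4.
images = [np.ones((2, 3)), np.zeros((2, 3)), np.full((4, 4), 7.0)]

# Group images by shape; each group can be stacked into one rectangular array.
groups = {}
for img in images:
    groups.setdefault(img.shape, []).append(img)

# Each group becomes its own variable with its own (hypothetical) dimension
# names, because a single xarray variable must have one fixed shape.
data_vars = {}
for ny, nx in groups:
    tag = f'{ny}x{nx}'
    stacked = np.stack(groups[(ny, nx)])  # shape: (n_images, ny, nx)
    data_vars[f'images_{tag}'] = (
        (f'image_{tag}', f'y_{tag}', f'x_{tag}'),
        stacked,
    )

ds = xr.Dataset(data_vars)
print(ds)
```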
If I understand correctly, I could batch all images of the same size into their own dimension? That might also be acceptable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,243964948