html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6807#issuecomment-1287134964,https://api.github.com/repos/pydata/xarray/issues/6807,1287134964,IC_kwDOAMm_X85MuB70,2448579,2022-10-21T15:38:27Z,2022-10-21T18:08:49Z,MEMBER,"IIUC the issue Ryan & Tom are talking about is tied to reading from files.
For example, we read from a zarr store using `zarr`, then wrap that `zarr.Array` (or `h5py` Dataset) with a large number of `ExplicitlyIndexed` classes that enable more complicated indexing, lazy decoding, etc.
IIUC #4628 is about concatenating such arrays, i.e. neither `zarr.Array` nor `ExplicitlyIndexed` supports concatenation, so we end up calling `np.array` and forcing a disk read.
With dask or cubed we would have `dask(ExplicitlyIndexed(zarr))` or `cubed(ExplicitlyIndexed(zarr))` so as long as `dask` and `cubed` define `concat` and we dispatch to them, everything is 👍🏾
PS: This is what I was attempting to explain (not very clearly) in the distributed arrays meeting. We don't ever use `dask.array.from_zarr`, for example. We use `zarr` to read, then wrap in `ExplicitlyIndexed`, and then pass to `dask.array.from_array`.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1286421985,https://api.github.com/repos/pydata/xarray/issues/6807,1286421985,IC_kwDOAMm_X85MrT3h,1217238,2022-10-21T03:49:18Z,2022-10-21T03:49:18Z,MEMBER,"Cubed should define a concatenate function, so that should be OK","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1286028393,https://api.github.com/repos/pydata/xarray/issues/6807,1286028393,IC_kwDOAMm_X85Mpzxp,35968931,2022-10-20T19:22:11Z,2022-10-20T19:22:11Z,MEMBER,"@rabernat just pointed out to me that in order for this to work well we might also need [lazy concatenation of arrays.](https://github.com/pydata/xarray/issues/4628)
Xarray currently has its own internal wrappers that allow lazy indexing, but they don't yet allow lazy concatenation. Instead, dask is what does lazy concatenation under the hood right now.
This is a problem - it means that concatenating two cubed-backed DataArrays will trigger loading both into memory, whereas concatenating two dask-backed DataArrays will not. If #4628 were implemented, then xarray would never load the underlying array into memory regardless of the backend.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1277301954,https://api.github.com/repos/pydata/xarray/issues/6807,1277301954,IC_kwDOAMm_X85MIhTC,4160723,2022-10-13T09:22:27Z,2022-10-13T09:22:27Z,MEMBER,"Not really a generic and parallel execution back-end, but [Open-EO](https://openeo.org/) looks like an interesting use case too (it is a framework for managing remote execution of processing tasks on multiple big Earth observation cloud back-ends via a common API). I've suggested the idea of reusing the Xarray API here: https://github.com/Open-EO/openeo-python-client/issues/334.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1188550877,https://api.github.com/repos/pydata/xarray/issues/6807,1188550877,IC_kwDOAMm_X85G19jd,13301940,2022-07-19T03:22:07Z,2022-07-19T03:22:07Z,MEMBER,At SciPy I learned of [fugue](https://github.com/fugue-project/fugue) which tries to provide a unified API for distributed DataFrames on top of Spark and Dask. It could be a great source of inspiration.,"{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1188520871,https://api.github.com/repos/pydata/xarray/issues/6807,1188520871,IC_kwDOAMm_X85G12On,1217238,2022-07-19T02:18:03Z,2022-07-19T02:18:03Z,MEMBER,"Sounds good to me. The challenge will be defining a parallel computing API that works across all these projects, with their slightly different models.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1188496314,https://api.github.com/repos/pydata/xarray/issues/6807,1188496314,IC_kwDOAMm_X85G1wO6,2448579,2022-07-19T01:29:28Z,2022-07-19T01:29:28Z,MEMBER,"Another parallel framework would be [Ramba](https://github.com/Python-for-HPC/ramba)
cc @DrTodd13","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,1308715638
https://github.com/pydata/xarray/issues/6807#issuecomment-1188361671,https://api.github.com/repos/pydata/xarray/issues/6807,1188361671,IC_kwDOAMm_X85G1PXH,2448579,2022-07-18T21:56:58Z,2022-07-18T21:56:58Z,MEMBER,This sounds great! We should finish up https://github.com/pydata/xarray/pull/4972 to make it easier to test.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1308715638