issue_comments: 597825416

html_url: https://github.com/pydata/xarray/issues/3213#issuecomment-597825416
issue_url: https://api.github.com/repos/pydata/xarray/issues/3213
id: 597825416
node_id: MDEyOklzc3VlQ29tbWVudDU5NzgyNTQxNg==
user: 18172466
created_at: 2020-03-11T19:29:31Z
updated_at: 2020-03-11T19:29:31Z
author_association: NONE
body:

Concatenating multiple lazy, differently sized xr.DataArrays, each wrapping a sparse.COO via xr.apply_ufunc(sparse.COO, ds, dask='parallelized') as @crusaderky suggested, again yields an xr.DataArray whose wrapped dask array reports its chunks as numpy arrays (chunktype=numpy.ndarray):

<xarray.DataArray 'myDataset' (cycle: 10, time: 8000000)>
dask.array<concatenate, shape=(10, 8000000), dtype=float64, chunksize=(1, 5273216), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) float64 0.0 5e-07 1e-06 1.5e-06 2e-06 ... 4.0 4.0 4.0 4.0
  * cycle    (cycle) int64 1 2 3 4 5 6 7 8 9 10

Mapping the resulting concatenated DataArray to sparse.COO afterwards does not help either: my main goal, scalable serialization of a lazy xarray object, still cannot be achieved.

So one suggestion in response to @shoyer's original question: it would be great if sparse but still lazy DataArrays/Datasets could be serialized without the data overhead. Currently, that seems to work only for DataArrays that are merged/aligned with DataArrays of the same shape.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
issue: 479942077