issue_comments


12 rows where issue = 1308715638 sorted by updated_at descending


user 8

  • dcherian 3
  • tomwhite 2
  • shoyer 2
  • DrTodd13 1
  • benbovy 1
  • andersy005 1
  • sdbachman 1
  • TomNicholas 1

author_association 3

  • MEMBER 8
  • CONTRIBUTOR 2
  • NONE 2

issue 1

  • Alternative parallel execution frameworks in xarray · 12

dcherian (MEMBER) · created 2022-10-21T15:38:27Z, edited 2022-10-21T18:08:49Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1287134964

IIUC the issue Ryan & Tom are talking about is tied to reading from files.

For example, we read from a zarr store using zarr, then wrap that zarr.Array (or h5py Dataset) with a large number of ExplicitlyIndexed classes that enable more complicated indexing, lazy decoding, etc.

IIUC #4628 is about concatenating such arrays: neither zarr.Array nor ExplicitlyIndexed supports concatenation, so we end up calling np.array and forcing a disk read.

With dask or cubed we would have dask(ExplicitlyIndexed(zarr)) or cubed(ExplicitlyIndexed(zarr)), so as long as dask and cubed define concat and we dispatch to them, everything is 👍🏾

PS: This is what I was attempting to explain (not very clearly) in the distributed arrays meeting. We don't ever use dask.array.from_zarr (for example). We use zarr to read, then wrap in ExplicitlyIndexed, and then pass to dask.array.from_array.

Reactions: 👍 1
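The dispatch idea in the comment above can be sketched roughly: if the module that defines the outermost array type exposes a `concatenate`, call it, so a dask- or cubed-wrapped lazy array stays lazy; otherwise fall back to `np.concatenate`, which forces the read. A minimal, hypothetical helper (not xarray's actual dispatch machinery):

```python
import importlib

import numpy as np


def duck_concatenate(arrays, axis=0):
    """Dispatch concatenation to the module defining the array type.

    Hypothetical sketch: numpy, dask.array, and cubed all expose a
    module-level `concatenate`, so looking one up on the type's
    top-level module keeps dask(ExplicitlyIndexed(zarr)) lazy.
    Falls back to np.concatenate, which would force a disk read
    for lazy wrappers.
    """
    modname = type(arrays[0]).__module__.split(".")[0]
    try:
        concat = getattr(importlib.import_module(modname), "concatenate")
    except (ImportError, AttributeError):
        concat = np.concatenate
    return concat(arrays, axis=axis)
```

With plain numpy inputs this simply resolves to `np.concatenate`; the point is that the same call site would resolve to `dask.array.concatenate` or `cubed.concatenate` for those array types.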
tomwhite (CONTRIBUTOR) · 2022-10-21T09:31:29Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1286703986

Cubed implements concat, but perhaps xarray needs richer concat functionality than that?

shoyer (MEMBER) · 2022-10-21T03:49:18Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1286421985

Cubed should define a concatenate function, so that should be OK

TomNicholas (MEMBER) · 2022-10-20T19:22:11Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1286028393

@rabernat just pointed out to me that in order for this to work well we might also need lazy concatenation of arrays.

Xarray currently has its own internal wrappers that allow lazy indexing, but they don't yet allow lazy concatenation; instead, dask is what does lazy concatenation under the hood right now.

This is a problem: it means that concatenating two cubed-backed DataArrays will trigger loading both into memory, whereas concatenating two dask-backed DataArrays will not. If #4628 were implemented, xarray would never load the underlying array into memory regardless of the backend.

Reactions: 👍 1
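The lazy concatenation being asked for can be sketched in a few lines: store the pieces and only call `np.concatenate` when values are actually needed. A toy sketch in the spirit of pydata/xarray#4628; the class name is hypothetical, not one of xarray's internal wrappers:

```python
import numpy as np


class LazilyConcatenatedArray:
    """Toy lazy-concatenation wrapper: holds the pieces and defers the
    actual np.concatenate until the result is materialized."""

    def __init__(self, arrays, axis=0):
        self._arrays = list(arrays)
        self._axis = axis

    @property
    def shape(self):
        # Shape is known without touching the data: sum along the
        # concatenation axis, other axes taken from the first piece.
        shape = list(self._arrays[0].shape)
        shape[self._axis] = sum(a.shape[self._axis] for a in self._arrays)
        return tuple(shape)

    def __array__(self, dtype=None, copy=None):
        # Materialization (and hence any disk read) happens only here.
        result = np.concatenate(
            [np.asarray(a) for a in self._arrays], axis=self._axis
        )
        return result if dtype is None else result.astype(dtype)
```

Inspecting `.shape` stays cheap; only `np.asarray(...)` on the wrapper would trigger reads of the underlying pieces.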
benbovy (MEMBER) · 2022-10-13T09:22:27Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1277301954

Not really a generic, parallel execution back-end, but Open-EO looks like an interesting use case too (it is a framework for managing remote execution of processing tasks on multiple big Earth observation cloud back-ends via a common API). I've suggested the idea of reusing the Xarray API here: https://github.com/Open-EO/openeo-python-client/issues/334.

DrTodd13 (NONE) · 2022-09-14T20:57:35Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1247294419

> Might I propose Arkouda?
>
> https://github.com/Bears-R-Us/arkouda https://chapel-lang.org/presentations/Arkouda_SIAM_PP-22.pdf

Have they improved recently to support more than 1D arrays?

sdbachman (NONE) · 2022-09-14T20:46:52Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1247285190

Might I propose Arkouda?

https://github.com/Bears-R-Us/arkouda https://chapel-lang.org/presentations/Arkouda_SIAM_PP-22.pdf

Reactions: 👀 1
tomwhite (CONTRIBUTOR) · 2022-07-19T10:58:18Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1188910765

Thanks for opening this @TomNicholas

> The challenge will be defining a parallel computing API that works across all these projects, with their slightly different models.

Agreed. I feel like there's already an implicit set of "chunked array" methods that xarray expects from Dask that could be formalised a bit and exposed as an integration point.

Reactions: 👍 1
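Formalising that implicit set of "chunked array" methods could look like a structural protocol: any back-end (dask, cubed, ...) that supplies these operations becomes pluggable. A rough sketch with illustrative method names (the real set would be negotiated with the back-end maintainers, and this is not xarray's actual integration point), plus a trivial eager implementation to show the shape of a conforming back-end:

```python
from typing import Any, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class ChunkedArrayAPI(Protocol):
    """Hypothetical write-down of the chunked-array operations xarray
    implicitly expects from Dask. Method names are illustrative."""

    def from_array(self, data: Any, chunks: Any) -> Any: ...
    def rechunk(self, data: Any, chunks: Any) -> Any: ...
    def compute(self, *data: Any) -> tuple: ...
    def map_blocks(self, func: Any, *args: Any, **kwargs: Any) -> Any: ...


class EagerManager:
    """Degenerate back-end that ignores chunking and runs everything
    eagerly on numpy, useful only to show the interface."""

    def from_array(self, data, chunks):
        return np.asarray(data)

    def rechunk(self, data, chunks):
        return data  # nothing to do for an unchunked array

    def compute(self, *data):
        return tuple(np.asarray(d) for d in data)

    def map_blocks(self, func, *args, **kwargs):
        return func(*args, **kwargs)
```

Because the protocol is structural, `isinstance(EagerManager(), ChunkedArrayAPI)` holds without any registration step; a dask- or cubed-backed manager would satisfy it the same way.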
andersy005 (MEMBER) · 2022-07-19T03:22:07Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1188550877

At SciPy I learned of fugue, which tries to provide a unified API for distributed DataFrames on top of Spark and Dask. It could be a great source of inspiration.

Reactions: 👀 1
shoyer (MEMBER) · 2022-07-19T02:18:03Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1188520871

Sounds good to me. The challenge will be defining a parallel computing API that works across all these projects, with their slightly different models.

Reactions: 👍 1
dcherian (MEMBER) · 2022-07-19T01:29:28Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1188496314

Another parallel framework would be Ramba

cc @DrTodd13

Reactions: 🚀 1
dcherian (MEMBER) · 2022-07-18T21:56:58Z
https://github.com/pydata/xarray/issues/6807#issuecomment-1188361671

This sounds great! We should finish up https://github.com/pydata/xarray/pull/4972 to make it easier to test.

Reactions: 👍 1

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
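The view above ("12 rows where issue = 1308715638 sorted by updated_at descending") corresponds to a simple filtered, ordered query against this schema, which `idx_issue_comments_issue` makes an indexed lookup. A sketch with sqlite3 and illustrative rows (not the real data):

```python
import sqlite3

# Build a scratch copy of the issue_comments schema shown above
# (abridged to the columns the query touches) and run the same
# filter-and-sort query Datasette issues for the table view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue_comments (
   id INTEGER PRIMARY KEY,
   updated_at TEXT,
   issue INTEGER
);
CREATE INDEX idx_issue_comments_issue ON issue_comments (issue);
""")
conn.executemany(
    "INSERT INTO issue_comments (id, issue, updated_at) VALUES (?, ?, ?)",
    [
        (1, 1308715638, "2022-07-18T21:56:58Z"),  # illustrative rows
        (2, 1308715638, "2022-10-21T15:38:27Z"),
    ],
)
rows = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (1308715638,),
).fetchall()
```

ISO-8601 timestamps stored as TEXT sort correctly with plain string comparison, which is why `ORDER BY updated_at DESC` works without any date parsing.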
Powered by Datasette · Queries took 27.425ms · About: xarray-datasette