home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 58310637

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
58310637 MDU6SXNzdWU1ODMxMDYzNw== 328 Support out-of-core computation using dask 1217238 closed 0   987654 7 2015-02-20T05:02:22Z 2015-04-17T21:03:12Z 2015-04-17T21:03:12Z MEMBER      

Dask is a library for out of core computation somewhat similar to biggus in conception, but with slightly grander aspirations. For examples of how Dask could be applied to weather data, see this blog post by @mrocklin: http://matthewrocklin.com/blog/work/2015/02/13/Towards-OOC-Slicing-and-Stacking/

It would be interesting to explore using dask internally in xray, so that we can implement lazy/out-of-core aggregations, concat and groupby to complement the existing lazy indexing. This functionality would be quite useful for xray, and even more so than merely supporting datasets-on-disk (#199).

A related issue is #79: we can easily imagine using Dask with groupby/apply to power out-of-core and multi-threaded computation.

Todos for xray: - [x] refactor Variable.concat to make use of functions like concatenate and stack instead of in-place array modification (Dask arrays do not support mutation, for good reasons) - [x] refactor reindex_variables to not make direct use of mutation (e.g., by using da.insert below) - [x] add some sort of internal abstraction to represent "computable" arrays that are not necessarily numpy.ndarray objects (done: this is the data attribute) - [x] expose reblock in the public API - [x] load datasets into dask arrays from disk - [x] load dataset from multiple files into dask - [x] ~~some sort of API for user controlled lazy apply on dask arrays (using groupby, mostly likely)~~ (not necessary for initial release) - [x] save from dask arrays - [x] an API for lazy ufuncs like sin and sqrt - [x] robustly handle indexing along orthogonal dimensions if dask can't handle it directly.

Todos for dask (to be clear, none of these are blockers for a proof of concept): - [x] support for NaN skipping aggregations - [x] ~~support for interleaved concatenation (necessary for transformations by group, which are quite common)~~ (turns out to be a one-liner with concatenate and take, see below) - [x] ~~support for something like take_nd from pandas: like np.take, but with -1 as a sentinel value for "missing" (necessary for many alignment operations)~~ da.insert, modeled after np.insert would solve this problem. - [x] ~~support "orthogonal" MATLAB-like array-based indexing along multiple dimensions~~ (taking along one axis at a time is close enough) - [x] broadcast_to: see https://github.com/numpy/numpy/pull/5371

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/328/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 7 rows from issue in issue_comments
Powered by Datasette · Queries took 0.611ms · About: xarray-datasette