
issues


3 rows where comments = 13 and user = 6213168 sorted by updated_at descending


Facets:

  • type: pull 2, issue 1
  • state: closed 2, open 1
  • repo: xarray 3
id: 305757822 · node_id: MDU6SXNzdWUzMDU3NTc4MjI= · number: 1995
title: apply_ufunc support for chunks on input_core_dims
user: crusaderky (6213168) · state: open · locked: 0 · comments: 13
created_at: 2018-03-15T23:50:22Z · updated_at: 2021-05-17T18:59:18Z
author_association: MEMBER · repo: xarray (13221727) · type: issue
body:

I am trying to optimize the following function:

c = (a * b).sum('x', skipna=False)

where a and b are xarray.DataArrays, both with dimension x and both backed by dask.

I successfully obtained a 5.5x speedup with the following:

import numba

@numba.guvectorize(
    ['void(float64[:], float64[:], float64[:])'],
    '(n),(n)->()', nopython=True, cache=True)
def mulsum(a, b, res):
    # Multiply-and-accumulate over the core dimension in a single pass.
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    res.flat[0] = acc

c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float])

The problem is that this introduces a constraint, quite problematic in my case, that a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need any interaction between x[i] and x[j] (so it can't work for e.g. an interpolator, which would need to rely on dask ghosting).
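A workaround that is possible today, at the cost of memory and parallelism, is to merge all chunks on the core dimension before the call; a minimal sketch:

# Workaround sketch: collapse dimension x into a single chunk so that
# apply_ufunc(dask='parallelized') accepts it as an input_core_dim.
a_single = a.chunk({'x': -1})
b_single = b.chunk({'x': -1})

c = xarray.apply_ufunc(
    mulsum, a_single, b_single,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float])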

Proposal

Add a new parameter to apply_ufunc, reduce_func=None. reduce_func is a function that takes two arguments a, b, each of which is an output of func. apply_ufunc will invoke it whenever there's chunking on an input_core_dim.

e.g. my use case above would simply become:

c = xarray.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float], reduce_func=operator.add)

So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally do

c1 = mulsum(a1, b1)
c2 = mulsum(a2, b2)
c = operator.add(c1, c2)

Note that reduce_func will be invoked only when dask='parallelized' and there is chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep raising an error like it does today.
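Until something like this exists, the proposed behaviour can be emulated by hand. Below is a minimal sketch (apply_ufunc_chunked is a hypothetical helper, not xarray API) that slices the inputs at their chunk boundaries along the core dimension, applies the kernel to each slice, and folds the partial results with a binary reduce_func:

import operator
from functools import reduce

import numpy as np
import xarray as xr

def apply_ufunc_chunked(func, a, b, dim, reduce_func=operator.add, **kwargs):
    # Hypothetical helper (not xarray API): emulate the proposed reduce_func.
    # Chunk sizes of `a` along `dim`; `b` is assumed to be chunked identically,
    # otherwise the slices below would not line up.
    chunks = a.chunks[a.get_axis_num(dim)]
    bounds = np.concatenate([[0], np.cumsum(chunks)])
    partials = [
        xr.apply_ufunc(
            func,
            a.isel({dim: slice(int(start), int(stop))}),
            b.isel({dim: slice(int(start), int(stop))}),
            input_core_dims=[[dim], [dim]],
            dask='parallelized',
            **kwargs,
        )
        for start, stop in zip(bounds[:-1], bounds[1:])
    ]
    return reduce(reduce_func, partials)

# With the kernel above this would be:
# c = apply_ufunc_chunked(mulsum, a, b, 'x', output_dtypes=[float])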

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1995/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 523438384 · node_id: MDExOlB1bGxSZXF1ZXN0MzQxNDQyMTI4 · number: 3537
title: Numpy 1.18 support
user: crusaderky (6213168) · state: closed · locked: 0 · comments: 13
created_at: 2019-11-15T12:17:32Z · updated_at: 2019-11-19T14:06:50Z · closed_at: 2019-11-19T14:06:46Z
author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/3537
repo: xarray (13221727) · type: pull
body:

Fix mean() and nanmean() for datetime64 arrays on the numpy backend when upgrading from numpy 1.17 to 1.18. All other nan-reductions on datetime64s were broken before and remain broken; mean() on datetime64 with dask was broken before and remains broken.
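For context, a minimal illustration (not taken from the PR) of the operation being fixed, assuming the plain numpy backend:

import numpy as np
import xarray as xr

# Illustration only: mean() of a datetime64 DataArray on the numpy backend,
# which this PR keeps working under numpy >= 1.18.
times = xr.DataArray(
    np.array(['2019-11-01', '2019-11-03'], dtype='datetime64[ns]'), dims='t')
print(times.mean())  # expected midpoint: 2019-11-02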

  • [x] Closes #3409
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3537/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 297631403 · node_id: MDExOlB1bGxSZXF1ZXN0MTY5NTEyMjU1 · number: 1915
title: h5netcdf new API support
user: crusaderky (6213168) · state: closed · locked: 0 · comments: 13
created_at: 2018-02-15T23:15:55Z · updated_at: 2018-05-11T23:49:00Z · closed_at: 2018-05-08T02:25:40Z
author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/1915
repo: xarray (13221727) · type: pull
body:

Closes #1536

Support arbitrary compression plugins through the h5netcdf new API.

Done:

  • public API and docstrings (untested)
  • implementation
  • unit tests
  • What's New
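For illustration, a hedged usage sketch (not part of the PR text); the encoding keys compression and compression_opts are assumed to be the ones this change forwards to h5py:

import numpy as np
import xarray as xr

# Hedged sketch: write through the h5netcdf backend with an h5py-style
# compression filter. The 'compression'/'compression_opts' encoding keys are
# an assumption about how the new API is exposed.
ds = xr.Dataset({'z': (('x',), np.arange(10.0))})
ds.to_netcdf(
    'plugins.nc', engine='h5netcdf',
    encoding={'z': {'compression': 'gzip', 'compression_opts': 4}})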

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1915/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
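
As an aside, the filter shown at the top of this page can be reproduced directly against the underlying SQLite file; a minimal sketch (the filename github.db is an assumption):

import sqlite3

# Reproduce the page's filter: comments = 13, user = 6213168, newest first.
# 'github.db' is an assumed filename for the Datasette database.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    "select number, title, state, type from issues "
    "where comments = ? and [user] = ? order by updated_at desc",
    (13, 6213168),
).fetchall()
for number, title, state, row_type in rows:
    print(number, title, state, row_type)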