
issue_comments


6 rows where issue = 305757822 and user = 1217238 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
373871784 https://github.com/pydata/xarray/issues/1995#issuecomment-373871784 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3Mzg3MTc4NA== shoyer 1217238 2018-03-16T23:32:07Z 2018-03-16T23:32:07Z MEMBER

> Modify apply_ufunc: remove the check that the input_core_dims must not be chunked; add parameter output_chunks

My main concern is ensuring that someone does not inadvertently apply a function not designed for multiple chunks to dask arrays. For example, suppose the function being applied is np.median.

Some loud flag that makes it very obvious what's going on seems like a good idea, e.g., possibly_chunked_core_dims=['x']?

Then we also need some sort of guarantee that chunked core dimensions aren't entirely removed, or else xarray/dask won't know how to stack them back up. I guess we could check to make sure that at least as many output core dimensions appear as appear in input core dimensions?
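That guarantee could be enforced with a small validation step. A hypothetical sketch (the helper name `check_chunked_core_dims` and its signature are invented here for illustration; nothing like it exists in xarray):

```python
def check_chunked_core_dims(input_core_dims, output_core_dims, chunked_dims):
    """Hypothetical check: every core dimension that may be chunked on
    input must survive into some output, so dask knows how to stack the
    per-chunk results back up."""
    flat_outputs = {d for dims in output_core_dims for d in dims}
    missing = sorted({d for dims in input_core_dims for d in dims
                      if d in chunked_dims and d not in flat_outputs})
    if missing:
        raise ValueError(
            f"chunked core dimensions {missing} are entirely removed; "
            "dask would not know how to reassemble the result")

# reducing over a chunked 'x' while keeping 'x' in the output is fine
check_chunked_core_dims([('x',), ('x',)], [('x',)], chunked_dims={'x'})
```

Dropping `'x'` from every output with `chunked_dims={'x'}` would raise `ValueError` under this sketch.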

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822
373579142 https://github.com/pydata/xarray/issues/1995#issuecomment-373579142 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3MzU3OTE0Mg== shoyer 1217238 2018-03-16T01:55:44Z 2018-03-16T01:55:44Z MEMBER

Try:

```python
import dask.array
import numpy as np

def mulsum_chunk(a, b):
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

def mulsum(a, b):
    # needs broadcasting/rechunking for a, b
    mapped = dask.array.map_blocks(
        mulsum_chunk, a, b, dtype=float,
        chunks=a.chunks[:-1] + (tuple(1 for _ in a.chunks[-1]),))
    return dask.array.sum(mapped, axis=-1)
```
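The trick here (keep the contracted axis as size 1 per chunk, then sum the partials) can be checked without dask at all. A plain-NumPy sketch of the same idea, splitting the last axis into blocks by hand:

```python
import numpy as np

def mulsum_chunk(a, b):
    # per-chunk partial result, keeping the contracted axis as size 1
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 6))
b = rng.standard_normal((3, 6))

# emulate chunking the core dimension into blocks of 2
partials = [mulsum_chunk(a[..., i:i + 2], b[..., i:i + 2])
            for i in range(0, 6, 2)]
result = np.concatenate(partials, axis=-1).sum(axis=-1)

# matches the unchunked computation
assert np.allclose(result, np.einsum('...i,...i', a, b))
```

This is exactly what `map_blocks` does in the dask version, with dask handling the per-block dispatch.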

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822
373578226 https://github.com/pydata/xarray/issues/1995#issuecomment-373578226 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3MzU3ODIyNg== shoyer 1217238 2018-03-16T01:50:07Z 2018-03-16T01:50:07Z MEMBER

> could you make an example? That was my first thought but I could not figure out how to make apply_ufunc do it.

OK, thinking a little more about it, this would not work with dask='parallelized', which does not allow chunking over core dimensions. You would have to parallelize the function with dask yourself, e.g., with dask.array.map_blocks, but then you could use apply_ufunc with dask='allowed'.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822
373572878 https://github.com/pydata/xarray/issues/1995#issuecomment-373572878 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3MzU3Mjg3OA== shoyer 1217238 2018-03-16T01:16:57Z 2018-03-16T01:16:57Z MEMBER

One way to allow chunking across x would be to finish up dask.array.einsum: https://github.com/dask/dask/issues/732

I'm reluctant to add reduce_func to xarray because it isn't clear to me exactly what the underlying abstraction is. It's something like a gufunc, but does a little bit more. Also, ideally we'd like this to be in dask.array, maybe as part of dask.array.apply_gufunc (https://github.com/dask/dask/pull/3109).

For this specific problem, I think you could solve it with xarray.apply_ufunc by writing something like a gufunc that keeps the reduced axis as size 1 to apply to each chunk, and afterwards summing up along that dimension.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822
373569090 https://github.com/pydata/xarray/issues/1995#issuecomment-373569090 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3MzU2OTA5MA== shoyer 1217238 2018-03-16T00:53:34Z 2018-03-16T00:53:34Z MEMBER

For two inputs, don't we use dask.array.tensordot?
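For context on why tensordot alone doesn't cover the case in this issue: tensordot contracts the given axes completely, producing an outer product over the remaining axes, whereas the operation here is a batched multiply-and-sum. A NumPy sketch of the relationship (dask.array.tensordot has the same semantics):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5))
b = rng.standard_normal((4, 5))

batched = np.einsum('ij,ij->i', a, b)       # multiply-and-sum per row
full = np.tensordot(a, b, axes=([1], [1]))  # contracts j, outer over rows
# the batched result is only the diagonal of the tensordot output
assert np.allclose(batched, np.diag(full))
```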

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822
373568240 https://github.com/pydata/xarray/issues/1995#issuecomment-373568240 https://api.github.com/repos/pydata/xarray/issues/1995 MDEyOklzc3VlQ29tbWVudDM3MzU2ODI0MA== shoyer 1217238 2018-03-16T00:48:12Z 2018-03-16T00:48:12Z MEMBER

Have you tried the new xarray.dot()? That might be even faster for this case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc support for chunks on input_core_dims 305757822

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
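The schema above can be exercised directly with Python's stdlib sqlite3 module. A minimal sketch of the query this page renders (rows where issue = 305757822 and user = 1217238, sorted by updated_at descending), using two made-up stand-in rows rather than the real data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT,
   [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# two stand-in rows for illustration (ids and timestamps from this page)
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, issue) "
    "VALUES (?, ?, ?, ?)",
    [(373568240, 1217238, "2018-03-16T00:48:12Z", 305757822),
     (373871784, 1217238, "2018-03-16T23:32:07Z", 305757822)])

# ISO 8601 timestamps sort correctly as plain text
rows = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = ? AND user = ? "
    "ORDER BY updated_at DESC", (305757822, 1217238)).fetchall()
print([r[0] for r in rows])  # newest comment first
```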
Powered by Datasette · About: xarray-datasette