
issue_comments


13 rows where issue = 305757822 sorted by updated_at descending



Issue: apply_ufunc support for chunks on input_core_dims (305757822)
TomNicholas (MEMBER) · 2021-05-17T18:59:18Z · https://github.com/pydata/xarray/issues/1995#issuecomment-842556731

Has this not been solved by the argument allow_rechunk?

@crusaderky isn't this effectively what you were trying to achieve?

```python
import xarray as xr

def mulsum(a, b):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    return acc

a = xr.DataArray(data=[1, 2, 3], dims=['x']).chunk({"x": 1})
b = xr.DataArray(data=[4, 5, 6], dims=['x']).chunk({"x": 1})

c = xr.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized',
    output_dtypes=[float],
    dask_gufunc_kwargs={'allow_rechunk': True},
)

print(c.compute())
```

returns

```
<xarray.DataArray ()>
array(32)
```

I think this has only been possible since the implementation of xarray.apply_ufunc was switched from dask.array.blockwise to dask.array.apply_gufunc in #4060.

If this is actually doing what I think it's doing then we should document this possibility!
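Since apply_ufunc now delegates to dask's gufunc machinery, the same computation can be sketched at the dask level directly. A minimal sketch, assuming only dask and numpy (the array values are illustrative):

```python
import numpy as np
import dask.array as da

def mulsum(a, b):
    # gufunc kernel: contract the last axis of both inputs
    return np.einsum('...i,...i', a, b)

a = da.from_array(np.array([1, 2, 3]), chunks=1)
b = da.from_array(np.array([4, 5, 6]), chunks=1)

# "(i),(i)->()" marks 'i' as a core dimension on both inputs;
# allow_rechunk=True lets apply_gufunc merge the chunks along 'i' first
c = da.apply_gufunc(mulsum, "(i),(i)->()", a, b,
                    output_dtypes=float, allow_rechunk=True)
print(c.compute())  # 32.0
```

Note that allow_rechunk merges each core dimension into a single chunk, so every input must fit in memory per task.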

stale[bot] (NONE) · 2020-03-24T20:40:45Z · https://github.com/pydata/xarray/issues/1995#issuecomment-603493332

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be closed automatically.

crusaderky (MEMBER) · created 2018-04-24T20:35:21Z, updated 2018-04-24T20:36:00Z · https://github.com/pydata/xarray/issues/1995#issuecomment-384071053

@shoyer, you don't really need a parameter possibly_chunked_core_dims=['x']; you are already specifying output_chunks, without which apply_ufunc won't know what to do and will crash...

crusaderky (MEMBER) · created 2018-03-16T23:19:19Z, updated 2018-03-29T09:57:14Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373870013

[EDIT] drastically simplified chunking algorithm

@shoyer, close, but your version doesn't work in the case of broadcasting. I think I fixed it, although it won't work correctly if only one of a and b has a dask backend, and I'm not sure how to fix that:

```python
import xarray
import numpy
import dask.array

coefficients = xarray.DataArray(
    dask.array.random.random((106, 99), chunks=(25, 25)),
    dims=['formula', 'time'])
components = xarray.DataArray(
    dask.array.random.random((106, 512 * 1024), chunks=(25, 65536)),
    dims=['formula', 'scenario'])

def mulsum(a, b, dim):
    return xarray.apply_ufunc(
        _mulsum_xarray_kernel, a, b,
        input_core_dims=[[dim], [dim]],
        dask='allowed',
        output_dtypes=[float])

def _mulsum_xarray_kernel(a, b):
    if isinstance(a, dask.array.Array) and isinstance(b, dask.array.Array):
        chunks = dask.array.core.broadcast_chunks(a.chunks, b.chunks)
        chunks = chunks[:-1] + (tuple(1 for _ in chunks[-1]),)
        mapped = dask.array.map_blocks(
            _mulsum_dask_kernel, a, b,
            dtype=float, chunks=chunks)
        return dask.array.sum(mapped, axis=-1)
    else:
        return _mulsum_dask_kernel(a, b)

def _mulsum_dask_kernel(a, b):
    a = numpy.ascontiguousarray(a)
    b = numpy.ascontiguousarray(b)
    res = numpy.einsum('...i,...i', a, b, optimize='optimal')
    return res[..., numpy.newaxis]

mulsum(coefficients, components, dim='formula')
```

Proposal 2

Modify apply_ufunc:
  • remove the check that the input_core_dims must not be chunked
  • add parameter output_chunks

My initial example would become:

```python
def mulsum_kernel(a, b):
    return numpy.einsum('...i,...i', a, b)[..., numpy.newaxis]

c = xarray.apply_ufunc(
    mulsum_kernel, a, b,
    dask='parallelized',
    input_core_dims=[['x'], ['x']],
    output_dtypes=[float],
    output_core_dims=[['__partial']],
    output_chunks={'__partial': [1 for _ in a.chunks[a.dims.index('x')]]}
).sum('__partial')
```

Although I'm not sure this approach would be unambiguous when there's more than one core_dim...

shoyer (MEMBER) · 2018-03-16T23:32:07Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373871784

Modify apply_ufunc:
  • remove the check that the input_core_dims must not be chunked
  • add parameter output_chunks

My main concern is ensuring that someone does not inadvertently apply a function not designed for multiple chunks to dask arrays. For example, suppose the function being applied is np.median.

Some loud flag that makes it very obvious what's going on seems like a good idea, e.g., possibly_chunked_core_dims=['x']?

Then we also need some sort of guarantee that chunked core dimensions aren't entirely removed, or else xarray/dask won't know how to stack them back up. I guess we could check that at least as many output core dimensions appear as appear in input core dimensions?
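The hazard is easy to demonstrate with plain numpy: a median taken per chunk and then combined is not the global median. A small illustration with a hypothetical two-chunk split:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 100.])

# pretend x is split into two chunks and np.median is applied per chunk
chunk_medians = [np.median(x[:2]), np.median(x[2:])]  # [1.5, 4.0]

print(np.median(chunk_medians))  # 2.75 -- median of per-chunk medians
print(np.median(x))              # 3.0  -- the true median
```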

shoyer (MEMBER) · 2018-03-16T01:55:44Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373579142

Try:

```python
import dask.array
import numpy as np

def mulsum_chunk(a, b):
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

def mulsum(a, b):
    # needs broadcasting/rechunking for a, b
    mapped = dask.array.map_blocks(
        mulsum_chunk, a, b, dtype=float,
        chunks=a.chunks[:-1] + (tuple(1 for _ in a.chunks[-1]),))
    return dask.array.sum(mapped, axis=-1)
```

shoyer (MEMBER) · 2018-03-16T01:50:07Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373578226

could you make an example? That was my first thought but I could not figure out how to make the apply_ufunc do it.

OK, thinking a little more about it, this would not work with dask='parallelized', which does not allow chunking over core dimensions. You would have to parallelize the function with dask yourself, e.g. with dask.array.map_blocks, but then you could use apply_ufunc with dask='allowed'.
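Putting those pieces together, a self-contained sketch of the map_blocks-plus-dask='allowed' route (the mulsum helper names and array values are illustrative):

```python
import numpy as np
import xarray as xr
import dask.array as da

def mulsum_chunk(a, b):
    # per-chunk partial dot product, kept as a size-1 trailing axis
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

def mulsum(a, b):
    # parallelize over chunks ourselves, then reduce the leftover axis
    mapped = da.map_blocks(
        mulsum_chunk, a, b, dtype=float,
        chunks=a.chunks[:-1] + (tuple(1 for _ in a.chunks[-1]),))
    return da.sum(mapped, axis=-1)

a = xr.DataArray([1.0, 2.0, 3.0], dims=['x']).chunk({'x': 1})
b = xr.DataArray([4.0, 5.0, 6.0], dims=['x']).chunk({'x': 1})

# dask='allowed' hands the chunked arrays straight to mulsum,
# even though the core dimension 'x' is chunked
c = xr.apply_ufunc(mulsum, a, b, input_core_dims=[['x'], ['x']], dask='allowed')
print(float(c.compute()))  # 32.0
```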

crusaderky (MEMBER) · 2018-03-16T01:40:05Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373576583

For this specific problem, I think you could solve it with xarray.apply_ufunc by writing something like a gufunc that keeps the reduced axis as size 1 to apply to each chunk, and afterwards summing up along that dimension.

@shoyer could you make an example? That was my first thought but I could not figure out how to make the apply_ufunc do it.

shoyer (MEMBER) · 2018-03-16T01:16:57Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373572878

One way to allow chunking across x would be to finish up dask.array.einsum: https://github.com/dask/dask/issues/732
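For reference, dask.array.einsum did land eventually. A minimal sketch of contracting an axis that is chunked, letting einsum handle the chunk bookkeeping internally (array values are illustrative):

```python
import numpy as np
import dask.array as da

a = da.from_array(np.array([[1., 2., 3.], [4., 5., 6.]]), chunks=(1, 2))
b = da.ones((2, 3), chunks=(1, 2))

# contract the last axis even though it spans multiple chunks
c = da.einsum('...i,...i->...', a, b)
print(c.compute())  # [ 6. 15.]
```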

I'm reluctant to add reduce_func to xarray because it isn't clear to me exactly what the underlying abstraction is. It's something like a gufunc, but does a little bit more. Also, ideally we'd like this to be in dask.array, maybe as part of dask.array.apply_gufunc (https://github.com/dask/dask/pull/3109).

For this specific problem, I think you could solve it with xarray.apply_ufunc by writing something like a gufunc that keeps the reduced axis as size 1 to apply to each chunk, and afterwards summing up along that dimension.

Reactions: 👍 1
fujiisoup (MEMBER) · 2018-03-16T00:57:21Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373569674

If a.dims=('x', 'y', 'z') and b.dims=('x', 'y', 'w'), then we can't use tensordot, as we need to multiply along dimension y. Maybe we can use matmul in some limited case, but generally no.
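To make the shape bookkeeping concrete, a plain numpy sketch of that case (sizes made up): the shared dimension y must be aligned, not contracted, which einsum can express but tensordot cannot, since tensordot sums over every shared axis it is given:

```python
import numpy as np

a = np.random.rand(2, 3, 4)   # dims ('x', 'y', 'z')
b = np.random.rand(2, 3, 5)   # dims ('x', 'y', 'w')

# contract x while keeping the shared y aligned in the output
c = np.einsum('xyz,xyw->yzw', a, b)
print(c.shape)  # (3, 4, 5)
```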

shoyer (MEMBER) · 2018-03-16T00:53:34Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373569090

For two inputs, don't we use dask.array.tensordot?

fujiisoup (MEMBER) · 2018-03-16T00:52:57Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373568992

I think if a and b have common dimensions other than x, even xarray.dot() does not allow chunking along x (because it internally uses apply_ufunc with dask='parallelized').

I think it would be nice if we could have a way to allow chunking along input_core_dims, though I cannot yet imagine what it should look like.

shoyer (MEMBER) · 2018-03-16T00:48:12Z · https://github.com/pydata/xarray/issues/1995#issuecomment-373568240

Have you tried the new xarray.dot()? That might be even faster for this case.
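For reference, a minimal xarray.dot sketch (values illustrative; the contraction keyword was dims in xarray versions of this era, renamed to dim in later releases):

```python
import xarray as xr

a = xr.DataArray([[1, 2], [3, 4]], dims=['x', 'y'])
b = xr.DataArray([[5, 6], [7, 8]], dims=['x', 'y'])

# contract over x only; the shared y stays as an aligned output dimension,
# so this is equivalent to (a * b).sum('x')
c = xr.dot(a, b, dims='x')
print(c.values)  # [26 44]
```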
