
issue_comments


5 rows where author_association = "CONTRIBUTOR" and issue = 198742089 ("Implementing dask.array.coarsen in xarrays"), sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
433510805 https://github.com/pydata/xarray/issues/1192#issuecomment-433510805 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDQzMzUxMDgwNQ== jbusecke 14314623 2018-10-26T18:59:07Z 2018-10-26T18:59:07Z CONTRIBUTOR

I should add that I would be happy to work on an implementation, but would probably need a good number of pointers.

Here is the implementation that I have been using (it only works with dask arrays at this point).

Should have posted that earlier to avoid @rabernat's zingers over here.

```python
import warnings

import numpy as np
import xarray as xr
from dask.array import Array, coarsen


def aggregate(da, blocks, func=np.nanmean, debug=False):
    """Performs efficient block averaging in one or multiple dimensions.

    Only works on regular grid dimensions.

    Parameters
    ----------
    da : xarray DataArray (must be a dask array!)
    blocks : list
        List of tuples containing the dimension and interval to
        aggregate over
    func : function
        Aggregation function. Defaults to numpy.nanmean

    Returns
    -------
    da_agg : xarray DataArray
        Aggregated array

    Examples
    --------
    >>> from xarrayutils import aggregate
    >>> import numpy as np
    >>> import xarray as xr
    >>> import dask.array as da
    >>> x = np.arange(-10, 10)
    >>> y = np.arange(-10, 10)
    >>> xx, yy = np.meshgrid(x, y)
    >>> z = xx**2 - yy**2
    >>> a = xr.DataArray(da.from_array(z, chunks=(20, 20)),
    ...                  coords={'x': x, 'y': y}, dims=['y', 'x'])
    >>> print(a)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 20, x: 20)>
    dask.array<array-7..., shape=(20, 20), dtype=int64, chunksize=(20, 20)>
    Coordinates:
      * y        (y) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
      * x        (x) int64 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
    >>> blocks = [('x', 2), ('y', 10)]
    >>> a_coarse = aggregate(a, blocks, func=np.mean)
    >>> print(a_coarse)
    <xarray.DataArray 'array-7e422c91624f207a5f7ebac426c01769' (y: 2, x: 10)>
    dask.array<coarsen..., shape=(2, 10), dtype=float64, chunksize=(2, 10)>
    Coordinates:
      * y        (y) int64 -10 0
      * x        (x) int64 -10 -8 -6 -4 -2 0 2 4 6 8
    Attributes:
        Coarsened with: <function mean at 0x111754230>
        Coarsenblocks: [('x', 2), ('y', 10)]
    """
    # Check if the input is a dask array (I might want to convert this
    # automatically in the future)
    if not isinstance(da.data, Array):
        raise RuntimeError('data array data must be a dask array')

    # Check data type of blocks
    # TODO write test
    if (not all(isinstance(n[0], str) for n in blocks) or
            not all(isinstance(n[1], int) for n in blocks)):
        print('blocks input', str(blocks))
        raise RuntimeError("block dimension must be dtype(str), "
                           "e.g. ('lon', 4)")

    # Check if the given array has the dimensions specified in blocks
    try:
        block_dict = dict((da.get_axis_num(x), y) for x, y in blocks)
    except ValueError:
        raise RuntimeError("'blocks' contains non matching dimension")

    # Check the size of the excess in each aggregated axis
    blocks = [(a[0], a[1], da.shape[da.get_axis_num(a[0])] % a[1])
              for a in blocks]

    # For now, default to trimming the excess
    da_coarse = coarsen(func, da.data, block_dict, trim_excess=True)

    # For now, only dimension coordinates are carried over
    warnings.warn("WARNING: only dimensions are carried over as coordinates")
    new_coords = dict([])
    for cc in list(da.dims):
        new_coords[cc] = da.coords[cc]
        for dd in blocks:
            if dd[0] in list(da.coords[cc].dims):
                new_coords[cc] = new_coords[cc].isel(
                    **{dd[0]: slice(0, -(1 + dd[2]), dd[1])})

    attrs = {'Coarsened with': str(func), 'Coarsenblocks': str(blocks)}
    da_coarse = xr.DataArray(da_coarse, dims=da.dims, coords=new_coords,
                             name=da.name, attrs=attrs)
    return da_coarse
```

433160023 https://github.com/pydata/xarray/issues/1192#issuecomment-433160023 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDQzMzE2MDAyMw== jbusecke 14314623 2018-10-25T18:35:57Z 2018-10-25T18:35:57Z CONTRIBUTOR

Is this feature still being considered? A big +1 from me.

I wrote my own function to achieve this (using dask.array.coarsen), but I was planning to implement similar functionality in xgcm, and it would be ideal if we could use an upstream implementation from xarray.

305176003 https://github.com/pydata/xarray/issues/1192#issuecomment-305176003 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDMwNTE3NjAwMw== laliberte 3217406 2017-05-31T12:45:18Z 2017-05-31T12:45:18Z CONTRIBUTOR

The reason I ask is that, ideally, coarsen would work exactly the same with dask.array and np.ndarray data. By using both serial and parallel coarsen methods from dask, we are adding a dependency but we are ensuring forward compatibility. @shoyer, what's your preference? (1) replicate serial coarsen into xarray or (2) point to dask coarsen methods?
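As a quick illustration of that equivalence (a sketch of mine, not from the thread): dask's parallel coarsen on a chunked array matches the same serial block reduction done with plain numpy.

```python
import numpy as np
import dask.array as dsa

x = np.arange(36.).reshape(6, 6)
d = dsa.from_array(x, chunks=(3, 3))

# Parallel path: dask.array.coarsen reduces 3x3 blocks chunk by chunk.
parallel = dsa.coarsen(np.mean, d, {0: 3, 1: 3}).compute()

# Serial path: reshape each axis into (n_blocks, block) and reduce
# over the block dimensions.
serial = x.reshape(2, 3, 2, 3).mean(axis=(1, 3))

assert np.allclose(parallel, serial)
```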

305169201 https://github.com/pydata/xarray/issues/1192#issuecomment-305169201 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDMwNTE2OTIwMQ== laliberte 3217406 2017-05-31T12:00:11Z 2017-05-31T12:00:11Z CONTRIBUTOR

If it's part of dask then it would be almost trivial to implement in xarray. @mrocklin Can we assume that dask/array/chunk.py::coarsen is part of the public API?
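For reference, a small example (mine, not from the comment) of calling that serial function directly on a plain numpy array; it assumes dask.array.chunk.coarsen keeps the coarsen(reduction, x, axes, trim_excess=False) signature:

```python
import numpy as np
from dask.array import chunk

x = np.arange(24).reshape(4, 6)

# The serial coarsen from dask/array/chunk.py reduces non-overlapping
# blocks of a plain numpy array: here 2x3 blocks averaged with np.mean.
result = chunk.coarsen(np.mean, x, {0: 2, 1: 3})
print(result.shape)  # (2, 2)
```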

270439515 https://github.com/pydata/xarray/issues/1192#issuecomment-270439515 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDI3MDQzOTUxNQ== laliberte 3217406 2017-01-04T17:59:08Z 2017-01-04T17:59:08Z CONTRIBUTOR

The dask implementation has the following API: `dask.array.coarsen(reduction, x, axes, trim_excess=False)`, so a proposed xarray API could look like `xarray.coarsen(reduction, x, axes, chunks=None, trim_excess=False)`, resulting in the following implementation:

1. If the underlying data of x is a dask.array, yield x.chunks(chunks).array.coarsen(reduction, axes, trim_excess).
2. Else, copy the block_reduce function.

Does that fit with the xarray API?
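As an illustration, here is a minimal sketch of that dispatch (my own, not from the thread; the standalone coarsen helper and its numpy fallback are assumptions, not xarray's eventual API): delegate to dask.array.coarsen for dask-backed data, and otherwise apply a block_reduce-style reshape-and-reduce with numpy.

```python
import numpy as np
import dask.array as dsa

def coarsen(reduction, x, axes, trim_excess=False):
    # axes maps axis number -> block size, matching dask.array.coarsen.
    if isinstance(x, dsa.Array):
        # Parallel path: delegate to dask.
        return dsa.coarsen(reduction, x, axes, trim_excess=trim_excess)
    # Serial fallback, analogous to skimage.measure.block_reduce:
    # optionally trim the excess so each axis is divisible by its block
    # size, then reshape each coarsened axis into (n_blocks, block) and
    # reduce over the block dimensions.
    for axis, block in axes.items():
        excess = x.shape[axis] % block
        if excess:
            if not trim_excess:
                raise ValueError('axis %d not divisible by %d' % (axis, block))
            trim = [slice(None)] * x.ndim
            trim[axis] = slice(0, x.shape[axis] - excess)
            x = x[tuple(trim)]
    newshape, reduce_axes = [], []
    for ax, n in enumerate(x.shape):
        if ax in axes:
            newshape.extend([n // axes[ax], axes[ax]])
            reduce_axes.append(len(newshape) - 1)
        else:
            newshape.append(n)
    return reduction(x.reshape(newshape), axis=tuple(reduce_axes))
```

For example, coarsen(np.mean, np.arange(16.).reshape(4, 4), {0: 2, 1: 2}) returns the 2x2 array of block means via the numpy fallback, while the same call on a dask array returns a lazy dask result.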


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);