issue_comments


11 rows where author_association = "MEMBER" and issue = 484752930 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
529168271 https://github.com/pydata/xarray/pull/3258#issuecomment-529168271 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyOTE2ODI3MQ== dcherian 2448579 2019-09-08T04:20:19Z 2019-09-08T04:20:19Z MEMBER

Closing in favour of #3276

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
527187603 https://github.com/pydata/xarray/pull/3258#issuecomment-527187603 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNzE4NzYwMw== mrocklin 306380 2019-09-02T15:37:18Z 2019-09-02T15:37:18Z MEMBER

I'm glad to see progress here. FWIW, I think that many people would be quite happy with a version that just worked for DataArrays, in case that's faster to get in than the full solution with Datasets.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
527186872 https://github.com/pydata/xarray/pull/3258#issuecomment-527186872 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNzE4Njg3Mg== dcherian 2448579 2019-09-02T15:34:21Z 2019-09-02T15:34:21Z MEMBER

Thanks. That worked. I have a new version up in #3276 that works with both DataArrays and Datasets.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
526756738 https://github.com/pydata/xarray/pull/3258#issuecomment-526756738 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNjc1NjczOA== mrocklin 306380 2019-08-30T21:31:49Z 2019-08-30T21:32:02Z MEMBER

Then you can construct a tuple as a task: `(1, 2, 3)` -> `(tuple, [1, 2, 3])`
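This tuple-as-task idiom can be sketched with dask's synchronous scheduler (a minimal illustration added for clarity; the graph and key names are made up):

```python
import dask

# In a dask graph, a tuple whose first element is callable is a task,
# so a literal tuple like (1, 2, 3) must be expressed as a call:
graph = {
    "t": (tuple, [1, 2, 3]),  # builds the tuple (1, 2, 3) at execution time
}

result = dask.get(graph, "t")  # synchronous scheduler
print(result)  # (1, 2, 3)
```

dask traverses the list argument, so any keys inside it would also be resolved before `tuple` is called.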

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
526751676 https://github.com/pydata/xarray/pull/3258#issuecomment-526751676 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNjc1MTY3Ng== dcherian 2448579 2019-08-30T21:11:28Z 2019-08-30T21:11:28Z MEMBER

Thanks @mrocklin. Unfortunately that doesn't work with the Dataset constructor; with a list, it treats it as array-like:

```
The following notations are accepted:

- mapping {var name: DataArray}
- mapping {var name: Variable}
- mapping {var name: (dimension name, array-like)}
- mapping {var name: (tuple of dimension names, array-like)}
- mapping {dimension name: array-like}
  (it will be automatically moved to coords, see below)

```

Unless @shoyer has another idea, I guess I can insert creating a DataArray into the graph and then refer to those keys in the Dataset constructor.
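The tuple notations from the docstring above look like this in practice (a minimal, editor-added example with made-up variable names):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    # {var name: (tuple of dimension names, array-like)}
    "a": (("x", "y"), np.ones((2, 3))),
    # {var name: (dimension name, array-like)}
    "b": ("x", [10, 20]),
})

print(ds["a"].dims, ds["b"].dims)  # ('x', 'y') ('x',)
```

This is why the tuple form cannot simply be swapped for a list: the constructor interprets the tuple's structure, not just its contents.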

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525966384 https://github.com/pydata/xarray/pull/3258#issuecomment-525966384 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTk2NjM4NA== mrocklin 306380 2019-08-28T23:54:48Z 2019-08-28T23:54:48Z MEMBER

Dask doesn't traverse through tuples to find possible keys, so the keys here are hidden from view:

```python
{'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
```

I recommend replacing the wrapping tuples with lists:

```diff
- {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
+ {'a': [('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)],
```
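The list-versus-tuple distinction can be seen in isolation with dask's low-level `get` (an editor-added illustration; `dask.core.get` is an internal but long-stable entry point):

```python
from dask.core import get

dsk = {"x": 10}

# A key inside a *list* argument is found and substituted:
# "x" -> 10, so sum([10, 5]) == 15
assert get({**dsk, "y": (sum, ["x", 5])}, "y") == 15

# The same key inside a *tuple* argument is left alone: a tuple whose
# first element is not callable is treated as a literal, so len(("x", 5)) == 2
assert get({**dsk, "y": (len, ("x", 5))}, "y") == 2
```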

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525965607 https://github.com/pydata/xarray/pull/3258#issuecomment-525965607 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTk2NTYwNw== dcherian 2448579 2019-08-28T23:51:43Z 2019-08-28T23:53:46Z MEMBER

I started prototyping a Dataset version. Here's what I have:

```python
import dask
import numpy as np
import xarray as xr

darray = xr.DataArray(np.ones((10, 20)), dims=['x', 'y'],
                      coords={'x': np.arange(10), 'y': np.arange(100, 120)})
dset = darray.to_dataset(name='a')
dset['b'] = dset.a + 50
dset['c'] = (dset.x + 20)
dset = dset.chunk({'x': 4, 'y': 5})
```

The function I'm applying takes a Dataset and returns a DataArray, because that's easy to test without figuring out how to assemble everything back into a Dataset.

```python
import itertools

# function takes a dataset and returns a dataarray so that I can check
# that things work without reconstructing a dataset
def function(ds):
    return ds.a + 10

dataset_dims = list(dset.dims)

graph = {}
gname = 'dsnew'

# map dims to list of chunk indexes.
# If different variables have different chunking along the same dim,
# the call to .chunks will raise an error.
ichunk = {dim: range(len(dset.chunks[dim])) for dim in dataset_dims}

# iterate over all possible chunk combinations
for v in itertools.product(*ichunk.values()):
    chunk_index_dict = dict(zip(dataset_dims, v))
    data_vars = {}
    for name, variable in dset.data_vars.items():
        # why does __dask_keys__ have an extra level?
        # the [0] is not required for dataarrays
        var_dask_keys = variable.__dask_keys__()[0]

        # recursively index into the dask_keys nested list
        chunk = var_dask_keys
        for dim in variable.dims:
            chunk = chunk[chunk_index_dict[dim]]

        # I now have the key corresponding to this chunk;
        # this tuple goes into a dictionary passed to xr.Dataset().
        # dask doesn't seem to replace it with a numpy array at execution time.
        data_vars[name] = (variable.dims, chunk)

    graph[(gname,) + v] = (function, (xr.Dataset, data_vars))

final_graph = dask.highlevelgraph.HighLevelGraph.from_collections(gname, graph, dependencies=[dset])
```

Elements of the graph look like:

```python
('dsnew', 0, 0): (<function __main__.function(ds)>,
                  (xarray.core.dataset.Dataset,
                   {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
                    'b': (('x', 'y'), ('xarray-b-e2d8d06bb9e5c1f351671a94816bd331', 0, 0)),
                    'c': (('x',), ('xarray-c-d90f8b2af715b53f4c170be391239655', 0))}))
```

This doesn't work because dask doesn't replace the keys by numpy arrays when the xr.Dataset call is executed.

```python
result = dask.array.Array(final_graph, name=gname, chunks=dset.a.data.chunks,
                          meta=dset.a.data._meta)
dask.compute(result)
```

```
ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)) to Variable.
```

The graph is "disconnected".

I'm not sure what I'm doing wrong here. An equivalent version for DataArrays works perfectly.
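The disconnect can be reproduced in isolation: dask's dependency scan looks inside lists and task tuples, but a key buried inside a plain tuple is invisible to it (an editor-added sketch using `dask.core.get_dependencies`, an internal but stable helper):

```python
from dask.core import get_dependencies

dsk = {"x": 1}

# key hidden inside a plain tuple: no dependency is detected,
# so the layer that produced it ends up "disconnected"
assert get_dependencies(dsk, task=(sum, [("x", 5)])) == set()

# the same key wrapped in a list IS detected
assert get_dependencies(dsk, task=(sum, [["x", 5]])) == {"x"}
```

This is exactly the situation in the `(variable.dims, chunk)` tuples above: the chunk keys are never registered as dependencies, and are never replaced by numpy arrays at execution time.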

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525427300 https://github.com/pydata/xarray/pull/3258#issuecomment-525427300 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTQyNzMwMA== shoyer 1217238 2019-08-27T18:30:34Z 2019-08-27T18:30:34Z MEMBER

> apply_ufunc is extremely powerful, and when you need to cope with all possible shape transformations, I suspect its verbosity is quite necessary. It's just that, when all you need to do is apply an elementwise, embarrassingly parallel function (80% of the time in my real-life experience), apply_ufunc is overkill.

Yes, 100% agreed! There is a real need for a simpler version of apply_ufunc.

> The thing I have against the name map_blocks is that backends other than dask have no notion of blocks...

I think the functionality in this PR is fundamentally dask specific. We shouldn't make a habit of adding backend specific features, but it makes sense in limited cases.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525425560 https://github.com/pydata/xarray/pull/3258#issuecomment-525425560 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTQyNTU2MA== crusaderky 6213168 2019-08-27T18:26:17Z 2019-08-27T18:26:17Z MEMBER

@shoyer let me rephrase it - apply_ufunc is extremely powerful, and when you need to cope with all possible shape transformations, I suspect its verbosity is quite necessary. It's just that, when all you need to do is apply an elementwise, embarrassingly parallel function (80% of the time in my real-life experience), apply_ufunc is overkill.

The thing I have against the name map_blocks is that backends other than dask have no notion of blocks...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525384446 https://github.com/pydata/xarray/pull/3258#issuecomment-525384446 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTM4NDQ0Ng== shoyer 1217238 2019-08-27T16:40:32Z 2019-08-27T16:40:32Z MEMBER
> could we call it just "map"? It makes sense as this thing would be very useful for non-dask based arrays too. Working routinely with scipy (chiefly with scipy.stats transforms), I tire a lot of writing very verbose xarray.apply_ufunc calls.

I agree that apply_ufunc is overly verbose. See https://github.com/pydata/xarray/issues/1074 and https://github.com/pydata/xarray/issues/1618 (and issues linked therein) for discussion about alternative APIs.

I still think this particular set of functionality should be called map_blocks, because it works by applying functions over each block, very similar to dask's map_blocks.
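For comparison, dask's own `map_blocks` applies a function to each chunk of the array independently (a minimal, editor-added example):

```python
import dask.array as da

# a 4x4 array split into four 2x2 chunks
x = da.ones((4, 4), chunks=(2, 2))

# the function receives one numpy block (2x2 here) at a time
y = x.map_blocks(lambda block: block + 1)

print(float(y.sum().compute()))  # 32.0
```

The proposed xarray `map_blocks` follows the same per-block contract, which is the argument for keeping the name.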

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525298264 https://github.com/pydata/xarray/pull/3258#issuecomment-525298264 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTI5ODI2NA== crusaderky 6213168 2019-08-27T13:21:44Z 2019-08-27T13:21:44Z MEMBER

Hi,

A few design opinions:

  1. could we call it just "map"? It makes sense as this thing would be very useful for non-dask based arrays too. Working routinely with scipy (chiefly with scipy.stats transforms), I tire a lot of writing very verbose xarray.apply_ufunc calls.

  2. could we have it as a method of DataArray and Dataset, to allow for method chaining?

e.g.

```python
myarray.map(func1).chunk().map(func2).sum().compute()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 242.124ms · About: xarray-datasette