issue_comments


12 rows where issue = 297560256 sorted by updated_at descending


user 6

  • RafalSkolasinski 5
  • shoyer 3
  • max-sixty 1
  • fujiisoup 1
  • basnijholt 1
  • jcmgray 1

author_association 3

  • NONE 6
  • MEMBER 5
  • CONTRIBUTOR 1

issue 1

  • cartesian product of coordinates and using it to index / fill empty dataset · 12 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
435717326 https://github.com/pydata/xarray/issues/1914#issuecomment-435717326 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDQzNTcxNzMyNg== RafalSkolasinski 10928117 2018-11-04T23:07:56Z 2018-11-04T23:07:56Z NONE

@jcmgray I must have missed your reply to this issue; I only saw it just now. I love your code! I will definitely include xyzpy in my tools from now on ;-).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
396745650 https://github.com/pydata/xarray/issues/1914#issuecomment-396745650 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM5Njc0NTY1MA== jcmgray 8982598 2018-06-12T21:48:31Z 2018-06-12T22:40:31Z CONTRIBUTOR

Indeed, this is exactly the kind of situation I wrote xyzpy for. As a quick demo:

```python
import numpy as np
import xyzpy as xyz

def some_function(x, y, z):
    return x * np.random.randn(3, 4) + y / z

# Define how to label the function's output
runner_opts = {
    'fn': some_function,
    'var_names': ['output'],
    'var_dims': {'output': ['a', 'b']},
    'var_coords': {'a': [10, 20, 30]},
}
runner = xyz.Runner(**runner_opts)

# set the parameters we want to explore (combos <-> cartesian product)
combos = {
    'x': np.linspace(1, 2, 11),
    'y': np.linspace(2, 3, 21),
    'z': np.linspace(4, 5, 31),
}

# run them
runner.run_combos(combos)
```

Should produce:
```
100%|###################| 7161/7161 [00:00<00:00, 132654.11it/s]

<xarray.Dataset>
Dimensions:  (a: 3, b: 4, x: 11, y: 21, z: 31)
Coordinates:
  * x        (x) float64 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
  * y        (y) float64 2.0 2.05 2.1 2.15 2.2 2.25 2.3 2.35 2.4 2.45 2.5 ...
  * z        (z) float64 4.0 4.033 4.067 4.1 4.133 4.167 4.2 4.233 4.267 4.3 ...
  * a        (a) int32 10 20 30
Dimensions without coordinates: b
Data variables:
    output   (x, y, z, a, b) float64 0.6942 -0.3348 -0.9156 -0.517 -0.834 ...
```

And there are options for merging successive, disjoint sets of data (combos2, combos3, ...) and parallelizing/distributing the work.

There are also multiple ways to define function inputs/outputs (the easiest of which is just to actually return an xr.Dataset), but do let me know if your use case is beyond them or unclear.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
396738702 https://github.com/pydata/xarray/issues/1914#issuecomment-396738702 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM5NjczODcwMg== shoyer 1217238 2018-06-12T21:23:09Z 2018-06-12T21:23:09Z MEMBER

xyzpy (by @jcmgray) looks like it might be a nice way to solve this problem, e.g., see http://xyzpy.readthedocs.io/en/latest/examples/complex%20output%20example.html

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
396737241 https://github.com/pydata/xarray/issues/1914#issuecomment-396737241 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM5NjczNzI0MQ== basnijholt 6897215 2018-06-12T21:18:18Z 2018-06-12T21:18:18Z NONE

This StackOverflow question is related to this "issue".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
367677038 https://github.com/pydata/xarray/issues/1914#issuecomment-367677038 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2NzY3NzAzOA== RafalSkolasinski 10928117 2018-02-22T13:15:11Z 2018-02-22T13:15:11Z NONE

@shoyer Thanks for your suggestions and for linking the other issue. I think this one can also be labelled as a "usage question".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
367578341 https://github.com/pydata/xarray/issues/1914#issuecomment-367578341 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2NzU3ODM0MQ== shoyer 1217238 2018-02-22T06:13:58Z 2018-02-22T06:13:58Z MEMBER

This issue has brought up a lot of the same issues: https://github.com/pydata/xarray/issues/1773

Clearly, we need better documentation here at the very least.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366884882 https://github.com/pydata/xarray/issues/1914#issuecomment-366884882 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2Njg4NDg4Mg== shoyer 1217238 2018-02-20T07:02:37Z 2018-02-20T07:02:37Z MEMBER

xarray.broadcast() could also be helpful for generating a cartesian product. Something like xarray.broadcast(*data.coords.values()) would get you three 3D DataArray objects.
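
As a rough illustration of the `xarray.broadcast()` suggestion, the sketch below broadcasts two dimension coordinates against each other and pairs up the flattened results. The coordinate values and the names `xx`, `yy` and `points` are made up for this example; only `xr.broadcast` itself comes from the comment.

```python
import xarray as xr

# Small coordinate grid; the values are illustrative only.
data = xr.Dataset(coords={'x': [0.0, 1.0, 2.0], 'y': [10.0, 20.0]})

# Broadcast the dimension coordinates against each other. Each result has
# dims ('x', 'y') and holds one coordinate's value at every grid point.
xx, yy = xr.broadcast(data['x'], data['y'])

# Pair up the flattened arrays to list the Cartesian product in row-major order.
points = list(zip(xx.values.ravel().tolist(), yy.values.ravel().tolist()))
```

Here `points` contains one `(x, y)` tuple per grid point, with `y` varying fastest.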

apply_ufunc with vectorize=True could also achieve what you're looking for here:
```python
import xarray as xr
import numpy as np

data = xr.Dataset(coords={'x': np.linspace(-1, 1), 'y': np.linspace(0, 10), 'a': 1, 'b': 5})

def some_function(x, y):
    return float(x) * float(y)

xr.apply_ufunc(some_function, data['x'], data['y'], vectorize=True)
```
Results in:
```
<xarray.DataArray (x: 50, y: 50)>
array([[ -0.      ,  -0.204082,  -0.408163, ...,  -9.591837,  -9.795918, -10.      ],
       [ -0.      ,  -0.195752,  -0.391504, ...,  -9.200333,  -9.396085,  -9.591837],
       [ -0.      ,  -0.187422,  -0.374844, ...,  -8.80883 ,  -8.996252,  -9.183673],
       ...,
       [  0.      ,   0.187422,   0.374844, ...,   8.80883 ,   8.996252,   9.183673],
       [  0.      ,   0.195752,   0.391504, ...,   9.200333,   9.396085,   9.591837],
       [  0.      ,   0.204082,   0.408163, ...,   9.591837,   9.795918,  10.      ]])
Coordinates:
  * x        (x) float64 -1.0 -0.9592 -0.9184 -0.8776 -0.8367 -0.7959 ...
    a        int64 1
    b        int64 5
  * y        (y) float64 0.0 0.2041 0.4082 0.6122 0.8163 1.02 1.224 1.429 ...
```

You can even do this with dask arrays if you set dask='parallelized'.

That said, it does feel like there's some missing functionality here for the xarray equivalent of ndenumerate. I'm not entirely sure what the right API is, yet.
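
To make the "xarray equivalent of ndenumerate" idea concrete, here is one possible shape such an iterator could take. `iter_coords` is a hypothetical helper written for this sketch, not an xarray function, and the example array is invented.

```python
import numpy as np
import xarray as xr

def iter_coords(obj):
    """Yield one {dim: coordinate value} dict per point of obj's grid.

    Hypothetical helper for illustration; it is not part of xarray.
    """
    dims = list(obj.sizes)
    shape = [obj.sizes[d] for d in dims]
    # np.ndindex walks every integer index of the grid in row-major order.
    for idx in np.ndindex(*shape):
        yield {d: obj[d].values[i] for d, i in zip(dims, idx)}

data = xr.DataArray(np.full((2, 3), np.nan), dims=('x', 'y'),
                    coords={'x': [0, 1], 'y': ['a', 'b', 'c']})
points = list(iter_coords(data))
```

Each yielded dict can be passed to `.sel()` or `.loc[...]` to address a single grid point.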

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366833780 https://github.com/pydata/xarray/issues/1914#issuecomment-366833780 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2NjgzMzc4MA== RafalSkolasinski 10928117 2018-02-20T00:27:36Z 2018-02-20T00:27:36Z NONE

After preparing a list similar to [{'x': 0, 'y': 'a'}, {'x': 1, 'y': 'a'}, ...], interaction with the cluster is quite efficient. One can easily pass such a thing to async_map of ipyparallel.

Thanks for your suggestion, I need to try a few things. I also want to try to extend it to a function that computes a few different things that could be multi-valued, e.g.
```python
def dummy(x, y):
    ds = xr.Dataset(
        {'out1': ('n', [1*x, 2*x, 3*x]), 'out2': ('m', [x, y])},
        coords = {'x': x, 'y': y, 'n': range(3), 'm': range(2)}
    )

    return ds
```
and then group together such outputs... Ok, I know. I go from a simple problem to a much more complicated one, but isn't that usually the case?
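
One way such per-point Datasets might be grouped is sketched below, assuming a reasonably recent xarray that provides `xr.combine_by_coords`. The input values `xs` and `ys` are invented for the example, and this is only one possible approach, not necessarily what the author ended up using.

```python
import itertools

import xarray as xr

def dummy(x, y):
    """Return several multi-valued outputs, labelled by the scalar inputs."""
    return xr.Dataset(
        {'out1': ('n', [1 * x, 2 * x, 3 * x]), 'out2': ('m', [x, y])},
        coords={'x': x, 'y': y, 'n': range(3), 'm': range(2)},
    )

xs = [1.0, 2.0]
ys = [10.0, 20.0, 30.0]

# Promote the scalar coordinates to length-1 dimensions so that each result
# is a 1x1 tile of the full (x, y) grid.
tiles = [dummy(x, y).expand_dims(['x', 'y']) for x, y in itertools.product(xs, ys)]

# Let xarray order and concatenate the tiles using their coordinate values.
combined = xr.combine_by_coords(tiles)
```

The combined Dataset has dimensions `x`, `y`, `n` and `m`, so `combined['out1'].sel(x=2.0, y=10.0)` recovers the three values computed for that input pair.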

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366825366 https://github.com/pydata/xarray/issues/1914#issuecomment-366825366 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2NjgyNTM2Ng== fujiisoup 6815844 2018-02-19T23:21:05Z 2018-02-19T23:34:58Z MEMBER

I am not sure if it is efficient to interact with a cluster, but I often use MultiIndex to make a cartesian product,
```python
In [1]: import xarray as xr
   ...: import numpy as np
   ...: data = xr.DataArray(np.full((3, 4), np.nan), dims=('x', 'y'),
   ...:                     coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
   ...:
   ...: data
   ...:
Out[1]:
<xarray.DataArray (x: 3, y: 4)>
array([[ nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

In [2]: data1 = data.stack(xy=['x', 'y'])
   ...: data1
   ...:
Out[2]:
<xarray.DataArray (xy: 12)>
array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan])
Coordinates:
  * xy       (xy) MultiIndex
  - x        (xy) int64 0 0 0 0 1 1 1 1 2 2 2 2
  - y        (xy) object 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd'
```
For the above example, `data` becomes 1-dimensional with coordinate `xy`, where `xy` is a product of `x` and `y`. Each entry of `xy` is a tuple of 'x' and 'y' values,
```python
In [3]: data1[0]
Out[3]:
<xarray.DataArray ()>
array(np.nan)
Coordinates:
    xy       object (0, 'a')
```
and we can assign a value for given coordinate values by the `loc` method,
```python
In [5]: # Assuming we found the result with (1, 'a') is 2.0
   ...: data1.loc[(1, 'a'), ] = 2.0

In [6]: data1
Out[6]:
<xarray.DataArray (xy: 12)>
array([ nan,  nan,  nan,  nan,   2.,  nan,  nan,  nan,  nan,  nan,  nan,  nan])
Coordinates:
  * xy       (xy) MultiIndex
  - x        (xy) int64 0 0 0 0 1 1 1 1 2 2 2 2
  - y        (xy) object 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd' 'a' 'b' 'c' 'd'
```

Note that we need to access via data1.loc[(1, 'a'), ], rather than data1.loc[(1, 'a')] (last comma in the bracket is needed.)

EDIT: I modified my previous comment to take the partial assignment into account.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366819497 https://github.com/pydata/xarray/issues/1914#issuecomment-366819497 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2NjgxOTQ5Nw== RafalSkolasinski 10928117 2018-02-19T22:40:17Z 2018-02-19T22:58:02Z NONE

For "get done" I had for example the following (similar to what I linked as my initial attempt)
```python
import itertools as it

import numpy as np
import xarray as xr

coordinates = {
    'x': np.linspace(-1, 1),
    'y': np.linspace(0, 10),
}

constants = {
    'a': 1,
    'b': 5
}

# One dict of keyword arguments per point of the Cartesian product
inps = [{**constants, **{k: v for k, v in zip(coordinates.keys(), x)}}
        for x in list(it.product(*coordinates.values()))]

def f(x, y, a, b):
    """Some dummy function."""
    v = a * x**2 + b * y**2
    return xr.DataArray(v, {'x': x, 'y': y, 'a': a, 'b': b})

# simulate computation on cluster
values = list(map(lambda s: f(**s), inps))

# gather and unstack the inputs
ds = xr.concat(values, dim='new', coords='all')
ds = ds.set_index(new=list(set(ds.coords) - set(ds.dims)))
ds = ds.unstack('new')
```

It is very close to what you suggest. My main question is whether this can be done better. Mainly I am wondering about:

  1. Is there any built-in iterator over the Cartesian product of coordinates? If not, are there people that also think it would be useful?
  2. Gathering together / unstacking of the data. My 3 line combo of concat, set_index and unstack seems to do the trick, but it seems a bit like an overcomplication. Ideally I'd expect to have some mechanism that works similar to:

```python
inputs = cartesian_product(coordinates)  # list similar to `inps` above
values = [function(inp) for inp in inputs]  # or using ipyparallel map

xarray_data = ...  # some empty xarray object
for inp, val in zip(inputs, values):
    xarray_data[inp] = val
```

I asked how to generate the product of coordinates from an xarray object because I was expecting that I could create xarray_data as an empty object with all coordinates set and then fill it.


Added comment

Starting with an empty object (i.e. one filled with NaNs) would have the benefit that one could save partial results and have clear information about what was already computed.
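
A small sketch of that idea, assuming label-based assignment through `.loc` with a dict of coordinate values is acceptable for the use case. The coordinates, the stand-in computation and the name `n_done` are invented for illustration.

```python
import itertools

import numpy as np
import xarray as xr

coordinates = {'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']}

# "Empty" container: every grid point starts as NaN.
shape = tuple(len(v) for v in coordinates.values())
result = xr.DataArray(np.full(shape, np.nan), dims=tuple(coordinates), coords=coordinates)

# One dict of labels per grid point, similar to `inps` in the comment above.
inputs = [dict(zip(coordinates, combo)) for combo in itertools.product(*coordinates.values())]

# Pretend only the first five results have come back so far.
for inp in inputs[:5]:
    result.loc[inp] = float(inp['x'])  # stand-in for an expensive computation

# Points that are still NaN have not been computed yet.
n_done = int(result.notnull().sum())
```

Saving `result` at this stage would record both the partial values and, through the remaining NaNs, which points are still outstanding.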

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366791162 https://github.com/pydata/xarray/issues/1914#issuecomment-366791162 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2Njc5MTE2Mg== max-sixty 5635139 2018-02-19T20:05:53Z 2018-02-19T20:05:53Z MEMBER

I think that this shouldn't be too hard to 'get done' but also that xarray may not give you much help natively. (I'm not sure though, so take this as a hopefully helpful contribution rather than a definitive answer)

Specifically, can you do (2) by generating a product of the coords? Either using numpy, stacking, or some simple python:

```python
In [3]: list(product(*((data[x].values) for x in data.dims)))
Out[3]:
[(0.287706062977495, 0.065327131503921),
 (0.287706062977495, 0.17398282388217068),
 (0.287706062977495, 0.1455022501442349),
 (0.42398126102299216, 0.065327131503921),
 (0.42398126102299216, 0.17398282388217068),
 (0.42398126102299216, 0.1455022501442349),
 (0.13357153947234057, 0.065327131503921),
 (0.13357153947234057, 0.17398282388217068),
 (0.13357153947234057, 0.1455022501442349),
 (0.42347765161572537, 0.065327131503921),
 (0.42347765161572537, 0.17398282388217068),
 (0.42347765161572537, 0.1455022501442349)]
```

then distribute those out to a cluster if you need, and then unstack them back into a dataset?
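
The "unstack them back" step could look roughly like the following, which swaps in a pandas MultiIndex built with `from_product` and `Series.to_xarray()` rather than xarray's own stacking. The coordinate values and the stand-in computation are assumptions made for the example.

```python
import pandas as pd

xs = [0.1, 0.2, 0.3, 0.4]
ys = [1.0, 2.0, 3.0]

# The full product of coordinate values, with named levels.
index = pd.MultiIndex.from_product([xs, ys], names=['x', 'y'])

# Stand-in for results coming back from a cluster, one per (x, y) pair.
values = [x * y for x, y in index]

# A Series indexed by the product unstacks into a 2-D DataArray
# (this call needs xarray to be installed).
grid = pd.Series(values, index=index, name='f').to_xarray()
```

The resulting `grid` has dimensions `x` and `y`, so individual results can be looked up with `grid.sel(x=..., y=...)`.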

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
366740505 https://github.com/pydata/xarray/issues/1914#issuecomment-366740505 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM2Njc0MDUwNQ== RafalSkolasinski 10928117 2018-02-19T16:20:15Z 2018-02-19T16:23:31Z NONE

Let me give a bit of background on what I would like to do:

  1. Create an empty Dataset of coordinates I want to explore, i.e. two np.arrays x and y, and two scalars a and b.
  2. Generate a list of the Cartesian product of all the coordinates, i.e. [ {'x': -1, 'y': 0, 'a': 1, 'b': 5}, ...] (data format doesn't really matter).
  3. For each item of the list compute some function: f = f(x, y, a, b). In principle this function can be expensive to compute, therefore I'd compute it for each item of the list from 2. separately on the cluster.
  4. "Merge" it all together into a single xarray object.

In principle f should be allowed to return e.g. np.array. A related issue in holoviews and the notebook with my initial attempt. In the linked notebook I managed to achieve the goal, however without starting with an xarray object containing coordinates. Also, combining the data seems a bit inefficient, as it takes more time than generating it for larger datasets.
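
For scalar-valued `f`, steps 2 to 4 above can be sketched with plain `itertools.product` and a NumPy reshape, as below. The grid sizes and the body of `f` are placeholders chosen for the example, and the reshape relies on the inputs being evaluated in product order.

```python
import itertools

import numpy as np
import xarray as xr

coordinates = {'x': np.linspace(-1, 1, 5), 'y': np.linspace(0, 10, 3)}
constants = {'a': 1, 'b': 5}

def f(x, y, a, b):
    """Illustrative stand-in for an expensive function."""
    return a * x**2 + b * y**2

# Step 2: one input dict per point of the Cartesian product.
inputs = [
    {**constants, **dict(zip(coordinates, combo))}
    for combo in itertools.product(*coordinates.values())
]

# Step 3: evaluate each point independently (a cluster map could replace this).
values = [f(**inp) for inp in inputs]

# Step 4: itertools.product varies the last coordinate fastest, which matches
# NumPy's row-major order, so a plain reshape restores the grid.
shape = tuple(len(v) for v in coordinates.values())
result = xr.DataArray(
    np.reshape(values, shape),
    dims=tuple(coordinates),
    coords={**coordinates, **constants},
)
```

If results arrive out of order, as they may from a cluster, the reshape no longer applies directly and the values would need to be matched back to their inputs first.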

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);