
issue_comments


13 rows where author_association = "CONTRIBUTOR" and user = 8982598 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
396745650 https://github.com/pydata/xarray/issues/1914#issuecomment-396745650 https://api.github.com/repos/pydata/xarray/issues/1914 MDEyOklzc3VlQ29tbWVudDM5Njc0NTY1MA== jcmgray 8982598 2018-06-12T21:48:31Z 2018-06-12T22:40:31Z CONTRIBUTOR

Indeed, this is exactly the kind of situation I wrote xyzpy for. As a quick demo:

```python
import numpy as np
import xyzpy as xyz

def some_function(x, y, z):
    return x * np.random.randn(3, 4) + y / z

# define how to label the function's output
runner_opts = {
    'fn': some_function,
    'var_names': ['output'],
    'var_dims': {'output': ['a', 'b']},
    'var_coords': {'a': [10, 20, 30]},
}
runner = xyz.Runner(**runner_opts)

# set the parameters we want to explore (combos <-> cartesian product)
combos = {
    'x': np.linspace(1, 2, 11),
    'y': np.linspace(2, 3, 21),
    'z': np.linspace(4, 5, 31),
}

# run them
runner.run_combos(combos)
```

Should produce:

```
100%|###################| 7161/7161 [00:00<00:00, 132654.11it/s]

<xarray.Dataset>
Dimensions:  (a: 3, b: 4, x: 11, y: 21, z: 31)
Coordinates:
  * x        (x) float64 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
  * y        (y) float64 2.0 2.05 2.1 2.15 2.2 2.25 2.3 2.35 2.4 2.45 2.5 ...
  * z        (z) float64 4.0 4.033 4.067 4.1 4.133 4.167 4.2 4.233 4.267 4.3 ...
  * a        (a) int32 10 20 30
Dimensions without coordinates: b
Data variables:
    output   (x, y, z, a, b) float64 0.6942 -0.3348 -0.9156 -0.517 -0.834 ...
```

And there are options for merging successive, disjoint sets of data (combos2, combos3, ...) and parallelizing/distributing the work.

There are also multiple ways to define a function's inputs/outputs (the easiest of which is just to actually return an xr.Dataset), but do let me know if your use case is beyond them or unclear.
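
For comparison, a manual version of the same cartesian-product fill in bare xarray might look like the sketch below (my illustration reusing the demo's names; this is not xyzpy's internals):

```python
import itertools
import numpy as np
import xarray as xr

def some_function(x, y, z):
    return x * np.random.randn(3, 4) + y / z

combos = {'x': np.linspace(1, 2, 11),
          'y': np.linspace(2, 3, 21),
          'z': np.linspace(4, 5, 31)}

# pre-allocate an all-NaN array over the cartesian product of the coordinates
shape = tuple(len(v) for v in combos.values()) + (3, 4)
out = xr.DataArray(np.full(shape, np.nan),
                   dims=('x', 'y', 'z', 'a', 'b'),
                   coords={**combos, 'a': [10, 20, 30]})

# ... then fill it in point by point
for xv, yv, zv in itertools.product(*combos.values()):
    out.loc[{'x': xv, 'y': yv, 'z': zv}] = some_function(xv, yv, zv)
```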

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  cartesian product of coordinates and using it to index / fill empty dataset 297560256
276540506 https://github.com/pydata/xarray/issues/60#issuecomment-276540506 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjU0MDUwNg== jcmgray 8982598 2017-02-01T00:43:52Z 2017-02-01T00:43:52Z CONTRIBUTOR

> Would using obj.fillna(0) not mess with argmax if, for instance, all the data is negative? Could fill with the min value instead?

Ah yes, true. I was slightly anticipating e.g. filling with NaT if the dim was time-like, though time types are not something I am familiar with.
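
A tiny numpy illustration of the point (my example, not from the thread):

```python
import numpy as np

x = np.array([-3.0, np.nan, -1.0])

# filling NaN with 0 makes the filled position the (wrong) argmax ...
np.argmax(np.where(np.isnan(x), 0, x))             # -> 1
# ... filling with the minimum instead preserves the true argmax
np.argmax(np.where(np.isnan(x), np.nanmin(x), x))  # -> 2
```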

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
276537615 https://github.com/pydata/xarray/issues/60#issuecomment-276537615 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjUzNzYxNQ== jcmgray 8982598 2017-02-01T00:26:24Z 2017-02-01T00:26:24Z CONTRIBUTOR

Ah yes, both ways are working now, thanks. I just had a little play around with timings, and this seems like a reasonably quick way to achieve correct NaN behaviour:

```python
def xr_idxmax(obj, dim):
    sig = ([(dim,), (dim,)], [()])
    kwargs = {'axis': -1}

    # positions where *every* value along ``dim`` is NaN
    allna = obj.isnull().all(dim)

    return apply_ufunc(gufunc_idxmax, obj.fillna(-np.inf), obj[dim],
                       signature=sig, kwargs=kwargs,
                       dask_array='allowed').where(~allna).fillna(np.nan)
```

i.e. first replace all NaN values with -Inf, use the usual argmax, and then re-mask the all-NaN values afterwards.
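
The same trick in plain numpy, as a self-contained sketch (my example):

```python
import numpy as np

x = np.array([[np.nan, 1.0, 3.0, 2.0],
              [np.nan, np.nan, np.nan, np.nan]])
coord = np.array([10, 20, 30, 40])

filled = np.where(np.isnan(x), -np.inf, x)  # NaN -> -inf, so argmax skips it
idx = coord[np.argmax(filled, axis=-1)]     # coordinate label of each row's max
allna = np.isnan(x).all(axis=-1)            # rows that were entirely NaN
print(np.where(allna, np.nan, idx))         # [30. nan]
```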

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
276232678 https://github.com/pydata/xarray/issues/60#issuecomment-276232678 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NjIzMjY3OA== jcmgray 8982598 2017-01-31T00:06:02Z 2017-01-31T00:06:02Z CONTRIBUTOR

So I thought take was just the functional equivalent of fancy indexing - I spotted it in the dask API and assumed it would work, but having tried it, it does indeed just raise a NotImplementedError. Just as a note: with the map_blocks approach above, take is working for some cases where x[inds, ] is not -- related to #1237?

Regarding edge cases: multiple maxes is presumably fine, as long as the user is aware it just takes the first. However, nanargmax is probably the actual desired function here, but it looks like it will raise on all-NaN slices. Would dropping these and then re-aligning be too much overhead?
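
For instance (my example), the all-NaN failure mode looks like:

```python
import numpy as np

x = np.array([[np.nan, 2.0, 1.0],
              [np.nan, np.nan, np.nan]])

np.nanargmax(x, axis=1)
# ValueError: All-NaN slice encountered
```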

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
275778443 https://github.com/pydata/xarray/issues/60#issuecomment-275778443 https://api.github.com/repos/pydata/xarray/issues/60 MDEyOklzc3VlQ29tbWVudDI3NTc3ODQ0Mw== jcmgray 8982598 2017-01-27T21:24:31Z 2017-01-27T21:24:31Z CONTRIBUTOR

Since I am interested in having this functionality, and the new apply_ufunc is now available: would something along these lines suffice?

```python
from wherever import argmax, take  # numpy or dask

def gufunc_idxmax(x, y, axis=None):
    indx = argmax(x, axis)
    return take(y, indx)

def idxmax(obj, dim):
    sig = ([(dim,), (dim,)], [()])
    kwargs = {'axis': -1}
    return apply_ufunc(gufunc_idxmax, obj, obj[dim],
                       signature=sig, kwargs=kwargs,
                       dask_array='allowed')
```
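
For reference, apply_ufunc's interface changed before release; with current xarray the same sketch would look roughly like this (my adaptation, not code from the thread):

```python
import numpy as np
import xarray as xr

def gufunc_idxmax(x, y, axis=None):
    # position of the max along ``axis``, mapped back to coordinate labels
    return np.take(y, np.argmax(x, axis))

def idxmax(obj, dim):
    return xr.apply_ufunc(
        gufunc_idxmax, obj, obj[dim],
        input_core_dims=[[dim], [dim]],  # move ``dim`` to each input's last axis
        kwargs={'axis': -1},
        dask='allowed',
    )
```

(xarray has since gained a built-in DataArray.idxmax, so this is mainly of historical interest.)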

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement DataArray.idxmax() 29136905
248380693 https://github.com/pydata/xarray/pull/1007#issuecomment-248380693 https://api.github.com/repos/pydata/xarray/issues/1007 MDEyOklzc3VlQ29tbWVudDI0ODM4MDY5Mw== jcmgray 8982598 2016-09-20T17:57:31Z 2016-09-20T17:57:31Z CONTRIBUTOR

This all looks great to me. Might the docstring for auto_combine need to be updated? Currently it seems to suggest that sorting, alignment and the addition of new data variables are not supported, whereas I think all these can occur with compat='no_conflicts'.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fixes for compat='no_conflicts' and open_mfdataset 177689088
246999780 https://github.com/pydata/xarray/pull/996#issuecomment-246999780 https://api.github.com/repos/pydata/xarray/issues/996 MDEyOklzc3VlQ29tbWVudDI0Njk5OTc4MA== jcmgray 8982598 2016-09-14T12:39:41Z 2016-09-14T12:39:41Z CONTRIBUTOR

OK, I have stripped the Dataset/Array methods, which I agree were largely redundant. Since this sets this type of comparison/merge slightly apart, 'no_conflicts' seems the more intuitive wording when used only as a compat option, so I've changed it to that.

And I've done a first pass at updating the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add 'no_conflicts' as compat option for merging non-conflicting data 174404136
244612732 https://github.com/pydata/xarray/pull/996#issuecomment-244612732 https://api.github.com/repos/pydata/xarray/issues/996 MDEyOklzc3VlQ29tbWVudDI0NDYxMjczMg== jcmgray 8982598 2016-09-04T16:23:21Z 2016-09-04T16:23:21Z CONTRIBUTOR

> One potential concern here is that performance is not going to be so great if you attempt to combine a bunch of variables with lazy data loaded with dask, because each comparison will trigger a separate computation. To that end, it would be nice to do the safety check in a single dask operation.

I will have a look into how to do this, but am not that familiar with dask.
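
One way to batch the check might be something like the following rough sketch (my illustration, assuming `first` and `second` are aligned, dask-backed Datasets; not the code that ended up in the PR):

```python
import dask

# build one lazy "any conflicting non-null values?" flag per shared variable ...
checks = {
    name: ((first[name].notnull() & second[name].notnull())
           & (first[name] != second[name])).any()
    for name in set(first.data_vars) & set(second.data_vars)
}

# ... then evaluate them all in a single dask computation
(conflicts,) = dask.compute(checks)
incompatible = [name for name, flag in conflicts.items() if bool(flag)]
```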

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add 'no_conflicts' as compat option for merging non-conflicting data 174404136
244612031 https://github.com/pydata/xarray/pull/996#issuecomment-244612031 https://api.github.com/repos/pydata/xarray/issues/996 MDEyOklzc3VlQ29tbWVudDI0NDYxMjAzMQ== jcmgray 8982598 2016-09-04T16:11:10Z 2016-09-04T16:11:10Z CONTRIBUTOR

Ah sorry - yes, I rebased locally and then mistakenly merged the remote fork...

> compat='no_conflicts' is possibly a better keyword argument.

Yes, I thought that might be better too. The advantages of 'notnull_equals' (a slightly more precise operational description, and slightly better 'grammar' when used as a method: `if ds1.notnull_equals(ds2): ...` vs `if ds1.no_conflicts(ds2): ...`) are probably negligible, since 'no_conflicts' is more intuitive for merge - and that is likely the main usage.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add 'no_conflicts' as compat option for merging non-conflicting data 174404136
242235696 https://github.com/pydata/xarray/issues/742#issuecomment-242235696 https://api.github.com/repos/pydata/xarray/issues/742 MDEyOklzc3VlQ29tbWVudDI0MjIzNTY5Ng== jcmgray 8982598 2016-08-24T23:05:49Z 2016-08-24T23:05:49Z CONTRIBUTOR

@shoyer My 2 cents for how this might work after 0.8+ (auto-align during concat, merge and auto_combine already goes a long way to solving this) is that the compat option of merge etc. could have a 4th option, 'nonnull_equals' (or something better named...), with compatibility tested by e.g.

```python
import xarray.ufuncs as xrufuncs

def nonnull_compatible(first, second):
    """Check whether two (aligned) datasets have any conflicting non-null values."""
    # mask for where both objects are not null
    both_not_null = xrufuncs.logical_not(first.isnull() | second.isnull())

    # check remaining values are equal
    return first.where(both_not_null).equals(second.where(both_not_null))
```

And then fillna to combine variables. Looking now I think this is very similar to what you are suggesting in #835.
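
A minimal sketch of that combination step (my illustration, assuming `first` and `second` are already aligned):

```python
if nonnull_compatible(first, second):
    # take values from ``first`` where present, falling back to ``second``
    combined = first.fillna(second)
```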

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  merge and align DataArrays/Datasets on different domains 130753818
227573330 https://github.com/pydata/xarray/issues/742#issuecomment-227573330 https://api.github.com/repos/pydata/xarray/issues/742 MDEyOklzc3VlQ29tbWVudDIyNzU3MzMzMA== jcmgray 8982598 2016-06-21T21:11:21Z 2016-06-21T21:11:21Z CONTRIBUTOR

Whoops - I actually meant to put

```python
ds['var'].loc[{...}]
```

in there as the one that works... my understanding is that this is supported as long as the specified coordinates are 'nice' (according to pandas) slices/scalars.
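
For instance (a hypothetical, self-contained example):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'var': (('x', 'y'), np.zeros((3, 3)))},
                coords={'x': [1.0, 1.5, 2.0], 'y': [2.0, 2.25, 2.5]})

# scalar and slice labels are the 'nice' indexers .loc assignment supports
ds['var'].loc[{'x': 1.5, 'y': slice(2.0, 2.5)}] = 1.0
```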

And yes, default values for DataArray/Dataset would definitely fill the "create_all_missing" need.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  merge and align DataArrays/Datasets on different domains 130753818
226547071 https://github.com/pydata/xarray/issues/742#issuecomment-226547071 https://api.github.com/repos/pydata/xarray/issues/742 MDEyOklzc3VlQ29tbWVudDIyNjU0NzA3MQ== jcmgray 8982598 2016-06-16T16:57:48Z 2016-06-16T16:57:48Z CONTRIBUTOR

Yes, following a similar line of thought to you, I recently wrote an 'all missing' dataset constructor (rather than 'empty', which I think of as having no variables):

```python
def all_missing_ds(coords, var_names, var_dims, var_types):
    """Make a dataset whose data is all missing."""
    # empty dataset with appropriate coordinates
    ds = xr.Dataset(coords=coords)
    for v_name, v_dims, v_type in zip(var_names, var_dims, var_types):
        shape = tuple(ds[d].size for d in v_dims)
        if v_type == int or v_type == float:
            # warn about up-casting int to float?
            nodata = np.tile(np.nan, shape)
        elif v_type == complex:
            # astype(complex) produces (nan + 0.0j)
            nodata = np.tile(np.nan + np.nan * 1.0j, shape)
        else:
            nodata = np.tile(np.nan, shape).astype(object)
        ds[v_name] = (v_dims, nodata)
    return ds
```
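
For example (my usage sketch):

```python
import numpy as np
import xarray as xr

ds = all_missing_ds(coords={'t': [0, 1, 2]},
                    var_names=['n'],
                    var_dims=[('t',)],
                    var_types=[int])
print(ds['n'].values)  # [nan nan nan] -- note the up-cast from int to float
```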

To go with this (and this might be a separate issue), a set_value method would be helpful --- just so that one does not have to remember which particular combination of

```python
ds.sel(...).var = new_values
ds.sel(...)['var'] = new_values
ds.var.sel(...) = new_values
ds['var'].sel(...) = new_values
```

guarantees assigning a new value (currently only the last syntax, I believe).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  merge and align DataArrays/Datasets on different domains 130753818
226179313 https://github.com/pydata/xarray/issues/742#issuecomment-226179313 https://api.github.com/repos/pydata/xarray/issues/742 MDEyOklzc3VlQ29tbWVudDIyNjE3OTMxMw== jcmgray 8982598 2016-06-15T12:59:08Z 2016-06-15T12:59:08Z CONTRIBUTOR

Just a comment that the appearance of object types is likely due to the fact that numpy's NaNs are inherently floats - so this will be an issue for any method with an intermediate 'missing data' stage if non-floats are being used.

I still use the align and fillna method since I mostly deal with floats/complex numbers, although @shoyer's suggestion of a partial align and then concat could definitely be cleaner when the added coordinates are all 'new'.
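
A quick demonstration of that up-casting (my example):

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1, 2], dims='x', coords={'x': [0, 1]})
b = xr.DataArray([3], dims='x', coords={'x': [2]})

# outer-aligning introduces missing values, forcing int -> float
aligned_a, _ = xr.align(a, b, join='outer')
print(aligned_a.dtype)   # float64
print(aligned_a.values)  # [ 1.  2. nan]
```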

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  merge and align DataArrays/Datasets on different domains 130753818

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);