issue_comments

17 rows where issue = 144683276 sorted by updated_at descending

user 3

  • pwolfram 10
  • shoyer 6
  • fmaussion 1

author_association 2

  • CONTRIBUTOR 10
  • MEMBER 7

issue 1

  • Selection based on boolean DataArray · 17 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
210632083 https://github.com/pydata/xarray/issues/811#issuecomment-210632083 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIxMDYzMjA4Mw== pwolfram 4295853 2016-04-15T20:30:03Z 2016-04-15T20:30:03Z CONTRIBUTOR

Closing because of #815 merge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
204502189 https://github.com/pydata/xarray/issues/811#issuecomment-204502189 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwNDUwMjE4OQ== pwolfram 4295853 2016-04-01T18:23:41Z 2016-04-01T18:23:41Z CONTRIBUTOR

Ok, sounds good @shoyer. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
204499580 https://github.com/pydata/xarray/issues/811#issuecomment-204499580 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwNDQ5OTU4MA== shoyer 1217238 2016-04-01T18:13:52Z 2016-04-01T18:13:52Z MEMBER

Also, just for future reference, we usually close feature requests once the corresponding PR has been merged :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
204496358 https://github.com/pydata/xarray/issues/811#issuecomment-204496358 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwNDQ5NjM1OA== pwolfram 4295853 2016-04-01T18:04:18Z 2016-04-01T18:04:18Z CONTRIBUTOR

The implementation turned out to be a little different than proposed, @shoyer; please let me know what you think of #815. The syntax works for my use case, e.g., acase.sel_where(idx). I'm going to close this issue out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203691509 https://github.com/pydata/xarray/issues/811#issuecomment-203691509 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY5MTUwOQ== shoyer 1217238 2016-03-31T00:08:39Z 2016-03-31T00:08:39Z MEMBER

If I'm understanding you correctly, the use case is to trim off all NA regions when using where for selection. If your mask is not perfectly rectangular, some values will still be replaced with NA.

A starting point would be something like this:

```python
def sel_where(data, mask):
    data, mask = xr.broadcast(xr.align(data, mask, join='left', copy=False))
    # possibly not a good idea to expand mask to the full dimensions of the data
    indexes = tuple(mask.any(dim).values for dim in mask.dims)
    return data[indexes].where(mask[indexes])
```
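For readers who want to try the idea, here is a minimal runnable variant of that starting point. It is a sketch under stated assumptions, not the code that was merged: it unpacks the tuple returned by `xr.align` before passing it to `xr.broadcast`, builds the boolean indexer for each dimension by reducing the mask over the *other* dimensions, and uses `isel` so the indexers are matched by dimension name. The toy `data` and `mask` are made up for illustration.

```python
import numpy as np
import xarray as xr


def sel_where(data, mask):
    # Align mask to data's labels, then expand both to a common set of dimensions.
    data, mask = xr.broadcast(*xr.align(data, mask, join="left", copy=False))
    indexers = {}
    for dim in mask.dims:
        # Keep a position along `dim` if the mask is True anywhere across the other dims.
        other_dims = [d for d in mask.dims if d != dim]
        keep = mask.any(dim=other_dims) if other_dims else mask
        indexers[dim] = keep.values
    # Trim to the retained rows/columns, then blank out what the mask still excludes.
    return data.isel(**indexers).where(mask.isel(**indexers))


# Toy data (assumption, for illustration): the mask is True at two non-adjacent cells.
data = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [10, 20, 30], "y": [1, 2, 3, 4]},
)
mask = (data == 1) | (data == 6)
result = sel_where(data, mask)
print(result.shape)
```

Because the mask here is not rectangular, the trimmed result still contains NaN in the cells the mask excludes, which matches the caveat in the comment above.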

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203690278 https://github.com/pydata/xarray/issues/811#issuecomment-203690278 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY5MDI3OA== pwolfram 4295853 2016-03-31T00:00:35Z 2016-03-31T00:00:47Z CONTRIBUTOR

The example mapping would then be

acase.sel_where(idx) -> acase.sel(x=np.where(idx)[0], y=np.where(idx)[1])

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203689249 https://github.com/pydata/xarray/issues/811#issuecomment-203689249 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY4OTI0OQ== pwolfram 4295853 2016-03-30T23:54:03Z 2016-03-30T23:54:03Z CONTRIBUTOR

Sorry about the poor word choice; what I really mean by "contraction" is "slice". The example below will hopefully demonstrate. I think sel_from or sel_where is probably better anyway. The idea is to provide boolean indexing a la http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html. In this case, though, if a dimension is not used, assume it is broadcast to preserve the dimension, similar to orthogonal indexing. Essentially, I'm looking for a where mask that simply removes array entries instead of replacing them with nan. Thus, the method could also be called sel_where.

Pseudocode below demonstrates the key feature of the method:

```
# assume idx is boolean and
# when True represents values that should be returned by a slice like behavior
# let idx dims be ('x' and 'y')
$ idx.shape
(1000,1000)
$ np.sum(idx.values,axis=0)
10
$ np.sum(idx.values,axis=1)
20
$ acase.shape
(1000,1000)
$ slicedcase = acase.sel_where(idx)
$ slicedcase.shape
(10, 20)
```
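The shape contraction described in the pseudocode can be illustrated with plain NumPy on a small array. This is only a sketch of the intended shapes, assuming "a row or column is kept if the mask is True anywhere in it"; the arrays are made up and `np.ix_` stands in for the proposed method.

```python
import numpy as np

acase = np.arange(48).reshape(6, 8)

# Boolean mask with True entries in rows {1, 4} and columns {2, 5}.
idx = np.zeros((6, 8), dtype=bool)
idx[1, 2] = idx[1, 5] = idx[4, 2] = True

rows = idx.any(axis=1)   # which rows contain at least one True
cols = idx.any(axis=0)   # which columns contain at least one True

# np.ix_ treats each boolean vector as a per-axis (orthogonal) selector.
sliced = acase[np.ix_(rows, cols)]
print(sliced.shape)
```

Note that the cell at row 4, column 5 is kept even though `idx` is False there; that is the kind of entry a `where`-style mask would replace with NaN.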

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203676582 https://github.com/pydata/xarray/issues/811#issuecomment-203676582 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY3NjU4Mg== pwolfram 4295853 2016-03-30T23:08:14Z 2016-03-30T23:46:02Z CONTRIBUTOR

Agreed. I think a sel_like method would work well to mirror the reindex/reindex_like notation.

To be clear for people following along, the distinction between this use case and that covered by reindex/reindex_like can be summarized as follows:

reindex_like does not do a contraction based on whether the DataArray idx is True or False. There are many use cases where one will want to reduce the size of the data file (e.g., subsetting a region from global climate model data). reindex_like preserves the shape of idx, not its collapsed shape based on an np.sum over each axis. A sel_like method would perform the contraction, so the size returned along each axis would be np.sum(idx,axis=axis), not idx.shape[axis] as would occur for reindex_like.

@shoyer, are you in support of a PR that implements a sel_like to do a mapping of the form acase.sel_like(idx) -> acase.sel(x=idx.x.values, y=idx.y.values)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203678531 https://github.com/pydata/xarray/issues/811#issuecomment-203678531 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY3ODUzMQ== shoyer 1217238 2016-03-30T23:18:32Z 2016-03-30T23:18:32Z MEMBER

I'm still not sure exactly what you mean by this "contraction" like behavior. Could you write this out in pseudo code?

This is pretty different from .sel and not in the way that reindex_like is different from reindex, so possibly another name would be appropriate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203662498 https://github.com/pydata/xarray/issues/811#issuecomment-203662498 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY2MjQ5OA== shoyer 1217238 2016-03-30T22:11:35Z 2016-03-30T22:11:35Z MEMBER

I would rather make a sel_like method for remapping acase.sel_like(idx) -> acase.sel(x=idx.x.values, y=idx.y.values). That would mirror the current reindex/reindex_like distinction, though reindex_like does solve many of these use cases.
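As a rough illustration of that remapping, the sketch below defines a hypothetical `sel_like` helper (not an xarray method) that selects by the labels of another object's dimension coordinates, and compares it with the existing `reindex_like`. The toy arrays are assumptions made for the example.

```python
import numpy as np
import xarray as xr

acase = xr.DataArray(
    np.arange(20).reshape(5, 4),
    dims=("x", "y"),
    coords={"x": [0, 10, 20, 30, 40], "y": [1, 2, 3, 4]},
)
# Stand-in template carrying a subset of acase's labels.
idx = acase.isel(x=[1, 3], y=[0, 2])


def sel_like(obj, other):
    # Hypothetical helper: obj.sel(x=other.x.values, y=other.y.values, ...)
    # for every dimension the two objects share.
    labels = {dim: other[dim].values for dim in other.dims if dim in obj.dims}
    return obj.sel(labels)


picked = sel_like(acase, idx)
reindexed = acase.reindex_like(idx)
print(picked.shape, reindexed.shape)
```

When every label in the template exists in the source, both calls return the same values, which is consistent with the remark that reindex_like already covers many of these cases.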

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203639698 https://github.com/pydata/xarray/issues/811#issuecomment-203639698 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYzOTY5OA== pwolfram 4295853 2016-03-30T21:17:32Z 2016-03-30T21:17:32Z CONTRIBUTOR

I would argue that the intuitive behavior, even if it is a departure from numpy, is to make acase.sel(idx) shorthand for acase.sel(x=idx.x.values, y=idx.y.values), which would preserve the original dimensionality but not necessarily the shape of acase, because if a dimension is all False then that dimension should be dropped completely. This is because the sel method essentially operates on key-value pairs, or dictionary-like data.

I don't think acase[idx] would make as much sense, because this is closer to standard numpy notation where acase has two dimensions, and it is possible that the dimensional arrangement of acase and idx doesn't match (e.g., they could dimensionally be transposes of each other). If I understand correctly, arguments to the [] operator should essentially be data types, not a dictionary type. So this would be consistent, because we can think of idx as a glorified dict with metadata rather than a pure number type. Some type of dereferencing operation is needed for the "dict" idx to become a data type. A carefully worded error should probably be returned here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203633885 https://github.com/pydata/xarray/issues/811#issuecomment-203633885 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYzMzg4NQ== shoyer 1217238 2016-03-30T20:57:24Z 2016-03-30T20:57:24Z MEMBER

If acase and idx (a boolean) both have dimensions ('x', 'y'), what should acase.sel(idx) or acase[idx] do?

NumPy will return a 1D flattened array in this case. We could do that (with a MultiIndex) but that's not so useful in xarray.
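The NumPy behavior mentioned here is easy to check on a small, made-up array: indexing a 2D array with a 2D boolean mask of the same shape yields a flat 1D result containing only the selected values.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a % 5 == 0      # True at the values 0, 5 and 10

flat = a[mask]         # the 2D structure is lost: result is one-dimensional
print(flat.shape)
```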

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203631503 https://github.com/pydata/xarray/issues/811#issuecomment-203631503 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYzMTUwMw== pwolfram 4295853 2016-03-30T20:49:52Z 2016-03-30T20:49:52Z CONTRIBUTOR

Thanks @shoyer. If idx has more than one dimension couldn't we just select along the dimensions that are shared between acase and idx and just broadcast for the others, returning all entries for dimensions in acase that are not in idx? I sense I'm potentially missing something subtle here.

The bottom line is that if we could do a non-1D selection that would allow for more concise syntax on the application side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203628130 https://github.com/pydata/xarray/issues/811#issuecomment-203628130 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYyODEzMA== shoyer 1217238 2016-03-30T20:44:34Z 2016-03-30T20:44:34Z MEMBER

You can write acase.where(idx), but we don't support acase.sel(idx) because it's not clear what to do if idx has more than one dimension.

I suppose we could allow acase.sel(idx1, idx2, ...) if all the provided arguments are 1D.
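A small sketch of what `where` does with a multi-dimensional mask, using made-up data: the shape is preserved and excluded entries become NaN. Recent xarray releases also accept `drop=True` on `where`, which additionally trims labels where the mask is False everywhere; that option is shown here for comparison and is not something this comment describes.

```python
import numpy as np
import xarray as xr

acase = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [0, 1, 2], "y": [0, 1, 2, 3]},
)
idx = (acase == 1) | (acase == 6)   # 2D boolean mask, True at two cells

masked = acase.where(idx)              # same shape, NaN where idx is False
trimmed = acase.where(idx, drop=True)  # also drops all-False rows and columns
print(masked.shape, trimmed.shape)
```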

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203622523 https://github.com/pydata/xarray/issues/811#issuecomment-203622523 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYyMjUyMw== pwolfram 4295853 2016-03-30T20:33:56Z 2016-03-30T20:33:56Z CONTRIBUTOR

Thanks for clarifying this @fmaussion! You are right that the shape of idx is (Nr: 1, Np: 92000) and that is why it fails.

However, I originally filed this because it seems the syntax could be clarified here. For example, should we be able to do something like acase.sel(idx), where the indices from idx are used to select from acase? It seems like there may be a way to streamline this in a more intuitive way. If so, this is probably a new issue and this one should be closed because its original formulation, as you so helpfully point out, is flawed. Thoughts on this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203615938 https://github.com/pydata/xarray/issues/811#issuecomment-203615938 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYxNTkzOA== fmaussion 10050469 2016-03-30T20:23:03Z 2016-03-30T20:23:03Z MEMBER

Should xarray indexing account for boolean values without resorting to a call to np.where?

As far as I know, it does:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: da = xr.DataArray(np.arange(10), coords={'time':np.arange(10)})

In [4]: da.sel(time=da.time > 4)
Out[4]:
<xarray.DataArray (time: 5)>
array([5, 6, 7, 8, 9])
Coordinates:
  * time     (time) int64 5 6 7 8 9
```

But according to the traceback, it seems to have something to do with the shape of your array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203571994 https://github.com/pydata/xarray/issues/811#issuecomment-203571994 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzU3MTk5NA== pwolfram 4295853 2016-03-30T18:39:12Z 2016-03-30T18:39:12Z CONTRIBUTOR

More detail on the genesis of this issue:

acase.sel(Np=np.where(idx)[0]) works, but acase.sel(Np=idx) does not and returns the following error:

```
In [73]: acase.sel(Np=idx)

ValueError                                Traceback (most recent call last)
<ipython-input-73-abb4dc27ed02> in <module>()
----> 1 acase.sel(Np=idx)

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.pyc in sel(self, method, tolerance, indexers)
    974         """
    975         return self.isel(indexing.remap_label_indexers(
--> 976             self, indexers, method=method, tolerance=tolerance))
    977
    978     def isel_points(self, dim='points', **indexers):

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in remap_label_indexers(data_obj, indexers, method, tolerance)
    189     return dict((dim, convert_label_indexer(data_obj[dim].to_index(), label,
    190                                             dim, method, tolerance))
--> 191                 for dim, label in iteritems(indexers))
    192
    193

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in <genexpr>((dim, label))
    189     return dict((dim, convert_label_indexer(data_obj[dim].to_index(), label,
    190                                             dim, method, tolerance))
--> 191                 for dim, label in iteritems(indexers))
    192
    193

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in convert_label_indexer(index, label, index_name, method, tolerance)
    168                              'the index is unsorted or non-unique')
    169     else:
--> 170         label = _asarray_tuplesafe(label)
    171         if label.ndim == 0:
    172             indexer = index.get_loc(label.item(), **kwargs)

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in _asarray_tuplesafe(values)
    131     if result.ndim == 2:
    132         result = np.empty(len(values), dtype=object)
--> 133         result[:] = values
    134
    135     return result

ValueError: could not broadcast input array from shape (92000) into shape (1)
```

with

```
In [76]: acase
Out[76]:
<xarray.Dataset>
Dimensions:        (Nb: 11, Np: 92000, Nr: 1, Nt-1: 27, Time: 28)
Coordinates:
    yearoffset     |S4 '1700'
  * Nb             (Nb) float64 1.028e+03 1.029e+03 1.029e+03 1.029e+03 ...
    rlzn           (Nr) int64 0
    time           (Time) datetime64[ns] 1724-01-01 1724-01-02 1724-01-03 ...
  * Np             (Np) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * Nr             (Nr) int64 0
  * Nt-1           (Nt-1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * Time           (Time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    lat            (Nr, Time, Nb, Np) float64 8.66e+03 8.66e+03 8.66e+03 ...
    notoutcropped  (Nr, Time, Nb, Np) int64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    lon            (Nr, Time, Nb, Np) float64 5e+03 1e+04 1.5e+04 2e+04 ...
    dtdays         (Nr, Nt-1) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
```

```
In [75]: idx
Out[75]:
<xarray.DataArray (Nr: 1, Np: 92000)>
array([[ True,  True,  True, ..., False, False, False]], dtype=bool)
Coordinates:
    yearoffset  |S4 '1700'
    Nb          float64 1.029e+03
    rlzn        (Nr) int64 0
    time        datetime64[ns] 1724-01-01
  * Np          (Np) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * Nr          (Nr) int64 0
    Time        int64 0
```
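Since the repr shows idx with a leading length-1 dimension (Nr: 1, Np: 92000), it may be worth checking which axis `np.where(idx)[0]` actually refers to. The NumPy-only sketch below uses a small made-up mask of shape (1, 5) as a stand-in for idx: for a 2D input, `np.where` returns one index array per axis, so element `[0]` holds positions along the first (length-1) axis, and the positions along the second axis are in element `[1]`, or in `[0]` after squeezing the mask to 1D.

```python
import numpy as np

# Stand-in for idx (assumption): one row, True at columns 0, 1 and 3.
idx = np.array([[True, True, False, True, False]])

rows, cols = np.where(idx)                         # one index array per axis
flat_positions = np.where(idx.squeeze(axis=0))[0]  # 1D mask gives positions directly

print(rows, cols, flat_positions)
```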

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.492ms · About: xarray-datasette