issue_comments

10 rows where author_association = "CONTRIBUTOR" and issue = 144683276 sorted by updated_at descending


user 1

  • pwolfram 10

issue 1

  • Selection based on boolean DataArray · 10

author_association 1

  • CONTRIBUTOR · 10
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
210632083 https://github.com/pydata/xarray/issues/811#issuecomment-210632083 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIxMDYzMjA4Mw== pwolfram 4295853 2016-04-15T20:30:03Z 2016-04-15T20:30:03Z CONTRIBUTOR

Closing because of #815 merge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
204502189 https://github.com/pydata/xarray/issues/811#issuecomment-204502189 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwNDUwMjE4OQ== pwolfram 4295853 2016-04-01T18:23:41Z 2016-04-01T18:23:41Z CONTRIBUTOR

Ok, sounds good @shoyer. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
204496358 https://github.com/pydata/xarray/issues/811#issuecomment-204496358 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwNDQ5NjM1OA== pwolfram 4295853 2016-04-01T18:04:18Z 2016-04-01T18:04:18Z CONTRIBUTOR

The implementation turned out to be a little different than proposed, @shoyer; please let me know what you think of #815. The syntax works for my use case, e.g., acase.sel_where(idx). I'm going to close this issue out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203690278 https://github.com/pydata/xarray/issues/811#issuecomment-203690278 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY5MDI3OA== pwolfram 4295853 2016-03-31T00:00:35Z 2016-03-31T00:00:47Z CONTRIBUTOR

The example mapping would then be

acase.sel_where(idx) -> acase.sel(x=np.where(idx)[0], y=np.where(idx)[1])

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203689249 https://github.com/pydata/xarray/issues/811#issuecomment-203689249 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY4OTI0OQ== pwolfram 4295853 2016-03-30T23:54:03Z 2016-03-30T23:54:03Z CONTRIBUTOR

Sorry about the poor word choice; what I really mean by "contraction" is "slice". The example below will hopefully demonstrate. I think sel_from or sel_where is probably a better name anyway. The idea is to provide boolean indexing a la http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html. In this case, though, if a dimension is not used, assume it is broadcast so that the dimension is preserved, similar to orthogonal indexing. Essentially, I'm looking for a where mask that simply removes array entries instead of replacing them with nan. Thus, the method could also be called sel_where.

Pseudocode demonstrating the key feature of the method is below:

```
# assume idx is boolean and
# when True represents values that should be returned by a slice like behavior
# let idx dims be ('x' and 'y')
$ idx.shape
(1000,1000)
$ np.sum(idx.values,axis=0)
10
$ np.sum(idx.values,axis=1)
20
$ acase.shape
(1000,1000)
$ slicedcase = acase.sel_where(idx)
$ slicedcase.shape
(10, 20)
```
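For readers following along, here is a minimal NumPy sketch of one possible reading of the pseudocode above: keep only the rows and columns in which the mask has at least one True, instead of filling the rejected entries with nan. The helper name `sel_where_sketch` is hypothetical and this is not the xarray implementation. The example mask is a rectangle, so the "any True along the axis" rule is an assumption that happens to be unambiguous here.

```python
import numpy as np


def sel_where_sketch(data, mask):
    """Hypothetical helper: drop rows/columns where `mask` is entirely False."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows with at least one True
    cols = np.flatnonzero(mask.any(axis=0))  # columns with at least one True
    # np.ix_ builds an open mesh so rows and cols are applied orthogonally
    return data[np.ix_(rows, cols)]


data = np.arange(16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 0:3] = True  # a 2 x 3 rectangular region of interest

sliced = sel_where_sketch(data, mask)      # shrinks to the selected region
nan_masked = np.where(mask, data, np.nan)  # keeps the full shape, fills with nan

print(sliced.shape, nan_masked.shape)
```

The contrast between `sliced` and `nan_masked` is the point: the first reduces the array size, the second only blanks out values.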

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203676582 https://github.com/pydata/xarray/issues/811#issuecomment-203676582 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzY3NjU4Mg== pwolfram 4295853 2016-03-30T23:08:14Z 2016-03-30T23:46:02Z CONTRIBUTOR

Agreed. I think a sel_like method would work well to mirror the reindex/reindex_like notation.

To be clear for people following along, the distinction between this use case and that covered by reindex/reindex_like can be summarized as follows:

reindex_like does not do a contraction based on whether the DataArray idx is True or False. There are many use cases where one will want to reduce the size of the data file (e.g., subsetting a region from global climate model data). reindex_like preserves the shape of idx, not its collapsed shape based on an np.sum over each axis. A sel_like method would perform the contraction, so the size returned along each axis would be np.sum(idx, axis=axis), not just idx.shape[axis] as would occur for reindex_like.

@shoyer, are you in support of a PR that implements a sel_like to do a mapping of the form acase.sel_like(idx) -> acase.sel(x=idx.x.values, y=idx.y.values)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203639698 https://github.com/pydata/xarray/issues/811#issuecomment-203639698 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYzOTY5OA== pwolfram 4295853 2016-03-30T21:17:32Z 2016-03-30T21:17:32Z CONTRIBUTOR

I would argue that the intuitive behavior, even if it is a departure from numpy, is to make acase.sel(idx) shorthand for acase.sel(x=idx.x.values, y=idx.y.values). This would preserve the original dimensionality, but not necessarily the shape of acase, because a dimension that is all False should be dropped completely. This is because the sel method essentially operates on key-value pairs, or dictionary-like data.

I don't think acase[idx] would make as much sense, because this is closer to standard numpy notation where acase has two dimensions, and it is possible that the dimensional arrangements for acase and idx don't match (e.g., they could dimensionally be transposes of each other). If I understand correctly, arguments to the [] operator should essentially be data types, not a dictionary type. So this would be consistent, because we can think of idx as a glorified dict with metadata, whereas it is not a pure number type. Some type of dereferencing operation is needed for the "dict" idx to become a data type. A carefully worded error should probably be returned here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203631503 https://github.com/pydata/xarray/issues/811#issuecomment-203631503 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYzMTUwMw== pwolfram 4295853 2016-03-30T20:49:52Z 2016-03-30T20:49:52Z CONTRIBUTOR

Thanks @shoyer. If idx has more than one dimension couldn't we just select along the dimensions that are shared between acase and idx and just broadcast for the others, returning all entries for dimensions in acase that are not in idx? I sense I'm potentially missing something subtle here.

The bottom line is that if we could do a non-1D selection that would allow for more concise syntax on the application side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203622523 https://github.com/pydata/xarray/issues/811#issuecomment-203622523 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzYyMjUyMw== pwolfram 4295853 2016-03-30T20:33:56Z 2016-03-30T20:33:56Z CONTRIBUTOR

Thanks for clarifying this @fmaussion! You are right that the shape of idx is (Nr: 1, Np: 92000) and that is why it fails.

However, the reason I originally filed this issue is that the syntax could be clarified here. For example, should we be able to do something like acase.sel(idx), where the indices from idx are used to select from acase? It seems like there may be a way to streamline this in a more intuitive way. If so, this is probably a new issue and this one should be closed because its original formulation, as you so helpfully point out, is flawed. Thoughts on this?
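A small NumPy-only sketch of the shape problem described above, using a made-up mask of shape (1, 6) in place of the real (Nr: 1, Np: 92000) idx. The array names and values are illustrative assumptions, not data from this issue.

```python
import numpy as np

# Stand-in for idx: a boolean mask with a leading length-1 dimension, like (Nr: 1, Np: N)
mask_2d = np.array([[True, True, False, True, False, False]])

# Dropping the length-1 axis yields a 1-D mask that lines up with the Np dimension
mask_1d = mask_2d.squeeze(axis=0)

# Integer positions of the True entries along that single axis
positions = np.flatnonzero(mask_1d)

# Stand-in for one variable defined along Np
values = np.arange(10, 16)
selected = values[positions]

print(mask_2d.shape, mask_1d.shape, positions, selected)
```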

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276
203571994 https://github.com/pydata/xarray/issues/811#issuecomment-203571994 https://api.github.com/repos/pydata/xarray/issues/811 MDEyOklzc3VlQ29tbWVudDIwMzU3MTk5NA== pwolfram 4295853 2016-03-30T18:39:12Z 2016-03-30T18:39:12Z CONTRIBUTOR

More detail on the genesis of this issue:

acase.sel(Np=np.where(idx)[0]) works, but acase.sel(Np=idx) does not, returning the following error:

```
In [73]: acase.sel(Np=idx)

ValueError                                Traceback (most recent call last)
<ipython-input-73-abb4dc27ed02> in <module>()
----> 1 acase.sel(Np=idx)

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.pyc in sel(self, method, tolerance, indexers)
    974         """
    975         return self.isel(indexing.remap_label_indexers(
--> 976             self, indexers, method=method, tolerance=tolerance))
    977
    978     def isel_points(self, dim='points', **indexers):

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in remap_label_indexers(data_obj, indexers, method, tolerance)
    189     return dict((dim, convert_label_indexer(data_obj[dim].to_index(), label,
    190                                             dim, method, tolerance))
--> 191                 for dim, label in iteritems(indexers))
    192
    193

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in <genexpr>((dim, label))
    189     return dict((dim, convert_label_indexer(data_obj[dim].to_index(), label,
    190                                             dim, method, tolerance))
--> 191                 for dim, label in iteritems(indexers))
    192
    193

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in convert_label_indexer(index, label, index_name, method, tolerance)
    168                              'the index is unsorted or non-unique')
    169     else:
--> 170         label = _asarray_tuplesafe(label)
    171         if label.ndim == 0:
    172             indexer = index.get_loc(label.item(), **kwargs)

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/indexing.pyc in _asarray_tuplesafe(values)
    131     if result.ndim == 2:
    132         result = np.empty(len(values), dtype=object)
--> 133         result[:] = values
    134
    135     return result

ValueError: could not broadcast input array from shape (92000) into shape (1)
```

with

```
In [76]: acase
Out[76]:
<xarray.Dataset>
Dimensions:        (Nb: 11, Np: 92000, Nr: 1, Nt-1: 27, Time: 28)
Coordinates:
    yearoffset     |S4 '1700'
  * Nb             (Nb) float64 1.028e+03 1.029e+03 1.029e+03 1.029e+03 ...
    rlzn           (Nr) int64 0
    time           (Time) datetime64[ns] 1724-01-01 1724-01-02 1724-01-03 ...
  * Np             (Np) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * Nr             (Nr) int64 0
  * Nt-1           (Nt-1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * Time           (Time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    lat            (Nr, Time, Nb, Np) float64 8.66e+03 8.66e+03 8.66e+03 ...
    notoutcropped  (Nr, Time, Nb, Np) int64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    lon            (Nr, Time, Nb, Np) float64 5e+03 1e+04 1.5e+04 2e+04 ...
    dtdays         (Nr, Nt-1) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
```

```
In [75]: idx
Out[75]:
<xarray.DataArray (Nr: 1, Np: 92000)>
array([[ True, True, True, ..., False, False, False]], dtype=bool)
Coordinates:
    yearoffset  |S4 '1700'
    Nb          float64 1.029e+03
    rlzn        (Nr) int64 0
    time        datetime64[ns] 1724-01-01
  * Np          (Np) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * Nr          (Nr) int64 0
    Time        int64 0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Selection based on boolean DataArray 144683276

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
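
As a rough sketch of how the filter described at the top of this page ("author_association = CONTRIBUTOR and issue = 144683276 sorted by updated_at descending") maps onto the schema above, the snippet below loads the schema into an in-memory SQLite database and runs an equivalent query. The two CONTRIBUTOR rows reuse ids and timestamps from this page with the body column left empty; the third row is a made-up placeholder added only to show the filter excluding something. The exact SQL Datasette generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
    """
)

rows = [
    # (id, user, updated_at, author_association, issue)
    (203571994, 4295853, "2016-03-30T18:39:12Z", "CONTRIBUTOR", 144683276),
    (210632083, 4295853, "2016-04-15T20:30:03Z", "CONTRIBUTOR", 144683276),
    (1, 1, "2016-04-01T00:00:00Z", "MEMBER", 144683276),  # hypothetical placeholder row
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

result = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE author_association = ? AND issue = ? "
    "ORDER BY updated_at DESC",
    ("CONTRIBUTOR", 144683276),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

Sorting the TEXT timestamps works here because ISO 8601 strings in the same time zone order lexicographically in the same way they order chronologically.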
Powered by Datasette · Queries took 1444.026ms · About: xarray-datasette