issue_comments


12 rows where author_association = "MEMBER" and issue = 98274024 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
126922669 https://github.com/pydata/xarray/pull/504#issuecomment-126922669 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjkyMjY2OQ== clarkfitzg 5356122 2015-08-01T14:44:50Z 2015-08-01T14:44:50Z MEMBER

Looks good. Merge?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126851008 https://github.com/pydata/xarray/pull/504#issuecomment-126851008 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjg1MTAwOA== shoyer 1217238 2015-08-01T02:22:59Z 2015-08-01T02:22:59Z MEMBER

I moved the docs around and added a note on multi-dimensional indexing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126824906 https://github.com/pydata/xarray/pull/504#issuecomment-126824906 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjgyNDkwNg== clarkfitzg 5356122 2015-07-31T22:11:03Z 2015-07-31T22:11:03Z MEMBER

I was thinking about allowing it to work only if the array has exactly matching coordinates, which would be the case in (4), a[(x > 0) & (y > 0)]. But then it would be difficult to stay consistent in the 1-D case: mask or select?

> My sense is that we'll probably be happier if we have entirely distinct APIs for masking (.where) and selection ([] and .loc[]).

That's a concrete and easy to understand distinction. I'm convinced.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126815158 https://github.com/pydata/xarray/pull/504#issuecomment-126815158 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjgxNTE1OA== shoyer 1217238 2015-07-31T21:22:36Z 2015-07-31T21:22:36Z MEMBER

Oh, wow -- I didn't even realize that worked in pandas! Combined with NA-skipping aggregation functions in pandas that makes expressions like a[a < 0].mean() work just like the same expression in NumPy.
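A small sketch of the pandas behaviour described above, with made-up values. One caveat worth hedging: on a DataFrame, `.mean()` is computed per column, so the comparison with NumPy's scalar result only lines up once the surviving values are pooled. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array([[-1.0, 2.0, -3.0],
                   [4.0, -5.0, 6.0]])
df = pd.DataFrame(values)

# Boolean indexing with a same-shaped frame masks instead of selecting:
# the shape is kept and entries failing the condition become NaN.
masked = df[df < 0]

# pandas reductions skip NaN by default, so the masked entries are ignored.
column_means = masked.mean()

# Pooling all surviving values reproduces NumPy's select-then-average result.
pooled_mean = masked.sum().sum() / masked.count().sum()
numpy_mean = values[values < 0].mean()
```

Here `pooled_mean` and `numpy_mean` agree, while `column_means` holds one value per column.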

So instead of adding where, perhaps we should just support boolean indexing like pandas.

The main difference is that where can cleanly support broadcasting, whereas we currently don't do broadcasting in indexing. For example, suppose a is a 2-dimensional DataArray with dimensions (x, y). Now consider the following cases:

1. a[x > 0]
2. a[y > 0]
3. a[x > 0, y > 0]
4. a[(x > 0) & (y > 0)]

Currently, (1) and (3) work by selection. If we adopt the pandas behavior, (4) would also work, but by broadcasting and masking. This seems like a potential recipe for confusion, because once you have (4), case (2) seems like a natural variation. We could implement (2), but should it mask or select?

My sense is that we'll probably be happier if we have entirely distinct APIs for masking (.where) and selection ([] and .loc[]).
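To make the select-versus-mask distinction concrete, here is a rough sketch in plain NumPy rather than xray. The coordinate vectors `x` and `y` and the data values are assumptions for illustration, and `np.ix_` stands in for per-dimension selection.

```python
import numpy as np

# Hypothetical coordinates: x labels the rows, y labels the columns.
x = np.array([-1, 0, 1, 2])
y = np.array([-1, 1, 2])
a = np.arange(12, dtype=float).reshape(4, 3)

# Cases (1) and (3): selection. Each 1-D condition picks positions along
# its own axis, so the result is smaller than the input.
case1 = a[x > 0]                     # keeps 2 of the 4 rows
case3 = a[np.ix_(x > 0, y > 0)]      # keeps 2 rows and 2 columns

# Case (4): the two conditions broadcast against each other into a full
# 2-D mask, and the result keeps the original shape with NaN elsewhere.
mask = (x > 0)[:, np.newaxis] & (y > 0)[np.newaxis, :]
case4 = np.where(mask, a, np.nan)
```

Cases (3) and (4) keep the same four values, but (3) returns a 2x2 array while (4) returns the full 4x3 array padded with NaN. That difference is the ambiguity case (2) would inherit.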

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126810402 https://github.com/pydata/xarray/pull/504#issuecomment-126810402 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjgxMDQwMg== clarkfitzg 5356122 2015-07-31T20:56:31Z 2015-07-31T20:56:31Z MEMBER

Right. Consider following pandas rather than numpy here:

```
In [9]: a = pd.DataFrame(np.random.randn(3, 4))

In [10]: a
Out[10]:
          0         1         2         3
0 -1.188669  0.055286 -0.476962  0.144261
1  1.779646  2.332629  0.326515 -0.179862
2 -0.016739  1.221892 -0.032720 -0.779563

In [11]: a[a < 0]
Out[11]:
          0   1         2         3
0 -1.188669 NaN -0.476962       NaN
1       NaN NaN       NaN -0.179862
2 -0.016739 NaN -0.032720 -0.779563
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126784623 https://github.com/pydata/xarray/pull/504#issuecomment-126784623 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc4NDYyMw== shoyer 1217238 2015-07-31T18:58:16Z 2015-07-31T18:58:16Z MEMBER

> Both R and pandas allow the user to do a[a < 0] and a[a < 0] = 0. So what I'm wondering is why not extend xray's indexing to also work on arrays that are the same shape and have the same labels as the original array?

The problem is that a[a < 0] in NumPy flattens arrays with more than one dimension. We can't do this in xray without doing something like pointwise indexing to flatten out the labels, too.
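A short NumPy sketch of the flattening issue, reusing the small array from elsewhere in this thread. `np.nonzero` is used here only to show which positions survive; it is not a claim about how xray would implement pointwise indexing.

```python
import numpy as np

a = np.arange(-5, 5).reshape(2, 5)

# Boolean indexing on a 2-D array returns a flat 1-D result, so the
# row/column structure (and any labels attached to it) is lost.
selected = a[a < 0]

# To keep labels, each surviving value needs its own (row, column) pair.
rows, cols = np.nonzero(a < 0)

# Masking avoids the problem by keeping the shape and filling with NaN.
masked = np.where(a < 0, a, np.nan)
```

`selected` is one-dimensional, whereas `masked` still has the original two dimensions.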

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126779995 https://github.com/pydata/xarray/pull/504#issuecomment-126779995 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc3OTk5NQ== clarkfitzg 5356122 2015-07-31T18:41:29Z 2015-07-31T18:41:29Z MEMBER

Agreed: if a[a < 0] = 0 works, then a[a < 0] should work too.

Both R and pandas allow the user to do a[a < 0] and a[a < 0] = 0. So what I'm wondering is why not extend xray's indexing to also work on arrays that are the same shape and have the same labels as the original array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126772960 https://github.com/pydata/xarray/pull/504#issuecomment-126772960 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc3Mjk2MA== shoyer 1217238 2015-07-31T18:05:22Z 2015-07-31T18:10:59Z MEMBER

Right now, you can do that by chaining two operations: a.where(a > 0).fillna(0). It would definitely be better to support it in one (a.where(a > 0, 0)).

I suppose we could also support a[a < 0] = 0, but it seems a little strange given that we don't support a[a < 0].
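For comparison, a sketch of the chained and single-call forms using pandas, whose `where` method already accepts a replacement value. This is only an analogy: it does not show what xray supported at the time of this thread, and the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(-5, 5, dtype=float).reshape(2, 5))

# Two steps: mask everything that fails the condition, then fill the NaNs.
chained = df.where(df > 0).fillna(0)

# One step: pass the replacement value as the second argument.
direct = df.where(df > 0, 0)

same = np.array_equal(chained.to_numpy(), direct.to_numpy())
```

Both forms give the same values; the single call simply skips the intermediate NaN-filled frame.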

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126771341 https://github.com/pydata/xarray/pull/504#issuecomment-126771341 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc3MTM0MQ== clarkfitzg 5356122 2015-07-31T17:57:01Z 2015-07-31T17:57:01Z MEMBER

Here's something related that one can do in NumPy: replace all negative entries with 0.

```
In [15]: a = np.arange(-5, 5).reshape(2, 5)

In [16]: a
Out[16]:
array([[-5, -4, -3, -2, -1],
       [ 0,  1,  2,  3,  4]])

In [17]: a[a < 0] = 0

In [18]: a
Out[18]:
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4]])
```

Would it be possible to modify __getitem__ for this common use case?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126770381 https://github.com/pydata/xarray/pull/504#issuecomment-126770381 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc3MDM4MQ== shoyer 1217238 2015-07-31T17:51:49Z 2015-07-31T17:51:49Z MEMBER

@clarkfitzg Yes, that's mostly right. The main differences:

1. The order of the arguments here is different, to match the pandas method (which has more of a SQL flavor to it).
2. I'm not exposing the third argument, because xray objects don't yet implement broadcasting operations with more than 2 arguments at once. This is something that needs refactoring -- the logic in _binary_op should be generalized to any number of arguments.
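As a rough illustration of what three-argument broadcasting involves, here is the NumPy form, where the condition comes first and all three operands are broadcast together. The shapes are picked arbitrarily, and this says nothing about how _binary_op is written.

```python
import numpy as np

cond = np.array([[True], [False], [True]])   # shape (3, 1)
x = np.array([[1.0, 2.0, 3.0, 4.0]])         # shape (1, 4)
y = -1.0                                     # scalar

# np.where(condition, value_if_true, value_if_false): all three operands
# are broadcast to one common shape before the element-wise choice.
result = np.where(cond, x, y)

# The common shape can be inspected without doing the selection.
broadcast_shapes = [arr.shape for arr in np.broadcast_arrays(cond, x, y)]
```

A two-operand `where(cond)` only has to align the data with the condition, which is the case a binary-operation helper already covers.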

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126768529 https://github.com/pydata/xarray/pull/504#issuecomment-126768529 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc2ODUyOQ== clarkfitzg 5356122 2015-07-31T17:45:32Z 2015-07-31T17:45:32Z MEMBER

Checking if I understand: this already exists on line 79 of ops.py

where = _dask_or_eager_func('where')

so this PR is to expose it in the user-facing API?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024
126756775 https://github.com/pydata/xarray/pull/504#issuecomment-126756775 https://api.github.com/repos/pydata/xarray/issues/504 MDEyOklzc3VlQ29tbWVudDEyNjc1Njc3NQ== clarkfitzg 5356122 2015-07-31T17:19:56Z 2015-07-31T17:19:56Z MEMBER

Plot makes for a compelling example.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: where method for masking xray objects according to some criteria 98274024


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);