issue_comments

5 rows where author_association = "MEMBER" and issue = 357156174 sorted by updated_at descending


user: shoyer (4), fujiisoup (1) · issue: DataArray.loc fails for duplicates where DataFrame works · author_association: MEMBER
420444668 · fujiisoup · 2018-09-11T22:16:32Z · MEMBER
https://github.com/pydata/xarray/issues/2399#issuecomment-420444668

Sorry that I couldn't join the discussion here.

Thanks, @horta, for the nice document. We tried to use consistent terminology in the docs, but I agree that it would be nice to have a list of definitions. I think it might be better to discuss this in another issue. See #2410.

Regarding the loc and sel issues, one thing I don't agree with is:

The result of d.loc[i] is equal to d.sel(x=i). Also, it seems reasonable to expect that its result should be the same as d0.sel(x=i) for d0 given by

xarray inherits not only from pandas but also from numpy's multi-dimensional arrays. That is, we need to be very consistent about the resulting shape of indexing; it would be confusing if selections from arrays of different dimensionality gave the same result.

I do think that handling duplicate matches with indexing is an important use-case. This comes up with nearest neighbor matching as well -- it would be useful to be able to return the full set of matches within a given distance, not just the nearest match.

I agree that this functionality is lacking in xarray. Any interest in helping us with this?
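For reference, the pandas behavior the issue title contrasts against can be reproduced in a couple of lines (a minimal sketch, not from the thread): with a duplicated index, even scalar label-based selection returns every matching row, so the result's shape depends on the data rather than on the indexer.

```python
import pandas as pd

# a frame whose index contains the duplicated label 'a'
df = pd.DataFrame({'v': [0, 1]}, index=['a', 'a'])

# scalar .loc returns *all* rows matching the label, so a single-label
# lookup yields two rows here
out = df.loc['a']
assert len(out) == 2
```

This is exactly the one-to-many matching discussed above: the number of results is only known after consulting the index.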

420373780 · shoyer · 2018-09-11T18:28:43Z · MEMBER
https://github.com/pydata/xarray/issues/2399#issuecomment-420373780

CC @fujiisoup who implemented much of this. I will also take a look at your doc when I have the chance.

I do think that handling duplicate matches with indexing is an important use-case. This comes up with nearest neighbor matching as well -- it would be useful to be able to return the full set of matches within a given distance, not just the nearest match.

I wonder if it would be more productive to consider a new indexing API for one -> many matches. sel/loc is already quite complex.

419580420 · shoyer · 2018-09-07T22:15:33Z · MEMBER
https://github.com/pydata/xarray/issues/2399#issuecomment-419580420

Please take a look at xarray's detailed indexing rules: http://xarray.pydata.org/en/stable/indexing.html#indexing-rules

I will ignore the dimension names for now as I don't have much experience with xarray yet.

I think this is the crux of the problem. Put another way: why should the result of indexing be a 1x2 array instead of a 2x1 array? Currently (with the exception of indexing by a scalar with an index with duplicates), xarray determines the shape/dimensions of the indexing result from the shape/dimensions of the indexers, not of the array being indexed.
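That rule can be seen in a short sketch (illustrative, not from the comment): with vectorized indexing, the output takes its dimensions and shape from the indexer, regardless of the array being indexed.

```python
import xarray as xr

da = xr.DataArray([10, 20, 30], dims=['x'], coords={'x': ['a', 'b', 'c']})

# a 2-D indexer with dims ('y', 'z') and shape (1, 2)
idx = xr.DataArray([['a', 'c']], dims=['y', 'z'])

result = da.sel(x=idx)
# the result adopts the indexer's dims and shape, not the array's
assert result.dims == ('y', 'z')
assert result.shape == (1, 2)
```

With duplicate labels there is no single obvious way to preserve this invariant, since one label may match several positions.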

419173479 · shoyer · 2018-09-06T17:18:08Z · MEMBER
https://github.com/pydata/xarray/issues/2399#issuecomment-419173479

Let me give a more concrete example of the issue for multi-dimensional indexing:

```python
da_unique = xr.DataArray([0, 1], dims=['x'], coords={'x': ['a', 'b']})
da_nonunique = xr.DataArray([0, 1], dims=['x'], coords={'x': ['a', 'a']})
indexer = xr.DataArray([['a']], dims=['y', 'z'])
```

With a unique index, notice how the result takes on the dimensions of the indexer:

```
>>> da_unique.loc[indexer]
<xarray.DataArray (y: 1, z: 1)>
array([[0]])
Coordinates:
    x        (y, z) object 'a'
Dimensions without coordinates: y, z
```

What would you propose for the result of `da_nonunique.loc[indexer]`?

418778596 · shoyer · 2018-09-05T15:41:21Z · MEMBER
https://github.com/pydata/xarray/issues/2399#issuecomment-418778596

Thanks for the report!

This was actually a somewhat intentional omission in xarray, but it would not be particularly difficult to add this feature if we want it. At the very least, we should note this deviation somewhere in the docs.

There are two potentially problematic aspects to the pandas behavior:

1. It means that you cannot count on indexing a dataframe with its own index to return something equivalent to the original dataframe, e.g., consider df.loc[['a', 'a']] in your example, which returns a dataframe with 4 rows.
2. More generally, it means you can't count on indexing a dataframe with an array to return an object of the same size as the indexer. This is particularly problematic for xarray, because we support vectorized indexing with multi-dimensional indexers. I don't know how we could define a multi-dimensional equivalent of this -- what shape should the result have if you indexed with a multi-dimensional array instead, e.g., da.loc[{"dim_0": xr.DataArray([['a']])}]? With multiple dimensions involved, it's not clear where the extra introduced dimensions should go.
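The first point can be checked directly (a short sketch, assuming the two-row frame from the original report):

```python
import pandas as pd

df = pd.DataFrame({'v': [0, 1]}, index=['a', 'a'])

# indexing the frame with (a copy of) its own index does not round-trip:
# each 'a' in the indexer matches both rows, giving 2 * 2 = 4 rows
out = df.loc[['a', 'a']]
assert len(out) == 4
```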

Now that you bring this up, I wonder how the existing support for indexing like da.loc[{"dim_0": "a"}] would work if there are other multi-dimensional indexers. I don't know if we have test coverage for this...

