issue_comments
11 rows where issue = 357156174 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
672548285 | https://github.com/pydata/xarray/issues/2399#issuecomment-672548285 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDY3MjU0ODI4NQ== | stale[bot] 26384082 | 2020-08-12T03:24:32Z | 2020-08-12T03:24:32Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
420446624 | https://github.com/pydata/xarray/issues/2399#issuecomment-420446624 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQyMDQ0NjYyNA== | horta 514522 | 2018-09-11T22:24:14Z | 2018-09-11T22:24:14Z | CONTRIBUTOR | Yes, I'm working on that doc for now, to come up with definitions that are as precise and as simple as possible. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
420444668 | https://github.com/pydata/xarray/issues/2399#issuecomment-420444668 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQyMDQ0NDY2OA== | fujiisoup 6815844 | 2018-09-11T22:16:32Z | 2018-09-11T22:16:32Z | MEMBER | Sorry that I couldn't join the discussion here. Thanks, @horta, for sharing the nice document. We tried to use consistent terminology in the docs, but I agree that it would be nice to have a list of the definitions. I think it might be better to discuss it in another issue. See #2410. For
xarray inherits not only from pandas but also from numpy's multi-dimensional arrays, so we need to be very consistent about the resulting shape of indexing. It would be confusing if selections from arrays of different dimensionality produced the same result.
I also think that what is lacking in xarray is this functionality. Any interest in helping us with this? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
420373780 | https://github.com/pydata/xarray/issues/2399#issuecomment-420373780 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQyMDM3Mzc4MA== | shoyer 1217238 | 2018-09-11T18:28:43Z | 2018-09-11T18:28:43Z | MEMBER | CC @fujiisoup who implemented much of this. I will also take a look at your doc when I have the chance. I do think that handling duplicate matches with indexing is an important use-case. This comes up with nearest neighbor matching as well -- it would be useful to be able to return the full set of matches within a given distance, not just the nearest match. I wonder if it would be more productive to consider a new indexing API for one -> many matches. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
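The one -> many matching that shoyer raises in the comment above can be pictured with a small plain-Python sketch. The helper `matches_within` is hypothetical (it is not an xarray or pandas function) and the index values are made up for illustration; it only contrasts "nearest match" with "every match within a given distance".

```python
from typing import List, Sequence


def matches_within(index: Sequence[float], target: float, tolerance: float) -> List[int]:
    """Return the positions of all index entries within `tolerance` of `target`.

    Hypothetical helper for illustration only; not part of xarray or pandas.
    """
    return [pos for pos, value in enumerate(index) if abs(value - target) <= tolerance]


# Made-up coordinate values along one dimension.
index = [0.0, 0.9, 1.0, 1.1, 2.0]

# Nearest-neighbour lookup returns exactly one position...
nearest_only = min(range(len(index)), key=lambda pos: abs(index[pos] - 1.0))

# ...while a one -> many lookup returns every position within the tolerance.
all_matches = matches_within(index, target=1.0, tolerance=0.25)

print(nearest_only)
print(all_matches)
```

Duplicate labels are the special case `tolerance=0`: every position holding the requested value is returned, so the number of results is not known from the query alone.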
420362244 | https://github.com/pydata/xarray/issues/2399#issuecomment-420362244 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQyMDM2MjI0NA== | horta 514522 | 2018-09-11T17:52:29Z | 2018-09-11T17:52:29Z | CONTRIBUTOR | Hi again. I'm working on a precise definition of xarray and indexing. I find the official one a bit hard to understand. It might help me come up with a reasonable way to handle duplicate indices. https://drive.google.com/file/d/1uJ_U6nedkNe916SMViuVKlkGwPX-mGK7/view?usp=sharing |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
419714631 | https://github.com/pydata/xarray/issues/2399#issuecomment-419714631 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxOTcxNDYzMQ== | horta 514522 | 2018-09-09T13:04:12Z | 2018-09-09T13:04:12Z | CONTRIBUTOR | I see. Now that I have read about it, let me give it another shot. Let
and
The result of
as per column vector representation assumption.

Answer: Laying down the first dimension gives

| y | z | x |
|---|---|---|
| a | a | a |
|   |   | a |

By order,

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ? |
|   |   | a | ? |

where

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ? |
| a | a | a | ? |

And here is my suggestion. Use the mapping The answer is
for
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
419580420 | https://github.com/pydata/xarray/issues/2399#issuecomment-419580420 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxOTU4MDQyMA== | shoyer 1217238 | 2018-09-07T22:15:33Z | 2018-09-07T22:15:33Z | MEMBER | Please take a look at xarray's detailed indexing rules: http://xarray.pydata.org/en/stable/indexing.html#indexing-rules
I think this is the crux of the problem. Put another way: why should the result of indexing be a 1x2 array instead of a 2x1 array? Currently (with the exception of indexing by a scalar with an index with duplicates), xarray determines the shape/dimensions resulting from indexing from the shape/dimensions of the indexers, not from the array being indexed. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
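The rule in the comment above, that the result takes its shape from the indexer rather than from the array being indexed, can be illustrated with nested Python lists. This is only an analogy under simplifying assumptions (positional lookup, no dimension names); the function `take` is hypothetical and is not how xarray implements indexing.

```python
def take(values, indexer):
    """Positional lookup whose result mirrors the nesting of `indexer`.

    Hypothetical helper: nested lists stand in for multi-dimensional indexers.
    """
    if isinstance(indexer, list):
        return [take(values, item) for item in indexer]
    return values[indexer]


values = [10, 20]  # a one-dimensional array with two entries

row = take(values, [[0, 1]])         # 1x2 indexer gives a 1x2 result
column = take(values, [[0], [1]])    # 2x1 indexer gives a 2x1 result

print(row)
print(column)
```

Under this rule a scalar label has no shape of its own, so when it matches two entries because of duplicates, nothing in the indexer says whether the pair should be laid out as 1x2 or 2x1. That is the ambiguity the question in the comment points at.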
419383633 | https://github.com/pydata/xarray/issues/2399#issuecomment-419383633 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxOTM4MzYzMw== | horta 514522 | 2018-09-07T09:39:01Z | 2018-09-07T09:39:01Z | CONTRIBUTOR | Now I see the problem. But I think it is solvable. I will ignore the dimension names for now as I don't have much experience with xarray yet. The code
can be understood as defining two indexed arrays:
Algorithm:
Concretely, the solution is a bi-dimensional, 1x2 array: | 0 1 |. There is another relevant example. Let the code be
We have Algorithm:
The solution is a bi-dimensional, 1x3 array: | 0 1 2 | Explanation
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
419173479 | https://github.com/pydata/xarray/issues/2399#issuecomment-419173479 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxOTE3MzQ3OQ== | shoyer 1217238 | 2018-09-06T17:18:08Z | 2018-09-06T17:18:08Z | MEMBER | Let me give a more concrete example of the issue for multi-dimensional indexing:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
419166914 | https://github.com/pydata/xarray/issues/2399#issuecomment-419166914 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxOTE2NjkxNA== | horta 514522 | 2018-09-06T16:56:44Z | 2018-09-06T16:56:44Z | CONTRIBUTOR | Thanks for the feedback!
```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], index=list("aab"))
print(df.loc[list("ab")])
# Output:
#    0
# a  0
# a  1
# b  2
```
is an INNER JOIN between the two indexes
Another example:
```python
import pandas as pd

df = pd.DataFrame(data=[0, 1], index=list("aa"))
print(df.loc[list("aa")])
# Output:
#    0
# a  0
# a  1
# a  0
# a  1
```
is again an INNER JOIN between the two indexes
This translates into a unidimensional index:
Converting it back to the matricial representation:
In summary, my suggestion is to consider the possibility of defining indexing The multi-dimensional indexing, as far as I can see, can always be transformed into the uni-dimensional case and treated as such. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 | |
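horta's reading of the two pandas examples above as an INNER JOIN between the array's index and the requested labels can be sketched without pandas. The function names `inner_join_positions` and `select` are hypothetical, and the sketch covers only the one-dimensional case.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def inner_join_positions(
    index: Sequence[Hashable], query: Sequence[Hashable]
) -> List[Tuple[int, int]]:
    """Pair each query position with every index position holding the same label."""
    positions: Dict[Hashable, List[int]] = {}
    for pos, label in enumerate(index):
        positions.setdefault(label, []).append(pos)
    # Query order is preserved; within one label, index order is preserved.
    return [(q, i) for q, label in enumerate(query) for i in positions.get(label, [])]


def select(data: Sequence, index: Sequence[Hashable], query: Sequence[Hashable]) -> List:
    """Return the data values picked out by joining `query` against `index`."""
    return [data[i] for _, i in inner_join_positions(index, query)]


# The two examples from the comment above.
print(select([0, 1, 2], list("aab"), list("ab")))
print(select([0, 1], list("aa"), list("aa")))
```

One difference to keep in mind: an inner join silently drops requested labels that have no match, which may not be what a label-based indexer should do with missing labels.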
418778596 | https://github.com/pydata/xarray/issues/2399#issuecomment-418778596 | https://api.github.com/repos/pydata/xarray/issues/2399 | MDEyOklzc3VlQ29tbWVudDQxODc3ODU5Ng== | shoyer 1217238 | 2018-09-05T15:41:21Z | 2018-09-05T15:41:21Z | MEMBER | Thanks for the report! This was actually a somewhat intentional omission in xarray, but it would not be particularly difficult to add this feature if we want it. At the very least, we should note this deviation somewhere in the docs. There are two potentially problematic aspects to the pandas behavior:
1. It means that you cannot count on indexing a dataframe with its own index to return something equivalent to the original dataframe, e.g., consider Now that you bring this up, I wonder how the existing support for indexing like |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DataArray.loc fails for duplicates where DataFrame works 357156174 |
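The first concern in shoyer's comment above, that indexing with an object's own index need not return the original object, follows from counting matches: under pandas-style matching, a label that occurs k times is requested k times and matches k rows each time. The helper `self_index_length` below is hypothetical and only does that arithmetic.

```python
from collections import Counter
from typing import Hashable, Sequence


def self_index_length(index: Sequence[Hashable]) -> int:
    """Number of rows returned when an index with duplicates is used to index itself.

    Assumes every occurrence of a label matches every row carrying that label.
    """
    counts = Counter(index)
    return sum(k * k for k in counts.values())


index = list("aab")
print(len(index), self_index_length(index))
```

For the index `a, a, b` the original has 3 rows, while self-indexing yields 2*2 + 1*1 = 5, so the round trip is only an identity when every label is unique.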
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);