issue_comments

5 rows where issue = 357156174 and user = 514522 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
420446624 https://github.com/pydata/xarray/issues/2399#issuecomment-420446624 https://api.github.com/repos/pydata/xarray/issues/2399 MDEyOklzc3VlQ29tbWVudDQyMDQ0NjYyNA== horta 514522 2018-09-11T22:24:14Z 2018-09-11T22:24:14Z CONTRIBUTOR

Yes, I'm working on that document for now, to come up with definitions that are as precise and as simple as possible.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.loc fails for duplicates where DataFrame works 357156174
420362244 https://github.com/pydata/xarray/issues/2399#issuecomment-420362244 https://api.github.com/repos/pydata/xarray/issues/2399 MDEyOklzc3VlQ29tbWVudDQyMDM2MjI0NA== horta 514522 2018-09-11T17:52:29Z 2018-09-11T17:52:29Z CONTRIBUTOR

Hi again. I'm working on a precise definition of xarray and its indexing, since I find the official one a bit hard to understand. It might help me come up with a reasonable way to handle duplicate indices. https://drive.google.com/file/d/1uJ_U6nedkNe916SMViuVKlkGwPX-mGK7/view?usp=sharing

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.loc fails for duplicates where DataFrame works 357156174
419714631 https://github.com/pydata/xarray/issues/2399#issuecomment-419714631 https://api.github.com/repos/pydata/xarray/issues/2399 MDEyOklzc3VlQ29tbWVudDQxOTcxNDYzMQ== horta 514522 2018-09-09T13:04:12Z 2018-09-09T13:04:12Z CONTRIBUTOR

I see. Now that I have read about it, let me give it another shot.

Let i be

```
<xarray.DataArray (y: 1, z: 1)>
array([['a']], dtype='<U1')
Dimensions without coordinates: y, z
```

and d be

```
<xarray.DataArray (x: 2)>
array([0, 1])
Coordinates:
  * x        (x) <U1 'a' 'a'
```

The result of d.loc[i] is equal to d.sel(x=i). It also seems reasonable to expect its result to be the same as d0.sel(x=i), for d0 given by

```
<xarray.DataArray (x: 2, dim_1: 1)>
array([[0],
       [1]])
Coordinates:
  * x        (x) <U1 'a' 'a'
Dimensions without coordinates: dim_1
```

as per the column-vector representation assumption.

Answer

Laying down the first dimension gives

| y | z | x |
|---|---|---|
| a | a | a |
|   |   | a |

By order, x matches y, so we append a new dimension after x to match z:

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ? |
|   |   | a | ? |

where ? means any. Joining the first and second halves of the table gives

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ? |
| a | a | a | ? |

And here is my suggestion: use the mapping y|->x and z|->dim_1 to decide which axis to expand for the additional element. I choose the y-axis because the additional a was originally appended along the x-axis.

The answer is

```
<xarray.DataArray (y: 2, z: 1)>
array([[0],
       [1]])
Coordinates:
    x        (y, z) <U1 'a' 'a'
Dimensions without coordinates: y, z
```

with

```
>>> ans.coords["x"]
<xarray.DataArray 'x' (y: 2, z: 1)>
array([['a'],
       ['a']], dtype='<U1')
Coordinates:
    x        (y, z) <U1 'a' 'a'
Dimensions without coordinates: y, z
```
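The proposed rule can be sketched in plain Python, with nested lists standing in for DataArrays. This sketch assumes the indexer has a single z column (as i does above): each indexer label is replaced by all of its matches in the index, and extra matches are stacked along y, the axis mapped to x. `expand_along_y` is a hypothetical helper, and raising KeyError for a missing label is an assumption borrowed from `.loc`.

```python
def expand_along_y(index_labels, values, indexer):
    """Hypothetical sketch of the proposed duplicate-aware lookup.

    index_labels: labels of the x coordinate, possibly repeated.
    values: data of the indexed array, aligned with index_labels.
    indexer: nested list of shape (n, 1) holding labels to look up.
    Returns (data, coord) as nested lists of shape (m, 1): every duplicate
    match adds a row along y, the axis mapped to x.
    """
    data, coord = [], []
    for row in indexer:
        if len(row) != 1:
            raise ValueError("this sketch only handles a single z column")
        label = row[0]
        matches = [v for lab, v in zip(index_labels, values) if lab == label]
        if not matches:
            # Assumption: mirror .loc and fail on labels absent from the index.
            raise KeyError(label)
        for v in matches:
            data.append([v])
            coord.append([label])
    return data, coord


# d: values [0, 1] with x = ['a', 'a']; i: [['a']] with dims (y, z)
data, coord = expand_along_y(["a", "a"], [0, 1], [["a"]])
print(data)   # [[0], [1]]
print(coord)  # [['a'], ['a']]
```

The output has shape (2, 1), matching the proposed answer and its x coordinate.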

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.loc fails for duplicates where DataFrame works 357156174
419383633 https://github.com/pydata/xarray/issues/2399#issuecomment-419383633 https://api.github.com/repos/pydata/xarray/issues/2399 MDEyOklzc3VlQ29tbWVudDQxOTM4MzYzMw== horta 514522 2018-09-07T09:39:01Z 2018-09-07T09:39:01Z CONTRIBUTOR

Now I see the problem. But I think it is solvable.

I will ignore the dimension names for now, as I don't have much experience with xarray yet.

The code

```python
import xarray as xr

da_nonunique = xr.DataArray([0, 1], dims=['x'], coords={'x': ['a', 'a']})
indexer = xr.DataArray([['a']], dims=['y', 'z'])
```

can be understood as defining two indexed arrays:

[a, a] and [[a]]. Because we allow non-unique indices, I will also refer to the individual array elements as [e_0, e_1] and [[r_0]].

Algorithm:

  1. Align. [[a], [a]] and [[a]].
  2. Ravel. [(a,a), (a,a)] and [(a,a)].
  3. Join. [(a,a), (a,a)]. I.e., [e_0, e_1].
  4. Unravel. [[e_0, e_1]]. Notice that [e_0, e_1] has been picked up by r_0.
  5. Reshape. [[e_0, e_1]] (solution).

Concretely, the solution is a two-dimensional 1x2 array:

| 0 1 |.

There is another relevant example. Let the code be

```python
da_nonunique = xr.DataArray([0, 1, 2], dims=['x'], coords={'x': ['a', 'a', 'b']})
indexer = xr.DataArray([['a', 'b']], dims=['y', 'z'])
```

We have [a, a, b] and [[a, b]], also denoted as [e_0, e_1, e_2] and [[r_0, r_1]].

Algorithm:

  1. Align. [[a], [a], [b]] and [[a, b]].
  2. Ravel. [(a,a), (a,a), (b,b)] and [(a,a), (b,b)].
  3. Join. [(a,a), (a,a), (b,b)]. I.e., [e_0, e_1, e_2].
  4. Unravel. [[e_0, e_1, e_2]]. Notice now that [e_0, e_1] has been picked up by r_0 and [e_2] by r_1.
  5. Reshape. [[e_0, e_1, e_2]].

The solution is a two-dimensional 1x3 array:

| 0 1 2 |

Explanation

  1. Align recursively adds a new dimension to the array with lower dimensionality.
  2. Ravel recursively removes a dimension by converting elements into tuples.
  3. Join is the SQL join operation: a Cartesian product followed by matching.
  4. Unravel performs the inverse of step 2.
  5. Reshape converts the result to the indexer's dimensionality.
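
The five steps can be condensed into a short plain-Python sketch; the helper name `join_rows` is hypothetical and nested lists stand in for DataArrays. Align is implicit here because labels are compared directly; Ravel flattens the indexer into (row, label) pairs, Join keeps every matching index entry, and Unravel/Reshape regroup the matches by indexer row. With a multi-row indexer the rows can end up with different lengths, a case the two examples above do not cover.

```python
def join_rows(index_labels, values, indexer):
    """Hypothetical sketch of Align/Ravel/Join/Unravel/Reshape for a 2-D indexer.

    Each indexer row is inner-joined against the 1-D, possibly non-unique
    index; every match is kept, in indexer order.
    """
    # Ravel: flatten the indexer into (row, label) pairs.
    raveled = [(r, label) for r, row in enumerate(indexer) for label in row]
    # Join: Cartesian product of indexer entries and index entries, keeping matches.
    joined = [
        (r, value)
        for r, label in raveled
        for index_label, value in zip(index_labels, values)
        if index_label == label
    ]
    # Unravel + Reshape: regroup matched values by the indexer row they came from.
    result = [[] for _ in indexer]
    for r, value in joined:
        result[r].append(value)
    return result


print(join_rows(["a", "a"], [0, 1], [["a"]]))               # [[0, 1]]
print(join_rows(["a", "a", "b"], [0, 1, 2], [["a", "b"]]))  # [[0, 1, 2]]
```
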
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.loc fails for duplicates where DataFrame works 357156174
419166914 https://github.com/pydata/xarray/issues/2399#issuecomment-419166914 https://api.github.com/repos/pydata/xarray/issues/2399 MDEyOklzc3VlQ29tbWVudDQxOTE2NjkxNA== horta 514522 2018-09-06T16:56:44Z 2018-09-06T16:56:44Z CONTRIBUTOR

Thanks for the feedback!

  1. You can rely on indexing if the is_unique flag is checked beforehand. The way pandas does indexing seems both clear to the user and powerful. It is clear because indexing is the result of a Cartesian product followed by filtering for matching values. It is powerful because it allows indexing as complex as an SQL INNER JOIN, which covers the trivial case of unique elements. For example, the following operation

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], index=list("aab"))
print(df.loc[list("ab")])
#    0
# a  0
# a  1
# b  2
```

is an INNER JOIN between the two indexes

INNER((a, b) x (a, a, b)) = INNER(aa, aa, ab, ba, ba, bb) = (aa, aa, bb)

Another example:

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1], index=list("aa"))
print(df.loc[list("aa")])
#    0
# a  0
# a  1
# a  0
# a  1
```

is again an INNER JOIN between the two indexes

INNER((a, a) x (a, a)) = INNER(aa, aa, aa, aa) = (aa, aa, aa, aa)
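
Both INNER formulas, and the rows pandas prints for them, can be reproduced with a Cartesian product followed by a filter. This is a plain-Python sketch; `inner` and `loc_inner_join` are hypothetical helpers, not pandas functions. One design difference worth noting: recent pandas raises KeyError when a list passed to `.loc` contains a label absent from the index, whereas a pure inner join simply drops it.

```python
from itertools import product


def inner(indexer_labels, index_labels):
    """INNER JOIN of two label sequences: Cartesian product, then keep equal pairs."""
    return [
        left + right
        for left, right in product(indexer_labels, index_labels)
        if left == right
    ]


def loc_inner_join(index_labels, values, labels):
    """(label, value) rows selected by inner-joining `labels` against the index.

    Requested labels are visited in order and each yields all of its matches,
    which is the row order shown in the two df.loc outputs above.
    """
    return [
        (index_label, value)
        for label, (index_label, value) in product(labels, zip(index_labels, values))
        if label == index_label
    ]


print(inner("ab", "aab"))                      # ['aa', 'aa', 'bb']
print(inner("aa", "aa"))                       # ['aa', 'aa', 'aa', 'aa']
print(loc_inner_join("aab", [0, 1, 2], "ab"))  # [('a', 0), ('a', 1), ('b', 2)]
print(loc_inner_join("aa", [0, 1], "aa"))      # [('a', 0), ('a', 1), ('a', 0), ('a', 1)]
```
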

  2. Assume a two-dimensional array with the following index:

```
   0  1
a  !  @
a  #  $
```

This translates into a one-dimensional index: (a, 0), (a, 1), (a, 0), (a, 1). As such, it can be treated as usual. Suppose you index the above matrix using [('a', 0), ('a', 0)]. This implies

INNER( ((a, 0), (a, 0)) x ((a, 0), (a, 1), (a, 0), (a, 1)) ) = INNER( (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1) ) = ((a,0)(a,0), (a,0)(a,0), (a,0)(a,0), (a,0)(a,0))

Converting it back to the matrix representation:

```
   0  0
a  !  !
a  #  #
```

In summary, my suggestion is to consider defining the indexing of B by A (i.e., B.loc[A]) as a Cartesian product followed by match filtering or, in SQL terms, an INNER JOIN.

As far as I can see, multi-dimensional indexing can always be transformed into the one-dimensional case and treated as such.
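
The reduction to the one-dimensional case can be sketched in plain Python: each cell is keyed by a (row label, column label) tuple, the tuple keys are inner-joined with the indexer, and the matches are regrouped by original row and by indexer position. That regrouping rule is an assumption of this sketch, picked because it reproduces the matrix above; `index_2d` is a hypothetical helper, not an xarray or pandas function.

```python
def index_2d(row_labels, col_labels, matrix, keys):
    """Hypothetical sketch: index a labeled 2-D array with (row, col) label tuples.

    Assumes each key matches at most one cell per row (unique column labels).
    Returns (new_row_labels, new_col_labels, new_matrix); None marks a cell
    of a kept row for which that key had no match.
    """
    # Flatten: one entry per cell, keyed by its (row label, column label) tuple.
    cells = [
        ((row_labels[i], col_labels[j]), i, matrix[i][j])
        for i in range(len(row_labels))
        for j in range(len(col_labels))
    ]
    # Inner join: Cartesian product of keys and cells, keeping equal tuples.
    joined = [
        (i, k, value)
        for k, key in enumerate(keys)
        for cell_key, i, value in cells
        if cell_key == key
    ]
    # Rebuild (assumed rule): one output row per matched original row, one column per key.
    kept_rows = sorted({i for i, _, _ in joined})
    position = {i: n for n, i in enumerate(kept_rows)}
    new_matrix = [[None] * len(keys) for _ in kept_rows]
    for i, k, value in joined:
        new_matrix[position[i]][k] = value
    new_row_labels = [row_labels[i] for i in kept_rows]
    new_col_labels = [key[1] for key in keys]
    return new_row_labels, new_col_labels, new_matrix


rows, cols, values = index_2d(["a", "a"], [0, 1], [["!", "@"], ["#", "$"]], [("a", 0), ("a", 0)])
print(rows)    # ['a', 'a']
print(cols)    # [0, 0]
print(values)  # [['!', '!'], ['#', '#']]
```

The join step produces four matches, one per (a,0)(a,0) pair in the INNER expression above.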

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.loc fails for duplicates where DataFrame works 357156174

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
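
The filter described at the top of the page (issue = 357156174 and user = 514522, sorted by updated_at descending) corresponds to a query along these lines. This is a sketch using Python's built-in sqlite3 with an in-memory copy of the schema above and a few (id, user, updated_at, issue) values from the rows on this page; it is not necessarily the exact SQL Datasette generated.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above (indexes omitted).
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, user, updated_at, issue) values taken from rows listed on this page.
rows = [
    (419166914, 514522, "2018-09-06T16:56:44Z", 357156174),
    (420446624, 514522, "2018-09-11T22:24:14Z", 357156174),
    (419714631, 514522, "2018-09-09T13:04:12Z", 357156174),
]
conn.executemany(
    "INSERT INTO issue_comments (id, [user], updated_at, issue) VALUES (?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in one format sort correctly as text.
query = """
SELECT id, updated_at FROM issue_comments
WHERE issue = ? AND [user] = ?
ORDER BY updated_at DESC
"""
result = conn.execute(query, (357156174, 514522)).fetchall()
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```
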
Powered by Datasette · Queries took 23.347ms · About: xarray-datasette