html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2560#issuecomment-445482824,https://api.github.com/repos/pydata/xarray/issues/2560,445482824,MDEyOklzc3VlQ29tbWVudDQ0NTQ4MjgyNA==,514522,2018-12-08T19:13:08Z,2018-12-08T19:13:08Z,CONTRIBUTOR,"Sorry guys. I've found the problem and solution.

The problem is that the filesystem does not support file locking. The solution is to set the following environment variable: `export HDF5_USE_FILE_LOCKING=FALSE`.","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,383057458
https://github.com/pydata/xarray/issues/2560#issuecomment-440909054,https://api.github.com/repos/pydata/xarray/issues/2560,440909054,MDEyOklzc3VlQ29tbWVudDQ0MDkwOTA1NA==,514522,2018-11-22T04:25:05Z,2018-11-22T04:25:05Z,CONTRIBUTOR,https://github.com/limix/limix/blob/2.0.0/limix/qtl/test/test_qtl_xarr.py,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,383057458
https://github.com/pydata/xarray/issues/2410#issuecomment-422368970,https://api.github.com/repos/pydata/xarray/issues/2410,422368970,MDEyOklzc3VlQ29tbWVudDQyMjM2ODk3MA==,514522,2018-09-18T12:17:06Z,2018-09-18T12:17:06Z,CONTRIBUTOR,"I will first try to have both together. I'm well aware that people learn by examples (that is true for me at least, and apparently for most people: see the tldr library), so at first I will try to combine everything in one page:

1. Starts with examples, going from simple ones to more complicated ones, with no definitions whatsoever.
2. Continues with a section defining terms and giving examples that elucidate them (the first section we have here)
3. Ends with a formal description of the algorithm (the second section we have here)

I prefer starting with 2 and 3 for me to actually understand xarray...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,359240638
https://github.com/pydata/xarray/issues/2410#issuecomment-421998857,https://api.github.com/repos/pydata/xarray/issues/2410,421998857,MDEyOklzc3VlQ29tbWVudDQyMTk5ODg1Nw==,514522,2018-09-17T12:40:52Z,2018-09-17T12:40:52Z,CONTRIBUTOR,"I have updated mainly the _Indexing and selecting data_ section. I'm proposing an indexing notation that uses the `[]` operator vs the `()` function call to differentiate between the two kinds of dimension lookup. But more importantly, I'm working out a precise definition of data array indexing in the section _Formal indexing definition_.

# Xarray definition

A **data array** `a` has `D` dimensions, ordered from `0` to `D-1`. It contains an array of dimensionality `D`. The first dimension of that array is associated with the first dimension of the data array, and so forth. That array is returned by the data array attribute `values`. A **named data array** is a data array whose `name` attribute is a string:
```python
>>> import xarray as xr
>>>
>>> a = xr.DataArray([[0, 1], [2, 3], [4, 5]])
>>> a.name = ""My name""
>>> a
<xarray.DataArray 'My name' (dim_0: 3, dim_1: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Dimensions without coordinates: dim_0, dim_1
```

Each data array dimension has a unique `name` attribute of string type. The names can be accessed via the data array's `dims` attribute, which is a tuple. The name of dimension `i` is `a.dims[i]`:
```python
>>> a.dims[0]
'dim_0'
```

A data array can have zero or more coordinates, represented by a dict-like `coords` attribute. A coordinate is a named data array, also referred to as a **coordinate data array**. Coordinate data arrays have unique names among other coordinate data arrays. A coordinate data array of name `x` can be retrieved by `a.coords[x]`.

A coordinate can have zero or more dimensions associated with it. A **dimension data array** is a unidimensional coordinate data array associated with one, and only one, dimension having the same name as the coordinate data array itself. A dimension data array always has one, and only one, coordinate. That coordinate in turn has a dimension data array associated with it:
```python
>>> import numpy as np
>>>
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=[""x"", ""y""], coords={""x"": list(""abc"")})
>>> a
<xarray.DataArray (x: 3, y: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
Dimensions without coordinates: y
```

The above data array `a` has two dimensions: `""x""` and `""y""`. It has a single coordinate `""x""`, with its associated dimension data array `a.coords[""x""]`. The dimension data array definition implies the following recursion:
```python
>>> a.coords[""x""]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
>>> a.coords[""x""].coords[""x""]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
```

Coordinate data arrays are meant to provide labels to array positions, allowing for convenient access to array elements:
```python
>>> a.loc[""b"", :]
<xarray.DataArray (y: 2)>
array([2, 3])
Coordinates:
    x        <U1 'b'
Dimensions without coordinates: y
```

Note that there is no asterisk symbol for coordinate `""x""` of the above resulting data array, as the coordinate is not associated with any dimension. In other words, the coordinate data array `a.loc[""b"", :].coords[""x""]` is not a dimension data array.
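
The same distinction can be checked programmatically. The snippet below is a minimal sketch that rebuilds the data array `a` from above (with single-quoted strings, purely for convenience) and inspects the `dims` of the selection result and of its surviving coordinate:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=['x', 'y'],
    coords={'x': list('abc')},
)

# Selecting a single label drops dimension x from the result.
b = a.loc['b', :]
print(b.dims)              # ('y',)

# Coordinate x survives as a scalar (zero-dimensional) coordinate,
# so it is no longer a dimension data array.
print(b.coords['x'].dims)  # ()
print('x' in b.dims)       # False
```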

## Indexing and selecting data
There are four different but equally powerful ways of selecting data from a data array. They differ only in how the dimension and the index are looked up. Each of the two can use either a **position-based lookup** or a **label-based lookup**:
```
| Dimension lookup | Index lookup   | Data array         |
|------------------|----------------|--------------------|
| Position-based   | Position-based | a[:, 0]            |
| Position-based   | Label-based    | a.loc[:, ""UK""]     |
| Label-based      | Position-based | a(country=0)       |
| Label-based      | Label-based    | a.loc(country=""UK"")|
```

A **dimension position-based lookup** is determined by the position used in the index operator: `a[first_dim, second_dim, ...]` and `a.loc[first_dim, second_dim, ...]`. An **index position-based lookup** is determined by the provided integers or slices: `a[0, [3, 9], :, ...]` and `a(country=0, time=[3, 9], space=slice(None))`.

A **dimension label-based lookup** is determined by the provided dimension name: `a(country=0)` and `a.loc(country=""UK"")`. An **index label-based lookup** is determined by the provided _index labels_ or slices [1]: `a.loc[:, ""UK""]` and `a.loc(country=""UK"")`.

[1] An **index label** is any [Numpy data type](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html) object.

Consider the following data array:
```python
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=[""year"", ""country""], coords={""year"": [1990, 1994, 1998], ""country"": [""UK"", ""US""]})
>>> a
<xarray.DataArray (year: 3, country: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Coordinates:
  * year     (year) int64 1990 1994 1998
  * country  (country) <U2 'UK' 'US'
```

The expressions `a[:, 0]`, `a.loc[:, ""UK""]`, `a(country=0)`, and `a.loc(country=""UK"")` will all produce the same result:
```python
>>> a.loc[:, ""UK""]
<xarray.DataArray (year: 3)>
array([0, 2, 4])
Coordinates:
  * year     (year) int64 1990 1994 1998
    country  <U2 'UK'
```
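
Note that `a(country=0)` and `a.loc(country='UK')` are the notation I'm proposing here, not calls that xarray accepts today. For comparison, the sketch below writes the same four lookups with the existing `isel` and `sel` methods standing in for the two proposed forms. The variable names are only for illustration:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=['year', 'country'],
    coords={'year': [1990, 1994, 1998], 'country': ['UK', 'US']},
)

by_pos_pos = a[:, 0]                 # dimension by position, index by position
by_pos_label = a.loc[:, 'UK']        # dimension by position, index by label
by_name_pos = a.isel(country=0)      # dimension by name, index by position
by_name_label = a.sel(country='UK')  # dimension by name, index by label

# All four selections should agree in values, dimensions and coordinates.
for result in (by_pos_label, by_name_pos, by_name_label):
    assert result.equals(by_pos_pos)
```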

### Formal indexing definition
Let `A` be the dimensionality of `a`, the data array being indexed. Let `b` be the resulting data array. The Python indexing operation is formally translated into an `A`-tuple:
```
(i_1, i_2, ..., i_A)
```
for which `i_j` is a named data array whose values are index labels. This data array, as usual, can have `0`, `1`, or more dimensions. Its construction is described later in this section.

Let
```
(r_1, r_2, ..., r_A)
```
be the tuple representing the indices of `a`. Precisely, temporarily create dimension data arrays with labels from `0` to the dimension size minus one for those dimensions without an associated dimension data array. Therefore, `r_j` is a dimension data array for dimension `j`. Also, it is required that the values of `i_j` are a subset of the values of `r_j`.

For each `j`, define the lists `I_j = [(i_j0, 0), (i_j1, 1), ...]` and `R_j = [(r_j0, 0), (r_j1, 1), ...]` with pairs of data array values and positions. Perform a SQL JOIN as follows:

1. Apply the Cartesian product between the values of `I_j`  and the values of `R_j`.
2. Preserve only those tuples that have equal values for the first dimension.

Consider `i_0` and `r_0` defined as follows:
```python
>>> i_0
<xarray.DataArray (apples: 2, oranges: 1)>
array([['a'],
       ['c']], dtype='<U1')
Dimensions without coordinates: apples, oranges
>>> r_0
<xarray.DataArray (dim_0: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * dim_0    (dim_0) <U1 'a' 'b' 'c'
```
Performing operations 1 and 2 will result in the list `IR_j = [(('a', 0), ('a', 0)), (('c', 1), ('c', 2))]`, since `'c'` sits at position `2` of `r_0`.
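
Operations 1 and 2 can be sketched in plain Python. `join_pairs` is a hypothetical helper, not part of xarray, and it assumes that the values of `i_0` have been flattened in row-major order and that positions are zero-based:

```python
from itertools import product


def join_pairs(i_values, r_values):
    # Pair every value with its position, as in I_j and R_j.
    I = [(v, p) for p, v in enumerate(i_values)]
    R = [(v, p) for p, v in enumerate(r_values)]
    # Step 1: Cartesian product. Step 2: keep pairs whose values match.
    return [(i, r) for i, r in product(I, R) if i[0] == r[0]]


# Flattened values of i_0, and the labels of r_0.
IR_0 = join_pairs(['a', 'c'], ['a', 'b', 'c'])
print(IR_0)  # [(('a', 0), ('a', 0)), (('c', 1), ('c', 2))]
```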

The positions `IR_j[0][1][1]`,  `IR_j[1][1][1]`, and so forth are used to access the values of `a` and assign them to the positions `IR_j[0][0][1]`,  `IR_j[1][0][1]`, and so forth of `b`. Precisely, for each `(t_0, t_1, ..., t_A)` in `itertools.product(IR_0, IR_1, ..., IR_A)`, assign
```python
b[reshape(t_0[0][1], i_0.shape), ..., reshape(t_A[0][1], i_A.shape)] = a[t_0[1][1], ..., t_A[1][1]]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,359240638
https://github.com/pydata/xarray/issues/2410#issuecomment-420446944,https://api.github.com/repos/pydata/xarray/issues/2410,420446944,MDEyOklzc3VlQ29tbWVudDQyMDQ0Njk0NA==,514522,2018-09-11T22:25:23Z,2018-09-11T22:25:23Z,CONTRIBUTOR,"Thanks guys! Just to make sure, this is a work in progress. I realise that I made some wrong assumptions, and there is more to add to it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,359240638
https://github.com/pydata/xarray/issues/2399#issuecomment-420446624,https://api.github.com/repos/pydata/xarray/issues/2399,420446624,MDEyOklzc3VlQ29tbWVudDQyMDQ0NjYyNA==,514522,2018-09-11T22:24:14Z,2018-09-11T22:24:14Z,CONTRIBUTOR,"Yes, I'm working on that doc for now to come up with definitions that are very precise and as simple as possible.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357156174
https://github.com/pydata/xarray/issues/2399#issuecomment-420362244,https://api.github.com/repos/pydata/xarray/issues/2399,420362244,MDEyOklzc3VlQ29tbWVudDQyMDM2MjI0NA==,514522,2018-09-11T17:52:29Z,2018-09-11T17:52:29Z,CONTRIBUTOR,Hi again. I'm working on a precise definition of xarray and indexing. I find the official one a bit hard to understand. It might help me come up with a reasonable way to handle duplicate indices. https://drive.google.com/file/d/1uJ_U6nedkNe916SMViuVKlkGwPX-mGK7/view?usp=sharing,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357156174
https://github.com/pydata/xarray/issues/2399#issuecomment-419714631,https://api.github.com/repos/pydata/xarray/issues/2399,419714631,MDEyOklzc3VlQ29tbWVudDQxOTcxNDYzMQ==,514522,2018-09-09T13:04:12Z,2018-09-09T13:04:12Z,CONTRIBUTOR,"I see. Now that I have read about it, let me give it another shot.

Let `i` be

```
<xarray.DataArray (y: 1, z: 1)>
array([['a']], dtype='<U1')
Dimensions without coordinates: y, z
```

and `d` be

```
<xarray.DataArray (x: 2)>
array([0, 1])
Coordinates:
  * x        (x) <U1 'a' 'a'
```

The result of `d.loc[i]` is equal to `d.sel(x=i)`. Also, it seems reasonable to expect that its result should be the same as `d0.sel(x=i)` for `d0` given by

```
<xarray.DataArray (x: 2, dim_1: 1)>
array([[0],
       [1]])
Coordinates:
  * x        (x) <U1 'a' 'a'
Dimensions without coordinates: dim_1
```

as per column vector representation assumption.

## Answer

Laying down the first dimension gives

| y | z | x |
|---|---|---|
| a | a | a |
|   |   | a |

By order, `x` will match with `y` and therefore we will append a new dimension after `x` to
match with `z`:

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ?     |
|   |   | a | ?     |

where `?` means any. Joining the first and second halves of the table gives

| y | z | x | dim_1 |
|---|---|---|-------|
| a | a | a | ?     |
| a | a | a | ?     |

And here is my suggestion. Use the mapping `y|->x` and `z|->dim_1` to decide which
axis to expand for the additional element. I will choose y-axis because the additional `a` was
originally appended to the x-axis.

The answer is

```
<xarray.DataArray (y: 2, z: 1)>
array([[0],
       [1]])
Coordinates:
    x        (y, z) <U1 'a' 'a'
Dimensions without coordinates: y, z
```

for

```
>>> ans.coords[""x""]
<xarray.DataArray 'x' (y: 2, z: 1)>
array([['a'],
       ['a']], dtype='<U1')
Coordinates:
    x        (y, z) <U1 'a' 'a'
Dimensions without coordinates: y, z
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357156174
https://github.com/pydata/xarray/issues/2399#issuecomment-419383633,https://api.github.com/repos/pydata/xarray/issues/2399,419383633,MDEyOklzc3VlQ29tbWVudDQxOTM4MzYzMw==,514522,2018-09-07T09:39:01Z,2018-09-07T09:39:01Z,CONTRIBUTOR,"Now I see the problem. But I think it is solvable.

I will ignore the dimension names for now as I don't have
much experience with xarray yet.

The code

```python
import xarray as xr

# Data array with a duplicated label along x, and a 1x1 indexer.
da_nonunique = xr.DataArray([0, 1], dims=['x'], coords={'x': ['a', 'a']})
indexer = xr.DataArray([['a']], dims=['y', 'z'])
```

can be understood as defining two indexed arrays:

`[a, a]` and `[[a]]`. As we are allowing for non-unique indexing,
I will denote unique array elements as `[e_0, e_1]` and `[[r_0]]`
interchangeably.

Algorithm:

1. Align. `[[a], [a]]` and `[[a]]`.
2. Ravel. `[(a,a), (a,a)]` and `[(a,a)]`.
3. Join. `[(a,a), (a,a)]`. I.e., `[e_0, e_1]`.
4. Unravel. `[[e_0, e_1]]`. Notice that `[e_0, e_1]` has been
picked up by `r_0`.
5. Reshape. `[[e_0, e_1]]` (solution).

Concretely, the solution is a bi-dimensional, 1x2 array:

| 0 1 |.

There is another relevant example. Let the code be

```python
da_nonunique = xr.DataArray([0, 1, 2], dims=['x'], coords={'x': ['a', 'a', 'b']})
indexer = xr.DataArray([['a', 'b']], dims=['y', 'z'])
```

We have `[a, a, b]` and `[[a, b]]`, also denoted as `[e_0, e_1, e_2]`
and `[[r_0, r_1]]`.

Algorithm:

1. Align. `[[a], [a], [b]]` and `[[a, b]]`.
2. Ravel. `[(a,a), (a,a), (b,b)]` and `[(a,a), (b,b)]`.
3. Join. `[(a,a), (a,a), (b,b)]`. I.e., `[e_0, e_1, e_2]`.
4. Unravel. `[[e_0, e_1, e_2]]`. Notice now that `[e_0, e_1]` has been
picked up by `r_0` and `[e_2]` by `r_1`.
5. Reshape. `[[e_0, e_1, e_2]]`.

The solution is a bi-dimensional, 1x3 array:

| 0 1 2 |
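
To make the Join step concrete, here is a rough sketch in plain Python. `select_by_join` is a hypothetical helper (not an xarray function); it covers only the Ravel and Join steps and returns the matched values as a flat list, leaving the final Unravel and Reshape aside:

```python
from itertools import product


def select_by_join(values, labels, indexer):
    # Ravel: flatten the nested indexer row by row.
    flat = [label for row in indexer for label in row]
    # Join: Cartesian product of indexer labels and coordinate labels,
    # keeping only the positions where the two labels match.
    matches = [pos
               for want, (pos, have) in product(flat, enumerate(labels))
               if want == have]
    return [values[pos] for pos in matches]


print(select_by_join([0, 1], ['a', 'a'], [['a']]))               # [0, 1]
print(select_by_join([0, 1, 2], ['a', 'a', 'b'], [['a', 'b']]))  # [0, 1, 2]
```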


Explanation
-----------

1. Align recursively adds a new dimension in the array with lower dimensionality.
2. Ravel recursively removes a dimension by converting elements into tuples.
3. SQL Join operation: Cartesian product plus match.
4. Unravel performs the inverse of 2.
5. Reshape converts it to the indexer's dimensionality.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357156174
https://github.com/pydata/xarray/issues/2399#issuecomment-419166914,https://api.github.com/repos/pydata/xarray/issues/2399,419166914,MDEyOklzc3VlQ29tbWVudDQxOTE2NjkxNA==,514522,2018-09-06T16:56:44Z,2018-09-06T16:56:44Z,CONTRIBUTOR,"Thanks for the feedback!

1. You can count on indexing if the is_unique flag is checked beforehand. The way pandas does indexing seems to be both **clear** to the user and **powerful**. It seems **clear** because indexing is the result of a Cartesian product after filtering for matching values. It is **powerful** because it allows indexing as complex as SQL INNER JOIN, which covers the trivial case of unique elements. For example, the following operation

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], index=list(""aab""))
print(df.loc[list(""ab"")])
#    0
# a  0
# a  1
# b  2
```

is an INNER JOIN between the two indexes

```
INNER((a, b) x (a, a, b)) = INNER(aa, aa, ab, ba, ba, bb)
                          = (aa, aa, bb)
```

Another example:

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1], index=list(""aa""))
print(df.loc[list(""aa"")])
#    0
# a  0
# a  1
# a  0
# a  1
```

is again an INNER JOIN between the two indexes

```
INNER((a, a) x (a, a)) = INNER(aa, aa, aa, aa)
                       = (aa, aa, aa, aa)
```
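
Both examples can be reproduced without pandas, which is the point I'm trying to make. The sketch below uses a hypothetical helper, `inner_join_loc`, that does nothing more than a Cartesian product followed by a match filter:

```python
from itertools import product


def inner_join_loc(index, data, wanted):
    # Cartesian product of requested labels and index entries,
    # filtered down to the pairs whose labels are equal.
    return [(have, data[pos])
            for want, (pos, have) in product(wanted, enumerate(index))
            if want == have]


print(inner_join_loc('aab', [0, 1, 2], 'ab'))
# [('a', 0), ('a', 1), ('b', 2)]
print(inner_join_loc('aa', [0, 1], 'aa'))
# [('a', 0), ('a', 1), ('a', 0), ('a', 1)]
```

The rows come out in the same order as in the two `df.loc[...]` outputs shown above.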

2. Assume a bidimensional array with the following indexing:

```
  0 1
a ! @
a # $
```

**This translates into a unidimensional index:** `(a, 0), (a, 1), (a, 0), (a, 1)`. As such, it can be treated as usual. Assume you index the above matrix using `[('a', 0), ('a', 0)]`. This implies

```
INNER( ((a, 0), (a, 0)) x ((a, 0), (a, 1), (a, 0), (a, 1)) ) = INNER( (a,0)(a,0),
    (a,0)(a,1), (a,0)(a,0), (a,0)(a,1), (a,0)(a,0), (a,0)(a,1),
    (a,0)(a,0), (a,0)(a,1) )
  = ((a,0)(a,0), (a,0)(a,0), (a,0)(a,0), (a,0)(a,0))
```

Converting it back to the matrix representation:

```
  0 0
a ! !
a # #
```


In summary, my suggestion is to consider the possibility of defining indexing `B` by using `A` (i.e., `B.loc(A)`) as a Cartesian product followed by match filtering. Or in SQL terms, an INNER JOIN.

The multi-dimensional indexing, as far as I can see, can always be transformed into the uni-dimensional case and treated as such.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357156174