html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/906#issuecomment-457183389,https://api.github.com/repos/pydata/xarray/issues/906,457183389,MDEyOklzc3VlQ29tbWVudDQ1NzE4MzM4OQ==,26384082,2019-01-24T12:43:22Z,2019-01-24T12:43:22Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-269507466,https://api.github.com/repos/pydata/xarray/issues/906,269507466,MDEyOklzc3VlQ29tbWVudDI2OTUwNzQ2Ng==,1217238,2016-12-28T17:09:23Z,2016-12-28T17:09:23Z,MEMBER,"@crusaderky can you raise the issue again on the pandas issue tracker (see my comment in https://github.com/pandas-dev/pandas/issues/14903#issuecomment-267779151)? If need be, we can change this separately, but all things being equal I would prefer to keep `unstack()` consistent between pandas and xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-269479071,https://api.github.com/repos/pydata/xarray/issues/906,269479071,MDEyOklzc3VlQ29tbWVudDI2OTQ3OTA3MQ==,6213168,2016-12-28T13:46:19Z,2016-12-28T13:46:19Z,MEMBER,"@shoyer, are you happy for me to go ahead and change unstack() to respect the order of the first found series?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-234687071,https://api.github.com/repos/pydata/xarray/issues/906,234687071,MDEyOklzc3VlQ29tbWVudDIzNDY4NzA3MQ==,6213168,2016-07-23T00:27:49Z,2016-07-23T00:27:49Z,MEMBER,"Thanks, didn't know

https://gist.github.com/crusaderky/002ba64ee270164931d32ea3366dce1f
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-234686759,https://api.github.com/repos/pydata/xarray/issues/906,234686759,MDEyOklzc3VlQ29tbWVudDIzNDY4Njc1OQ==,1217238,2016-07-23T00:24:17Z,2016-07-23T00:24:17Z,MEMBER,"@crusaderky gist.github.com will render ipynb files, which makes them much easier to view!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-234686438,https://api.github.com/repos/pydata/xarray/issues/906,234686438,MDEyOklzc3VlQ29tbWVudDIzNDY4NjQzOA==,6213168,2016-07-23T00:20:41Z,2016-07-23T00:20:41Z,MEMBER,"Fixed in attachment. The code uses the first found series as the order.

[proper_unstack.zip](https://github.com/pydata/xarray/files/379272/proper_unstack.zip)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-234004910,https://api.github.com/repos/pydata/xarray/issues/906,234004910,MDEyOklzc3VlQ29tbWVudDIzNDAwNDkxMA==,6213168,2016-07-20T16:33:15Z,2016-07-20T16:33:15Z,MEMBER,"I see. I'll see if I can think of a good way to cope with your two examples.
BTW, my code above is buggy as it blindly assumes that the first dim is also the outermost.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233994941,https://api.github.com/repos/pydata/xarray/issues/906,233994941,MDEyOklzc3VlQ29tbWVudDIzMzk5NDk0MQ==,1217238,2016-07-20T15:58:15Z,2016-07-20T15:58:15Z,MEMBER,"Here are two examples where we would need to do pick-by-index on the data no matter what:

``` python
import pandas


def demo_unstack(index):
    index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
    s = pandas.Series(list(range(len(index))), index)
    print(s.unstack())
```

There is no consistent order in which one or more of the levels could be sorted:

``` python
demo_unstack([
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x1', 'third' ],
    ['x1', 'second'],
    ['x1', 'first' ],
])
```

```
count  first  second  third
x                          
x0         0       1      2
x1         5       4      3
```

Even more pathological: the multi-index doesn't even fill out every value in the cartesian product:

``` python
demo_unstack([
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x0', 'first' ],
])
```

```
count  first  second
x                   
x0       2.0     NaN
x1       0.0     1.0
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233904555,https://api.github.com/repos/pydata/xarray/issues/906,233904555,MDEyOklzc3VlQ29tbWVudDIzMzkwNDU1NQ==,6213168,2016-07-20T09:52:42Z,2016-07-20T09:52:42Z,MEMBER,"This preamble should be integrated inside unstack():

``` python
import operator
from functools import reduce

import pandas

def proper_unstack(array, dim):

    # Regenerate Pandas multi-index to be ordered by appearance
    # TODO: check that the stacked coords repeat periodically
    # TODO: write a faster/cleaner algorithm using numpy
    mindex = array.coords[dim].to_pandas().index

    levels = []
    labels = []
    for dim_i, (levels_i, labels_i) in enumerate(zip(mindex.levels, mindex.labels)):
        step_inner = reduce(operator.mul, (len(lvl) for lvl in mindex.levels[dim_i + 1:]), 1)
        step_outer = reduce(operator.mul, (len(lvl) for lvl in mindex.levels[:dim_i]), 1)

        levels.append([levels_i[labels_i[j]] for j in range(0, levels_i.size * step_inner, step_inner)])
        labels.append(reduce(operator.add, ([j] * step_inner for j in range(levels_i.size))) * step_outer)

    mindex = pandas.MultiIndex(levels, labels, names=mindex.names)
    array = array.copy()
    array.coords[dim] = mindex
    return array.unstack(dim)


proper_unstack(a, 'dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
Coordinates:
  * x        (x) object 'x1' 'x0'
  * count    (count) object 'first' 'second' 'third' 'fourth'
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233888081,https://api.github.com/repos/pydata/xarray/issues/906,233888081,MDEyOklzc3VlQ29tbWVudDIzMzg4ODA4MQ==,6213168,2016-07-20T08:42:19Z,2016-07-20T08:42:19Z,MEMBER,"The order of appearance should be what dictates the output.

> > in the worst case (e.g., random order for the MultiIndex) we'll have this issue no matter what rule we pick for assigning unstacked coordinates.

Not true. Using the order of appearance requires you to do a pick-by-index on the _index_. At the moment, you're doing a pick-by-index on the _data_.
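
A minimal sketch of what ordering by appearance could look like, assuming the stacked index is a complete product whose rows are already in row-major order. `unstack_by_appearance` is a hypothetical helper, not a pandas or xarray function; it only permutes the index labels and then reshapes the data:

```python
import numpy as np
import pandas as pd


def unstack_by_appearance(series):
    # Hypothetical helper for illustration only; not part of pandas or xarray.
    index = series.index
    # factorize() numbers the labels of each level in order of first
    # appearance, without sorting them.
    codes, uniques = zip(
        *(pd.factorize(index.get_level_values(i)) for i in range(index.nlevels))
    )
    shape = tuple(len(u) for u in uniques)
    # Flat position each row would occupy in a C-ordered dense array.
    flat = np.ravel_multi_index(codes, shape)
    if len(flat) != np.prod(shape) or not np.array_equal(flat, np.arange(len(flat))):
        # Incomplete or irregularly ordered index: a plain reshape is not enough.
        raise ValueError('index is not a complete, regularly ordered product')
    return series.to_numpy().reshape(shape), uniques


index = pd.MultiIndex.from_product(
    [['x1', 'x0'], ['first', 'second', 'third', 'fourth']], names=['x', 'count']
)
values, labels = unstack_by_appearance(pd.Series(range(8), index=index))
```

In this sketch the irregular or incomplete indexes simply raise, since those cases still need a pick-by-index on the data.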
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233797167,https://api.github.com/repos/pydata/xarray/issues/906,233797167,MDEyOklzc3VlQ29tbWVudDIzMzc5NzE2Nw==,1217238,2016-07-19T23:29:57Z,2016-07-19T23:29:57Z,MEMBER,"> You're basically doing a pick-by-index rebuild of the array, which does potentially random access to the whole input array - thus nullifying the benefits of the CPU cache. This is compared to a numpy.ndarray.reshape(), which has the cost of a memcpy().

This is true, but in the worst case (e.g., random order for the MultiIndex) we'll have this issue no matter what rule we pick for assigning unstacked coordinates.

> I was going to add something about doing pick-by-index with a dask array will be even worse, when I realised that multiindex does not work at all when you chunk()... :(

MultiIndex _should_ work with dask -- we have a few tests for this. If not, a bug report would be appreciated!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233796557,https://api.github.com/repos/pydata/xarray/issues/906,233796557,MDEyOklzc3VlQ29tbWVudDIzMzc5NjU1Nw==,1217238,2016-07-19T23:26:33Z,2016-07-19T23:26:33Z,MEMBER,"What behavior would you suggest as an alternative? I suppose that in principle we could assign new levels based on order of appearance (and treat `levels` as an implementation detail), but it's worth noting that this behavior for `unstack()` matches how pandas works:

```
>>> s.unstack()
count  first  fourth  second  third
x                                  
x0         4       7       5      6
x1         0       3       1      2
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233794061,https://api.github.com/repos/pydata/xarray/issues/906,233794061,MDEyOklzc3VlQ29tbWVudDIzMzc5NDA2MQ==,6213168,2016-07-19T23:11:57Z,2016-07-19T23:11:57Z,MEMBER,"This workaround works:

``` python
index2 = pandas.MultiIndex(
    levels=[['x0', 'x1'], ['first', 'second', 'third', 'fourth']],
    labels=[[0,0,0,0,1,1,1,1], [0,1,2,3,0,1,2,3]],
    names=['x', 'count'])
xarray.DataArray(pandas.Series(list(range(8)), index2)).unstack('dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[0, 1, 2, 3],
       [4, 5, 6, 7]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'second' 'third' 'fourth'
```

However, I think the whole thing is incredibly convoluted, because everything _looks_ good both when you visualize the original pandas Series/DataFrame and when you visualize the stacked DataArray. unstack() is letting an internal technicality of pandas produce a real change in the data.

I came across this issue because I am using pandas to load a multi-index CSV from disk and then convert it to an n-dimensional xarray. In this situation, I have no control over the multiindex, short of manually rebuilding it after the CSV load. The pandas dataframe _looks_ right, the stacked xarray _looks_ right, the unstacked xarray gets magically sorted :$

Also, I don't understand why you say there are no performance implications.
You're basically doing a pick-by-index rebuild of the array, which does potentially random access to the whole input array - thus nullifying the benefits of the CPU cache. This is compared to a numpy.ndarray.reshape(), which has the cost of a memcpy().
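
For illustration, the two access patterns in plain NumPy. The arrays are made-up stand-ins for the 8-element example in this thread, and the `positions` table is written by hand rather than derived from a real MultiIndex. Note that for a contiguous array, `reshape()` returns a view rather than copying:

```python
import numpy as np

data = np.arange(8)

# Rows already in row-major order of the target grid: reshape only
# reinterprets the existing buffer (a view for a contiguous array).
dense = data.reshape(2, 4)

# Sorted-levels layout: every output cell is gathered through an integer
# position, which always allocates a new array and may touch the input
# in arbitrary order.
positions = np.array([[4, 7, 5, 6],
                      [0, 3, 1, 2]])
gathered = data[positions]

print(np.shares_memory(dense, data))
print(np.shares_memory(gathered, data))
```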

I was going to add something about doing pick-by-index with a dask array will be even worse, when I realised that multiindex does not work at all when you chunk()... :(
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490
https://github.com/pydata/xarray/issues/906#issuecomment-233776163,https://api.github.com/repos/pydata/xarray/issues/906,233776163,MDEyOklzc3VlQ29tbWVudDIzMzc3NjE2Mw==,1217238,2016-07-19T21:45:33Z,2016-07-19T21:45:33Z,MEMBER,"`unstack` sorts the data [by the order of labels](https://github.com/pydata/xarray/blob/7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7/xarray/core/dataset.py#L1417) on the `levels` attribute on the MultiIndex. We don't calculate the order when calling `unstack`, so there shouldn't be any performance concerns on this side.

By default, pandas.MultiIndex creates each level in `levels` in sorted order, which is sometimes necessary to ensure indexing (especially slicing) works properly. But if you like, you can control this explicitly by using the [MultiIndex constructor](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.MultiIndex.html) directly, e.g., `index = pandas.MultiIndex(levels, labels)`. Does that solve your use case here?
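
A short sketch of the difference, assuming a recent pandas release where the integer positions are passed as `codes` (the `labels` keyword used elsewhere in this thread is the older name for the same argument). The tuples are illustrative only:

```python
import pandas as pd

tuples = [('x1', 'first'), ('x1', 'second'), ('x0', 'first'), ('x0', 'second')]

# from_tuples() factorizes each level, so the stored levels come out sorted
# even though 'x1' appears first in the data.
sorted_index = pd.MultiIndex.from_tuples(tuples, names=['x', 'count'])

# The explicit constructor keeps the levels in exactly the order given.
explicit_index = pd.MultiIndex(
    levels=[['x1', 'x0'], ['first', 'second']],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=['x', 'count'],
)

print(list(sorted_index.levels[0]))
print(list(explicit_index.levels[0]))
```

Both indexes hold the same tuples in the same row order; only the internal `levels` differ.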
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490