html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/906#issuecomment-269507466,https://api.github.com/repos/pydata/xarray/issues/906,269507466,MDEyOklzc3VlQ29tbWVudDI2OTUwNzQ2Ng==,1217238,2016-12-28T17:09:23Z,2016-12-28T17:09:23Z,MEMBER,"@crusaderky can you raise the issue again on the pandas issue tracker (see my comment in https://github.com/pandas-dev/pandas/issues/14903#issuecomment-267779151)? If need be, we can change this separately, but all things being equal I would prefer to keep `unstack()` consistent between pandas and xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490 https://github.com/pydata/xarray/issues/906#issuecomment-234686759,https://api.github.com/repos/pydata/xarray/issues/906,234686759,MDEyOklzc3VlQ29tbWVudDIzNDY4Njc1OQ==,1217238,2016-07-23T00:24:17Z,2016-07-23T00:24:17Z,MEMBER,"@crusaderky gist.github.com will render ipynb files, which makes them much easier to view! 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490 https://github.com/pydata/xarray/issues/906#issuecomment-233994941,https://api.github.com/repos/pydata/xarray/issues/906,233994941,MDEyOklzc3VlQ29tbWVudDIzMzk5NDk0MQ==,1217238,2016-07-20T15:58:15Z,2016-07-20T15:58:15Z,MEMBER,"Here are two examples where we would need to do pick-by-index on the data no matter what:

``` python
import pandas

def demo_unstack(index):
    # Build a two-level MultiIndex from (x, count) pairs and unstack the inner level.
    index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
    s = pandas.Series(list(range(len(index))), index)
    print(s.unstack())
```

No ordering of the `count` level matches the order of appearance under both `x0` and `x1`:

``` python
demo_unstack([
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x1', 'third' ],
    ['x1', 'second'],
    ['x1', 'first' ],
])
```

```
count  first  second  third
x
x0         0       1      2
x1         5       4      3
```

Even more pathological: the multi-index doesn't even fill out every value in the Cartesian product:

``` python
demo_unstack([
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x0', 'first' ],
])
```

```
count  first  second
x
x0       2.0     NaN
x1       0.0     1.0
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490 https://github.com/pydata/xarray/issues/906#issuecomment-233797167,https://api.github.com/repos/pydata/xarray/issues/906,233797167,MDEyOklzc3VlQ29tbWVudDIzMzc5NzE2Nw==,1217238,2016-07-19T23:29:57Z,2016-07-19T23:29:57Z,MEMBER,"> You're basically doing a pick-by-index rebuild of the array, which does potentially random access to the whole input array - thus nullifying the benefits of the CPU cache. This is compared to a numpy.ndarray.reshape(), which has the cost of a memcpy().

This is true, but in the worst case (e.g., random order for the MultiIndex) we'll have this issue no matter what rule we pick for assigning unstacked coordinates.

> I was going to add something about doing pick-by-index with a dask array will be even worse, when I realised that multiindex does not work at all when you chunk()... :(

MultiIndex _should_ work with dask -- we have a few tests for this. If not, a bug report would be appreciated! ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490 https://github.com/pydata/xarray/issues/906#issuecomment-233796557,https://api.github.com/repos/pydata/xarray/issues/906,233796557,MDEyOklzc3VlQ29tbWVudDIzMzc5NjU1Nw==,1217238,2016-07-19T23:26:33Z,2016-07-19T23:26:33Z,MEMBER,"What behavior would you suggest as an alternative?

I suppose that in principle we could assign new levels based on order of appearance (and treat `levels` as an implementation detail), but it's worth noting that this behavior for `unstack()` matches how pandas works:

```
>>> s.unstack()
count  first  fourth  second  third
x
x0         4       7       5      6
x1         0       3       1      2
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490 https://github.com/pydata/xarray/issues/906#issuecomment-233776163,https://api.github.com/repos/pydata/xarray/issues/906,233776163,MDEyOklzc3VlQ29tbWVudDIzMzc3NjE2Mw==,1217238,2016-07-19T21:45:33Z,2016-07-19T21:45:33Z,MEMBER,"`unstack` sorts the data [by the order of labels](https://github.com/pydata/xarray/blob/7a9e84b5708d3e8ec270a7415f9b5e54d30f13f7/xarray/core/dataset.py#L1417) on the `levels` attribute on the MultiIndex. We don't calculate the order when calling `unstack`, so there shouldn't be any performance concerns on this side. By default, pandas.MultiIndex creates each level in `levels` in sorted order, which is sometimes necessary to ensure indexing (especially slicing) works properly. 
But if you like, you can control this explicitly by using the [MultiIndex constructor](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.MultiIndex.html) directly, e.g., `index = pandas.MultiIndex(levels, labels)`. Does that solve your use case here? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,166439490