issue_comments: 233794061
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/906#issuecomment-233794061 | https://api.github.com/repos/pydata/xarray/issues/906 | 233794061 | MDEyOklzc3VlQ29tbWVudDIzMzc5NDA2MQ== | 6213168 | 2016-07-19T23:11:57Z | 2016-07-19T23:11:57Z | MEMBER | this workaround works:
However, I think the whole thing is incredibly convoluted, because everything looks fine both when you visualize the original pandas Series/DataFrame and when you look at the stacked DataArray. unstack() is letting an internal technicality of pandas produce a real change in the data. I came across this issue because I am using pandas to load a multi-index CSV from disk and then convert it to an n-dimensional xarray. In this situation I have no control over the MultiIndex - short of manually rebuilding it after the CSV load. The pandas DataFrame looks right, the stacked xarray looks right, and the unstacked xarray gets magically sorted :$ Also, I don't understand why you say there are no performance implications. You are basically doing a pick-by-index rebuild of the array, which performs potentially random access over the whole input array - thus nullifying the benefits of the CPU cache. Compare this to numpy.ndarray.reshape(), which costs at most a memcpy(). I was going to add that doing pick-by-index with a dask array would be even worse, when I realised that a MultiIndex does not work at all when you chunk()... :( |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
166439490 |
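The two claims in the comment - that unstacking reorders data because pandas keeps MultiIndex levels sorted, and that a pick-by-index rebuild copies while reshape() does not - can be reproduced with plain pandas and NumPy. This is a minimal sketch; the toy labels and shapes are illustrative and not taken from the original report:

```python
import numpy as np
import pandas as pd

# A MultiIndex whose first level is deliberately NOT in sorted order.
idx = pd.MultiIndex.from_tuples([('b', 1), ('a', 2)], names=['x', 'y'])
s = pd.Series([10, 20], index=idx)

# The Series itself preserves insertion order...
print(list(s.index.get_level_values('x')))  # ['b', 'a']

# ...but unstack() rebuilds the result from the index *levels*,
# which pandas always stores sorted, so the rows come out reordered.
u = s.unstack('y')
print(list(u.index))  # ['a', 'b']

# Cost comparison: reshape() on a contiguous array is a view
# (no data movement at all), while fancy "pick-by-index"
# indexing always materializes a fresh copy.
a = np.arange(6)
assert a.reshape(2, 3).base is a                      # view
assert a[np.array([5, 0, 3, 1, 4, 2])].base is None   # copy
```

The `.base` checks make the performance argument concrete: the reshaped array shares its buffer with `a`, whereas the fancy-indexed result owns new memory that had to be filled by gathering elements one index at a time.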