issues: 365973662
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
365973662 | MDU6SXNzdWUzNjU5NzM2NjI= | 2459 | Stack + to_array before to_xarray is much faster than a simple to_xarray | 5635139 | closed | 0 | 13 | 2018-10-02T16:13:26Z | 2020-07-02T20:39:01Z | 2020-07-02T20:39:01Z | MEMBER | I was seeing some slow performance around

To reproduce: Create a series with a MultiIndex, ensuring the MultiIndex isn't a simple product:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.random.rand(100000),
    index=pd.MultiIndex.from_product([
        list('abcdefhijk'),
        list('abcdefhijk'),
        # pd.date_range replaces DatetimeIndex(start=...), which newer pandas no longer accepts
        pd.date_range(start='2000-01-01', periods=1000, freq='B'),
    ]))

# Keep every third row so the index is no longer a full product
cropped = s[::3]
cropped.index = pd.MultiIndex.from_tuples(cropped.index, names=list('xyz'))

cropped.head()
x  y  z
a  a  2000-01-03    0.993989
      2000-01-06    0.850518
      2000-01-11    0.068944
      2000-01-14    0.237197
      2000-01-19    0.784254
dtype: float64
```

Two approaches for getting this into xarray:

1 - Simple

```python
current_version = cropped.to_xarray()
<xarray.DataArray (x: 10, y: 10, z: 1000)>
array([[[0.993989,      nan, ...,      nan, 0.721663],
        [     nan,      nan, ..., 0.58224 ,      nan],
        ...,
        [     nan, 0.369382, ...,      nan,      nan],
        [0.98558 ,      nan, ...,      nan, 0.403732]],
Coordinates:
  * x        (x) object 'a' 'b' 'c' 'd' 'e' 'f' 'h' 'i' 'j' 'k'
  * y        (y) object 'a' 'b' 'c' 'd' 'e' 'f' 'h' 'i' 'j' 'k'
  * z        (z) datetime64[ns] 2000-01-03 2000-01-04 ... 2003-10-30 2003-10-31
```

This takes 536 ms

2 - unstack in pandas first, and then use

This takes 17.3 ms

To confirm these are identical:

```
proposed_version_adj = (
    proposed_version
    .assign_coords(y=proposed_version['y'].astype(object))
    .transpose(*current_version.dims)
)
proposed_version_adj.equals(current_version)
True
```

Problem description

A default operation is much slower than a (potentially) equivalent operation that's not the default. I need to look more at what's causing the issues. I think it's to do with the

Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2459/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
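
The code for the second approach did not survive in this export. Going by the issue title and the surviving text ("unstack in pandas first"), it could look roughly like the sketch below. This is an assumption, not the author's verbatim code: the helper names `build_cropped` and `to_xarray_via_unstack`, the seeded random generator, and the choice of `y` as the unstacked level are all hypothetical. The comparison checks values only, since the coordinate dtype of `y` may differ between the two results (the issue casts it to `object` before calling `equals`).

```python
import numpy as np
import pandas as pd
import xarray as xr  # pandas' .to_xarray() requires xarray to be installed


def build_cropped(seed: int = 0) -> pd.Series:
    """Build a MultiIndexed series whose index is not a full product (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = list("abcdefhijk")
    index = pd.MultiIndex.from_product(
        [labels, labels, pd.date_range("2000-01-01", periods=1000, freq="B")],
        names=list("xyz"),
    )
    s = pd.Series(rng.random(len(index)), index=index)
    return s[::3]  # keep every third row


def to_xarray_via_unstack(series: pd.Series, dim: str = "y") -> xr.DataArray:
    """Assumed shape of the faster path: unstack one level, convert, then restack."""
    wide = series.unstack(dim)   # DataFrame: rows indexed by the other levels, one column per label
    ds = wide.to_xarray()        # Dataset with one data variable per column
    return ds.to_array(dim)      # DataArray with `dim` as the leading dimension


cropped = build_cropped()
current_version = cropped.to_xarray()
proposed_version = to_xarray_via_unstack(cropped, "y").transpose(*current_version.dims)

# Compare values only, treating NaN as equal to NaN
same_values = np.allclose(
    proposed_version.values, current_version.values, equal_nan=True
)
print(same_values)
```

To compare timings locally, both conversions can be wrapped in `%timeit` in IPython. The 536 ms and 17.3 ms figures quoted in the issue come from the author's 2018 environment and may not reproduce on current versions.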