html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4789#issuecomment-767186998,https://api.github.com/repos/pydata/xarray/issues/4789,767186998,MDEyOklzc3VlQ29tbWVudDc2NzE4Njk5OA==,5635139,2021-01-25T23:50:56Z,2021-01-25T23:50:56Z,MEMBER,"Yes great, I think that would be a great cut-through solution!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,782943813
https://github.com/pydata/xarray/issues/4789#issuecomment-766518276,https://api.github.com/repos/pydata/xarray/issues/4789,766518276,MDEyOklzc3VlQ29tbWVudDc2NjUxODI3Ng==,5635139,2021-01-25T03:36:23Z,2021-01-25T03:36:23Z,MEMBER,"The rabbit hole went deeper than I expected. I need to sign off now, but leaving what I have in case someone else has some insight.
Essentially, `_summarize_coord_levels` in `formatting.py` calls `get_level_variable` on the coord, which in turn calls pandas' `get_level_values`. This is really slow on large MultiIndexes! I think it's materializing the full level, one value per index entry. I traced it as deep as `algos.take_1d`, which accounts for essentially all of the time.
I think we can probably do something smarter to only call this on the first & last items in the MultiIndex.
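A minimal sketch of that idea, assuming that slicing a MultiIndex is cheap because it only slices the integer codes and leaves the levels untouched. `level_head_tail` is a hypothetical helper, not an existing xarray or pandas function:

```python
import pandas as pd


def level_head_tail(index, level, n=3):
    # Hypothetical helper (not part of xarray or pandas): return the first
    # and last n values of one MultiIndex level.
    # Slice the MultiIndex first, so get_level_values only has to expand
    # n codes per call instead of one code per entry of the full index.
    head = index[:n].get_level_values(level)
    tail = index[-n:].get_level_values(level)
    return head, tail


# Small illustrative index: 3 x 4 = 12 entries, levels named x and y.
idx = pd.MultiIndex.from_product([list('abc'), range(4)], names=['x', 'y'])
head, tail = level_head_tail(idx, 'y', n=2)
print(list(head), list(tail))
```

The repr only ever shows a handful of values from each end, so taking the level values of two tiny slices should avoid the full-length `take` that dominates the profile below. If the index has fewer than `2 * n` entries the head and tail overlap, which a real implementation would need to handle.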
For reference, here's the output of line_profiler, a good profiler for figuring this sort of thing out:
```
%lprun -f formatting._summarize_coord_levels -f IndexVariable.get_level_variable -f pd.MultiIndex.get_level_values -f pd.MultiIndex._get_level_values coords_repr(da.coords)
Total time: 1.91029 s
File: /Users/maximilian/workspace/xarray/xarray/core/formatting.py
Function: _summarize_coord_levels at line 302
Line # Hits Time Per Hit % Time Line Contents
==============================================================
302 def _summarize_coord_levels(coord, col_width, marker=""-""):
303 2 1910185.0 955092.5 100.0 return ""\n"".join(
304 summarize_variable(
305 lname, coord.get_level_variable(lname), col_width, marker=marker
306 )
307 1 102.0 102.0 0.0 for lname in coord.level_names
308 )
Total time: 1.81777 s
File: /Users/maximilian/workspace/xarray/xarray/core/variable.py
Function: get_level_variable at line 2687
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2687 def get_level_variable(self, level):
2688 """"""Return a new IndexVariable from a given MultiIndex level.""""""
2689 2 303.0 151.5 0.0 if self.level_names is None:
2690 raise ValueError(""IndexVariable %r has no MultiIndex"" % self.name)
2691 2 216.0 108.0 0.0 index = self.to_index()
2692 2 1817254.0 908627.0 100.0 return type(self)(self.dims, index.get_level_values(level))
Total time: 1.81709 s
File: /usr/local/lib/python3.9/site-packages/pandas/core/indexes/multi.py
Function: _get_level_values at line 1617
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1617 def _get_level_values(self, level, unique=False):
1618 """"""
1619 Return vector of label values for requested level,
1620 equal to the length of the index
1621
1622 **this is an internal method**
1623
1624 Parameters
1625 ----------
1626 level : int level
1627 unique : bool, default False
1628 if True, drop duplicated values
1629
1630 Returns
1631 -------
1632 values : ndarray
1633 """"""
1634 2 47.0 23.5 0.0 lev = self.levels[level]
1635 2 5.0 2.5 0.0 level_codes = self.codes[level]
1636 2 2.0 1.0 0.0 name = self._names[level]
1637 2 1.0 0.5 0.0 if unique:
1638 level_codes = algos.unique(level_codes)
1639 2 1816971.0 908485.5 100.0 filled = algos.take_1d(lev._values, level_codes, fill_value=lev._na_value)
1640 2 60.0 30.0 0.0 return lev._shallow_copy(filled, name=name)
Total time: 1.81712 s
File: /usr/local/lib/python3.9/site-packages/pandas/core/indexes/multi.py
Function: get_level_values at line 1642
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1642 def get_level_values(self, level):
1643 """"""
1644 Return vector of label values for requested level.
1645
1646 Length of returned vector is equal to the length of the index.
1647
1648 Parameters
1649 ----------
1650 level : int or str
1651 ``level`` is either the integer position of the level in the
1652 MultiIndex, or the name of the level.
1653
1654 Returns
1655 -------
1656 values : Index
1657 Values is a level of this MultiIndex converted to
1658 a single :class:`Index` (or subclass thereof).
1659
1660 Examples
1661 --------
1662 Create a MultiIndex:
1663
1664 >>> mi = pd.MultiIndex.from_arrays((list('abc'), list('def')))
1665 >>> mi.names = ['level_1', 'level_2']
1666
1667 Get level values by supplying level as either integer or name:
1668
1669 >>> mi.get_level_values(0)
1670 Index(['a', 'b', 'c'], dtype='object', name='level_1')
1671 >>> mi.get_level_values('level_2')
1672 Index(['d', 'e', 'f'], dtype='object', name='level_2')
1673 """"""
1674 2 11.0 5.5 0.0 level = self._get_level_number(level)
1675 2 1817107.0 908553.5 100.0 values = self._get_level_values(level)
1676 2 2.0 1.0 0.0 return values
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,782943813
https://github.com/pydata/xarray/issues/4789#issuecomment-766504338,https://api.github.com/repos/pydata/xarray/issues/4789,766504338,MDEyOklzc3VlQ29tbWVudDc2NjUwNDMzOA==,5635139,2021-01-25T02:46:40Z,2021-01-25T02:46:40Z,MEMBER,"One quick observation is that it's related to the MultiIndex: if we swap out the index for `idx = pd.Index(range(100_000_000))`, the time drops from 1.8 s to 812 µs","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,782943813