github: issue_comments: 5 rows where issue = 782943813 sorted by updated

5 rows where issue = 782943813 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
766983346	https://github.com/pydata/xarray/issues/4789#issuecomment-766983346	https://api.github.com/repos/pydata/xarray/issues/4789	MDEyOklzc3VlQ29tbWVudDc2Njk4MzM0Ng==	keewis 14808389	2021-01-25T17:33:19Z	2021-01-26T21:59:20Z	MEMBER	that seems to be the main issue. With ```diff diff --git a/xarray/core/formatting.py b/xarray/core/formatting.py index 282620e3..f825ed85 100644 --- a/xarray/core/formatting.py +++ b/xarray/core/formatting.py @@ -300,9 +300,11 @@ def _summarize_coord_multiindex(coord, col_width, marker): def _summarize_coord_levels(coord, col_width, marker="-"): + indices = list(range(10)) + list(range(-10, 0)) + subset = coord[indices] return "\n".join( summarize_variable( - lname, coord.get_level_variable(lname), col_width, marker=marker + lname, subset.get_level_variable(lname), col_width, marker=marker ) for lname in coord.level_names ) ``` I get a speed up of about 180x (for `xr.DataArray(pd.Series(25_000_000, index=idx))`, not sure if the speed-up is as significant for bigger arrays). We should probably make the shape of `indices` depend on `col_width`, though.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Poor performance of repr of large arrays, particularly jupyter repr 782943813
767186998	https://github.com/pydata/xarray/issues/4789#issuecomment-767186998	https://api.github.com/repos/pydata/xarray/issues/4789	MDEyOklzc3VlQ29tbWVudDc2NzE4Njk5OA==	max-sixty 5635139	2021-01-25T23:50:56Z	2021-01-25T23:50:56Z	MEMBER	Yes great, I think that would be a great cut-through solution!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Poor performance of repr of large arrays, particularly jupyter repr 782943813
766518276	https://github.com/pydata/xarray/issues/4789#issuecomment-766518276	https://api.github.com/repos/pydata/xarray/issues/4789	MDEyOklzc3VlQ29tbWVudDc2NjUxODI3Ng==	max-sixty 5635139	2021-01-25T03:36:23Z	2021-01-25T03:36:23Z	MEMBER	The rabbit hole went deeper than I expected. I need to sign off now, but leaving what I have in case someone else has some insight. Essentially, we call `get_level_variable` on the coord in `formatting.py`, which calls `get_level_values` into pandas. This is really slow on large MultiIndexes! I think it's recreating the whole index. I got as deep as `algos.take_1d`. I think we can probably do something smarter to only call this on the first & last items in the MultiIndex. For reference, here's the output of line_profiler, a good profiler for figuring this sort of thing out: ``` %lprun -f formatting._summarize_coord_levels -f IndexVariable.get_level_variable -f pd.MultiIndex.get_level_values -f pd.MultiIndex._get_level_values coords_repr(da.coords) Total time: 1.91029 s File: /Users/maximilian/workspace/xarray/xarray/core/formatting.py Function: _summarize_coord_levels at line 302 Line # Hits Time Per Hit % Time Line Contents 302 def _summarize_coord_levels(coord, col_width, marker="-"): 303 2 1910185.0 955092.5 100.0 return "\n".join( 304 summarize_variable( 305 lname, coord.get_level_variable(lname), col_width, marker=marker 306 ) 307 1 102.0 102.0 0.0 for lname in coord.level_names 308 ) Total time: 1.81777 s File: /Users/maximilian/workspace/xarray/xarray/core/variable.py Function: get_level_variable at line 2687 Line # Hits Time Per Hit % Time Line Contents 2687 def get_level_variable(self, level): 2688 """Return a new IndexVariable from a given MultiIndex level.""" 2689 2 303.0 151.5 0.0 if self.level_names is None: 2690 raise ValueError("IndexVariable %r has no MultiIndex" % self.name) 2691 2 216.0 108.0 0.0 index = self.to_index() 2692 2 1817254.0 908627.0 100.0 return type(self)(self.dims, index.get_level_values(level)) 
Total time: 1.81709 s File: /usr/local/lib/python3.9/site-packages/pandas/core/indexes/multi.py Function: _get_level_values at line 1617 Line # Hits Time Per Hit % Time Line Contents 1617 def _get_level_values(self, level, unique=False): 1618 """ 1619 Return vector of label values for requested level, 1620 equal to the length of the index 1621 1622 this is an internal method 1623 1624 Parameters 1625 ---------- 1626 level : int level 1627 unique : bool, default False 1628 if True, drop duplicated values 1629 1630 Returns 1631 ------- 1632 values : ndarray 1633 """ 1634 2 47.0 23.5 0.0 lev = self.levels[level] 1635 2 5.0 2.5 0.0 level_codes = self.codes[level] 1636 2 2.0 1.0 0.0 name = self._names[level] 1637 2 1.0 0.5 0.0 if unique: 1638 level_codes = algos.unique(level_codes) 1639 2 1816971.0 908485.5 100.0 filled = algos.take_1d(lev._values, level_codes, fill_value=lev._na_value) 1640 2 60.0 30.0 0.0 return lev._shallow_copy(filled, name=name) Total time: 1.81712 s File: /usr/local/lib/python3.9/site-packages/pandas/core/indexes/multi.py Function: get_level_values at line 1642 Line # Hits Time Per Hit % Time Line Contents 1642 def get_level_values(self, level): 1643 """ 1644 Return vector of label values for requested level. 1645 1646 Length of returned vector is equal to the length of the index. 1647 1648 Parameters 1649 ---------- 1650 level : int or str 1651 `level` is either the integer position of the level in the 1652 MultiIndex, or the name of the level. 1653 1654 Returns 1655 ------- 1656 values : Index 1657 Values is a level of this MultiIndex converted to 1658 a single :class:`Index` (or subclass thereof). 
1659 1660 Examples 1661 -------- 1662 Create a MultiIndex: 1663 1664 >>> mi = pd.MultiIndex.from_arrays((list('abc'), list('def'))) 1665 >>> mi.names = ['level_1', 'level_2'] 1666 1667 Get level values by supplying level as either integer or name: 1668 1669 >>> mi.get_level_values(0) 1670 Index(['a', 'b', 'c'], dtype='object', name='level_1') 1671 >>> mi.get_level_values('level_2') 1672 Index(['d', 'e', 'f'], dtype='object', name='level_2') 1673 """ 1674 2 11.0 5.5 0.0 level = self._get_level_number(level) 1675 2 1817107.0 908553.5 100.0 values = self._get_level_values(level) 1676 2 2.0 1.0 0.0 return values ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Poor performance of repr of large arrays, particularly jupyter repr 782943813
766504338	https://github.com/pydata/xarray/issues/4789#issuecomment-766504338	https://api.github.com/repos/pydata/xarray/issues/4789	MDEyOklzc3VlQ29tbWVudDc2NjUwNDMzOA==	max-sixty 5635139	2021-01-25T02:46:40Z	2021-01-25T02:46:40Z	MEMBER	One quick observation is that it's related to the MultiIndex — if we swap out the index for `idx = pd.Index(range(100_000_000))`, the time drops from 1.8s to 812mics	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Poor performance of repr of large arrays, particularly jupyter repr 782943813
758373462	https://github.com/pydata/xarray/issues/4789#issuecomment-758373462	https://api.github.com/repos/pydata/xarray/issues/4789	MDEyOklzc3VlQ29tbWVudDc1ODM3MzQ2Mg==	rabernat 1197350	2021-01-12T03:36:26Z	2021-01-12T03:36:26Z	MEMBER	I uncovered this issue with Dask's SVG in its `_repr_html` function: https://github.com/dask/dask/issues/6670. The fix made a big difference in repr size. Possibly related?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Poor performance of repr of large arrays, particularly jupyter repr 782943813
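
The comments above suggest building the repr from only the first and last entries of the MultiIndex instead of calling `get_level_values` on the whole index. A minimal pandas-only sketch of that idea follows; it is not the xarray patch itself. The helper name `preview_level_values` and the parameter `n` are hypothetical, chosen for illustration, and the example index is deliberately small.

```python
import numpy as np
import pandas as pd


def preview_level_values(index, level, n=10):
    """Return the values of `level` for the first and last `n` entries only.

    Hypothetical helper: it subsets the MultiIndex *before* asking pandas
    for the level values, so only 2 * n labels are materialised.
    """
    if len(index) > 2 * n:
        size = len(index)
        # Positions of the first n and the last n entries.
        positions = np.concatenate([np.arange(n), np.arange(size - n, size)])
        index = index.take(positions)
    return index.get_level_values(level)


# Small example index with two levels named "a" and "b".
idx = pd.MultiIndex.from_product([range(1_000), range(1_000)], names=["a", "b"])

preview_a = preview_level_values(idx, "a", n=3)
preview_b = preview_level_values(idx, "b", n=3)
print(list(preview_a))
print(list(preview_b))
```

The cost of the subset step depends on `n` rather than on the length of the index, which is consistent with the speed-up reported in the thread. As noted there, a real fix would probably derive `n` from `col_width`.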


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
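
For readers who want to reproduce the view on this page, here is a rough sketch using Python's built-in `sqlite3` module. It assumes a simplified in-memory copy of the table: the `REFERENCES` clauses are dropped because the `users` and `issues` tables are not shown here, and only the `id`, `updated_at`, and `issue` values from the rows above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified copy of the schema above (foreign-key clauses omitted).
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER,
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    """
)

# (id, updated_at, issue) taken from the five rows listed on this page.
rows = [
    (758373462, "2021-01-12T03:36:26Z", 782943813),
    (766504338, "2021-01-25T02:46:40Z", 782943813),
    (766518276, "2021-01-25T03:36:23Z", 782943813),
    (766983346, "2021-01-26T21:59:20Z", 782943813),
    (767186998, "2021-01-25T23:50:56Z", 782943813),
]
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)", rows
)

# Comments for one issue, most recently updated first.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (782943813,),
    )
]
print(ordered_ids)
```

ISO 8601 timestamps in the same timezone sort correctly as plain text, which is why ordering on the `TEXT` column `updated_at` works here.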
