html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5545#issuecomment-874209908,https://api.github.com/repos/pydata/xarray/issues/5545,874209908,MDEyOklzc3VlQ29tbWVudDg3NDIwOTkwOA==,28786187,2021-07-05T15:58:00Z,2021-07-05T17:43:09Z,CONTRIBUTOR,"Hi,
@max-sixty I could give it a try, but my time is quite limited.
Would you be fine with a diff? That would save me from setting up a fork and a new repo.
Anyway, here is a quick diff. I tried to keep it small and basically moved the `max_rows` setting into `dataset_repr`. Only `coords_repr` takes a new keyword argument, so the change should be backwards compatible.
The tests would need to be updated. Would it be a good idea not to test `_mapping_repr` directly, but instead to test `coords_repr`, `data_vars_repr`, `attrs_repr`, and `dataset_repr` separately, to check that they do what they are supposed to do regardless of their implementation?
Edit: Never mind, I am preparing a PR with updated tests.
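A minimal pure-Python sketch of how the truncation rule would behave with the diff below applied. It assumes `max_rows=None` means no limit and that `dataset_repr` passes the same `display_max_rows` value to the coordinates, data variables, and attributes sections alike. The helper `mapping_listing` is hypothetical (it is not part of xarray) and only returns row labels instead of full variable summaries. The variable names are CF-style examples.

```python
def mapping_listing(title, names, max_rows=None):
    # Hypothetical stand-in for xarray's _mapping_repr: returns the rows of one
    # section (row labels only, no dims/dtype/values).
    lines = [f'{title}:']
    n = len(names)
    if max_rows is None or n <= max_rows:
        # No limit requested (None after the patch) or the section fits: list everything.
        lines += [f'    {name}' for name in names]
        return lines
    # Too many rows: report shown/total in the title, then keep the first
    # ceil(max_rows / 2) and the last floor(max_rows / 2) entries.
    lines[0] = f'{title}: ({max_rows}/{n})'
    first_rows = max_rows // 2 + max_rows % 2
    last_rows = max_rows // 2
    lines += [f'    {name}' for name in names[:first_rows]]
    if last_rows > 0:
        # Guard: names[-0:] would return the whole list, so skip when max_rows == 1.
        lines.append('    ...')
        lines += [f'    {name}' for name in names[-last_rows:]]
    return lines


names = [
    'air_temperature',
    'relative_humidity',
    'sea_surface_temperature',
    'surface_air_pressure',
    'wind_speed',
]
full = mapping_listing('Data variables', names)                # no limit: all 5 rows
capped = mapping_listing('Data variables', names, max_rows=3)  # 2 first + 1 last
print('\n'.join(capped))
```

With `max_rows=3`, `sea_surface_temperature` and `surface_air_pressure` end up behind the ellipsis, and nothing in the visible rows hints at their names.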
```diff
diff --git a/xarray/core/formatting.py b/xarray/core/formatting.py
index 07864e81..ab30facf 100644
--- a/xarray/core/formatting.py
+++ b/xarray/core/formatting.py
@@ -377,14 +377,12 @@ def _mapping_repr(
 ):
     if col_width is None:
         col_width = _calculate_col_width(mapping)
-    if max_rows is None:
-        max_rows = OPTIONS[""display_max_rows""]
     summary = [f""{title}:""]
     if mapping:
         len_mapping = len(mapping)
         if not _get_boolean_with_default(expand_option_name, default=True):
             summary = [f""{summary[0]} ({len_mapping})""]
-        elif len_mapping > max_rows:
+        elif max_rows is not None and len_mapping > max_rows:
             summary = [f""{summary[0]} ({max_rows}/{len_mapping})""]
             first_rows = max_rows // 2 + max_rows % 2
             items = list(mapping.items())
@@ -416,7 +414,7 @@ attrs_repr = functools.partial(
 )


-def coords_repr(coords, col_width=None):
+def coords_repr(coords, col_width=None, max_rows=None):
     if col_width is None:
         col_width = _calculate_col_width(_get_col_items(coords))
     return _mapping_repr(
@@ -425,6 +423,7 @@ def coords_repr(coords, col_width=None):
         summarizer=summarize_coord,
         expand_option_name=""display_expand_coords"",
         col_width=col_width,
+        max_rows=max_rows,
     )


@@ -542,21 +541,22 @@ def dataset_repr(ds):
     summary = [""<xarray.{}>"".format(type(ds).__name__)]

     col_width = _calculate_col_width(_get_col_items(ds.variables))
+    max_rows = OPTIONS[""display_max_rows""]

     dims_start = pretty_print(""Dimensions:"", col_width)
     summary.append(""{}({})"".format(dims_start, dim_summary(ds)))

     if ds.coords:
-        summary.append(coords_repr(ds.coords, col_width=col_width))
+        summary.append(coords_repr(ds.coords, col_width=col_width, max_rows=max_rows))

     unindexed_dims_str = unindexed_dims_repr(ds.dims, ds.coords)
     if unindexed_dims_str:
         summary.append(unindexed_dims_str)

-    summary.append(data_vars_repr(ds.data_vars, col_width=col_width))
+    summary.append(data_vars_repr(ds.data_vars, col_width=col_width, max_rows=max_rows))

     if ds.attrs:
-        summary.append(attrs_repr(ds.attrs))
+        summary.append(attrs_repr(ds.attrs, max_rows=max_rows))

     return ""\n"".join(summary)
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-873193513,https://api.github.com/repos/pydata/xarray/issues/5545,873193513,MDEyOklzc3VlQ29tbWVudDg3MzE5MzUxMw==,28786187,2021-07-02T18:46:43Z,2021-07-02T18:46:43Z,CONTRIBUTOR,"@benbovy That sounds good to me. If I may add, I would keep `__repr__` and `__str__` returning the same thing, since people seem to use them interchangeably, e.g. in tutorials, and probably also in their own code and notebooks.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-872424026,https://api.github.com/repos/pydata/xarray/issues/5545,872424026,MDEyOklzc3VlQ29tbWVudDg3MjQyNDAyNg==,28786187,2021-07-01T17:26:23Z,2021-07-01T17:26:23Z,CONTRIBUTOR,"@max-sixty I apologize if I offended anyone, but it is hard to find a solution if we can't agree on the problem. Try the same examples with 50 or 100 variables instead of 2000 to see what I mean.
To be honest, I also found your comments a bit dismissive and not exactly welcoming, which is probably not your intention either.
From what I see in the examples by @Illviljan, setting `display_max_rows` affects `coords`, `data_vars`, and `attrs` equally, so there would be no need to treat them separately. Or perhaps I misunderstood your comment.
Anyway, I think I have made my point; I leave it up to you to decide what you are comfortable with.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-871674435,https://api.github.com/repos/pydata/xarray/issues/5545,871674435,MDEyOklzc3VlQ29tbWVudDg3MTY3NDQzNQ==,28786187,2021-06-30T19:36:26Z,2021-06-30T19:36:26Z,CONTRIBUTOR,"Hi @Illviljan,
As I mentioned earlier, your ""solution"" is not backwards compatible, and it would be counterproductive to update the doctest, which is a different issue and not relevant here anyway.
I am not sure what you are trying to show; your datasets look very different from the ones I work with, and they miss the point. Then again, they also prove my point: `pandas` and `numpy` shorten their output in a canonical, predictable way (except for the limited number of columns, which may make sense, but I don't like that either and would rather have the output wrap and show all columns). `xarray` doesn't, because the variables are usually not simply numbered as in your example.
I am talking about medium-sized datasets with a few tens to perhaps a few hundred non-canonically named data variables. Have a look at http://cfconventions.org/ to get an impression of real-world variable names, or at the example linked above in comment https://github.com/pydata/xarray/issues/5545#issuecomment-870109486.
There it would be nice to have an overview of *all* of them.
If too many variables are a problem, imo it would have been better to say:
""We keep it as it is; however, if it is a problem for your large dataset, here is an option to reduce the amount of output: ..."" And put that into the docs, the wiki, an FAQ, or something similar.
Note that the initial point in the linked issue is about the *time* it takes to print all variables, not the *amount* that gets shown. And usually the number of coordinates and attributes is smaller than the number of data variables.
It also depends on what you call a ""screen"": my terminal currently has 48 lines (about 56 in fullscreen, depending on font size) and a scrollback buffer of 5000 lines, and I am also used to scrolling through long Jupyter notebooks. Scrolling through your examples might be tedious (not for me, actually), but I will never be able to find typos hidden behind the three dots.
@max-sixty No worries, I understand that this is a minor cosmetic issue; I actually intended it as a feature request, not a bug report. But that must have gotten lost along the way.
I guess I could live with 50. Any other opinions? I am sure someone else will complain about that too. ;)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-870396123,https://api.github.com/repos/pydata/xarray/issues/5545,870396123,MDEyOklzc3VlQ29tbWVudDg3MDM5NjEyMw==,28786187,2021-06-29T08:36:04Z,2021-06-29T08:36:04Z,CONTRIBUTOR,"Hi @max-sixty
> We need to cut some of the output, given a dataset has arbitrary size — same as numpy arrays / pandas dataframes.
I thought about that too, but I believe these cases are slightly different. For numpy arrays you can almost guess what the full array looks like: you know the shape and get an impression of the magnitude of the entries (of course there can be exceptions that are not shown in the output). The same goes for pandas Series or DataFrames, where the skipped index values are quite easy to guess. The names of data variables in a dataset are almost impossible to guess, as are their dimensions and data types. The ellipsis usually indicates some kind of continuation, which is not really the case with data variables.
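For comparison, a short NumPy sketch of the kind of summarization meant here. It relies only on NumPy's documented print options: by default (`threshold=1000`, `edgeitems=3`) arrays with more than 1000 elements are shown with three items at each edge, and the `np.printoptions` context manager can lift the limit temporarily.

```python
import numpy as np

a = np.arange(2000)

# With NumPy's default print options (threshold=1000, edgeitems=3), an array
# with more than 1000 elements is summarized to 3 items at each edge.
short = repr(a)
print(short)  # array([   0,    1,    2, ..., 1997, 1998, 1999])

# The elided values follow from the visible ones and from a.shape; the summary
# can be switched off temporarily when the full view is really needed.
with np.printoptions(threshold=a.size):
    full = repr(a)
```

Here the 1994 hidden entries are fully determined by the edges and the shape, which is exactly what a list of elided variable names lacks.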
> If people feel strongly about a default > 12, that seems reasonable. Do people?
I can't speak for other people, but I do, sorry about that. @shoyer's suggestion sounds good to me. Off the top of my head, 30-100 variables per dataset seems to be about what I have come across as a typical case, which does not mean that it *is* the typical case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-869950924,https://api.github.com/repos/pydata/xarray/issues/5545,869950924,MDEyOklzc3VlQ29tbWVudDg2OTk1MDkyNA==,28786187,2021-06-28T19:12:43Z,2021-06-28T19:12:43Z,CONTRIBUTOR,"I switched off HTML rendering altogether because it *really* slows down the browser, and I haven't had any problems with the text output. The text output is (was) also much more concise and does not require additional clicks to expand the dataset and see which variables are in it.
The problem with your suggestion is that it is not backwards compatible, which is not nice towards long-term users. A larger default would be a bit like meeting halfway. I also respectfully disagree about the purpose of `__repr__()`; see for example https://docs.python.org/3/reference/datamodel.html#object.__repr__ .
Cutting the output arbitrarily does not allow one to ""recreate the object"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247
https://github.com/pydata/xarray/issues/5545#issuecomment-869726359,https://api.github.com/repos/pydata/xarray/issues/5545,869726359,MDEyOklzc3VlQ29tbWVudDg2OTcyNjM1OQ==,28786187,2021-06-28T14:19:01Z,2021-06-28T14:19:01Z,CONTRIBUTOR,"Why not increase that number to a more sensible value (as I suggested), or make it optional if people have problems?
If people are concerned or run into problems, then this option would be the way to fix that, not the other way around. The current default enforces such a low limit on everyone else.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,931591247