home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 782943813

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
782943813 MDU6SXNzdWU3ODI5NDM4MTM= 4789 Poor performance of repr of large arrays, particularly jupyter repr 5635139 closed 0     5 2021-01-11T00:28:24Z 2021-01-29T23:05:58Z 2021-01-29T23:05:58Z MEMBER      

What happened:

The _repr_html_ method of large arrays seems very slow — 4.78s in the case of a 100m value array; and the general repr seems fairly slow — 1.87s. Here's a quick example. I haven't yet investigated how dependent it is on there being a MultiIndex.

What you expected to happen:

We should really focus on having good repr performance, given how essential it is to any REPL workflow.

Minimal Complete Verifiable Example:

```python In [10]: import xarray as xr ...: import numpy as np ...: import pandas as pd

In [11]: idx = pd.MultiIndex.from_product([range(10_000), range(10_000)])

In [12]: df = pd.DataFrame(range(100_000_000), index=idx)

In [13]: da = xr.DataArray(df)

In [14]: da Out[14]: <xarray.DataArray (dim_0: 100000000, dim_1: 1)> array([[ 0], [ 1], [ 2], ..., [99999997], [99999998], [99999999]]) Coordinates: * dim_0 (dim_0) MultiIndex - dim_0_level_0 (dim_0) int64 0 0 0 0 0 0 0 ... 9999 9999 9999 9999 9999 9999 - dim_0_level_1 (dim_0) int64 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999 * dim_1 (dim_1) int64 0

In [26]: %timeit repr(da) 1.87 s ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [27]: %timeit da.repr_html() 4.78 s ± 1.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each) ```

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.7 (default, Dec 30 2020, 10:13:08) [Clang 12.0.0 (clang-1200.0.32.28)] python-bits: 64 OS: Darwin OS-release: 19.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.3.dev48+gbf0fe2ca pandas: 1.1.3 numpy: 1.19.2 scipy: 1.5.3 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.5.0 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.30.0 distributed: None matplotlib: 3.3.2 cartopy: None seaborn: 0.11.0 numbagg: installed pint: 0.16.1 setuptools: 51.1.1 pip: 20.3.3 conda: None pytest: 6.1.1 IPython: 7.19.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 5 rows from issue in issue_comments
Powered by Datasette · Queries took 79.91ms · About: xarray-datasette