id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
776595030,MDExOlB1bGxSZXF1ZXN0NTQ3MDUzOTM5,4744,Speed up Dataset._construct_dataarray,2448579,closed,0,,,1,2020-12-30T19:03:05Z,2021-01-05T17:32:16Z,2021-01-05T17:32:13Z,MEMBER,,0,pydata/xarray/pulls/4744,"- [ ] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

Significantly speeds up `_construct_dataarray` by iterating over `._coord_names` instead of `.coords`. This avoids unnecessarily constructing a `DatasetCoordinates` object and massively speeds up repr construction for datasets with large numbers of variables.

Construct a 2000 variable dataset

```python
import numpy as np
import xarray as xr

a = np.arange(0, 2000)
b = np.core.defchararray.add(""long_variable_name"", a.astype(str))
coords = dict(time=np.array([0, 1]))
data_vars = dict()
for v in b:
    data_vars[v] = xr.DataArray(
        name=v, data=np.array([3, 4]), dims=[""time""], coords=coords
    )
ds0 = xr.Dataset(data_vars)
```

Before:

```
%timeit ds0['long_variable_name1999']
%timeit ds0.__repr__()
1.33 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.66 s ± 52.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After:

```
%timeit ds0['long_variable_name1999']
%timeit ds0.__repr__()
10.5 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
84.2 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4744/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull