issues: 775875024
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
775875024 | MDU6SXNzdWU3NzU4NzUwMjQ= | 4739 | Slow initilization of dataset.interp | 14371165 | closed | 0 |  |  | 2 | 2020-12-29T12:46:05Z | 2021-05-05T12:26:01Z | 2021-05-05T12:26:01Z | MEMBER |  |  |  | What happened:
When interpolating a dataset with >2000 dask variables, a lot of time is spent in

What you expected to happen:
If the coords of the dataset were initialized as dask arrays, they should stay lazy.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np
import dask.array as da

# Build 2000 variable names: "long_variable_name0" ... "long_variable_name1999"
a = np.arange(0, 2000)
b = np.core.defchararray.add("long_variable_name", a.astype(str))

# Every variable shares a "time" coordinate that starts out as a dask array
coords = dict(time=da.array([0, 1]))
data_vars = dict()
for v in b:
    data_vars[v] = xr.DataArray(
        name=v, data=da.array([3, 4]), dims=["time"], coords=coords
    )
ds0 = xr.Dataset(data_vars)

# Interpolate all variables onto a new, dask-backed time axis
ds0 = ds0.interp(
    time=da.array([0, 0.5, 1]),
    assume_sorted=True,
    kwargs=dict(fill_value=None),
)
```

Anything else we need to know?:
Some thoughts:
* Why can't coordinates be lazy?
* Can we use `dask.dataframe.Index` instead of `pd.Index` when creating `IndexVariable`s?
* There's no time saved converting to dask arrays in

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.17.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.3
conda: 4.9.2
pytest: 6.2.1
IPython: 7.19.0
sphinx: 3.4.0 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4739/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| completed | 13221727 | issue |
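The first question in the issue's list of thoughts ("Why can't coordinates be lazy?") can be checked directly on a single variable. The sketch below is an illustration and is not taken from the issue. It assumes a recent xarray with dask installed and builds one variable the same way as the example in the issue body. It then checks that the data stays a dask array, while the `time` dimension coordinate becomes an `xr.IndexVariable` backed by an in-memory `pandas.Index` with no dask chunks.

```python
import dask.array as da
import pandas as pd
import xarray as xr

# Same construction as one variable in the issue's example:
# both the data and the "time" coordinate start out as dask arrays.
coords = dict(time=da.array([0, 1]))
arr = xr.DataArray(data=da.array([3, 4]), dims=["time"], coords=coords)

# The data variable stays lazy: it is still a dask array.
data_is_lazy = isinstance(arr.data, da.Array)

# The dimension coordinate is turned into an index. That index is an
# in-memory pandas.Index, so the coordinate is no longer a dask array.
time_index = arr.indexes["time"]
time_var = arr["time"].variable

print("data is dask:", data_is_lazy)
print("index type:", type(time_index).__name__)
print("coordinate variable type:", type(time_var).__name__)
print("coordinate chunks:", arr["time"].chunks)
```

In this sketch the dask values passed for `time` are computed when the index is built, and `arr["time"].chunks` reports no chunks. This is consistent with the behaviour the issue describes, where coordinates end up as `pd.Index` even when they were created from dask arrays.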