home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

1 row where comments = 2, "created_at" is on date 2020-12-29 and user = 14371165 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date), closed_at (date)

type 1

  • issue 1

state 1

  • closed 1

repo 1

  • xarray 1
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
775875024 MDU6SXNzdWU3NzU4NzUwMjQ= 4739 Slow initilization of dataset.interp Illviljan 14371165 closed 0     2 2020-12-29T12:46:05Z 2021-05-05T12:26:01Z 2021-05-05T12:26:01Z MEMBER      

What happened: When interpolating a dataset with >2000 dask variables a lot of time is spent in da.unifying_chunks because da.unifying_chunks forces all variables and coordinates to a dask array. xarray on the other hand forces coordinates to pd.Index even if the coordinates was dask.array when the dataset was first created.

What you expected to happen: If the coords of the dataset was initialized as dask arrays they should stay lazy.

Minimal Complete Verifiable Example:

```python import xarray as xr import numpy as np import dask.array as da

a = np.arange(0, 2000) b = np.core.defchararray.add("long_variable_name", a.astype(str)) coords = dict(time=da.array([0, 1])) data_vars = dict() for v in b: data_vars[v] = xr.DataArray( name=v, data=da.array([3, 4]), dims=["time"], coords=coords ) ds0 = xr.Dataset(data_vars) ds0 = ds0.interp( time=da.array([0, 0.5, 1]), assume_sorted=True, kwargs=dict(fill_value=None), ) ```

Anything else we need to know?: Some thoughts: * Why can't coordinates be lazy? * Can we use dask.dataframe.Index instead of pd.Index when creating IndexVariables? * There's no time saved converting to dask arrays in missing.interp_func. But some time could be saved if we could convert them to dask arrays in xr.Dataset.interp before the variable loop starts. * Can we still store the dask array in IndexVariable and use a to_dask_array()-method to quickly get it? * Initializing the dataarrays will still be slow though since it still has to force the dask array to pd.Index.

Environment:

Output of <tt>xr.show_versions()</tt> xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 libhdf5: 1.10.4 libnetcdf: None xarray: 0.16.2 pandas: 1.1.5 numpy: 1.17.5 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2020.12.0 distributed: 2020.12.0 matplotlib: 3.3.2 cartopy: None seaborn: 0.11.1 numbagg: None pint: None setuptools: 51.0.0.post20201207 pip: 20.3.3 conda: 4.9.2 pytest: 6.2.1 IPython: 7.19.0 sphinx: 3.4.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4739/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 26.629ms · About: xarray-datasette