
issues: 839914235


id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
839914235 MDU6SXNzdWU4Mzk5MTQyMzU= 5071 Applying a function on a subset of variables using `map_blocks` is much slower 22245117 closed 0     1 2021-03-24T16:39:08Z 2021-03-29T17:45:58Z 2021-03-29T17:45:58Z CONTRIBUTOR      

What happened: When I use `map_blocks` with a function that operates on only a subset of the variables, the computation is much faster if I select that subset from the dataset first.

What you expected to happen: In the example below, I wouldn't expect such a difference in computation time.

Minimal Complete Verifiable Example:

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
ds["foo"] = xr.DataArray(ds["time"].values, dims="time")
ds = ds.chunk(dict(time=1, lon=-1, lat=-1))
```

```python
def func(obj):
    return obj[["foo"]]
```

```python
%%timeit
ds.map_blocks(func).compute()
```

17.3 s ± 179 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```python
%%timeit
# Subsample the dataset before calling map_blocks
func(ds).map_blocks(func).compute()
```

5.63 s ± 175 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
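
A possible workaround, suggested only by the two timings above, is to select the variables the function needs before calling `map_blocks`. The sketch below is not part of xarray: the helper name `map_blocks_on_subset` is hypothetical, and the synthetic dataset is an assumption that stands in for the tutorial data so that nothing has to be downloaded.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the tutorial dataset (assumption: one large 3-D
# variable plus a small 1-D variable "foo" along the time dimension).
n_time = 20
ds = xr.Dataset(
    {
        "air": (("time", "lat", "lon"), np.random.rand(n_time, 25, 53)),
        "foo": ("time", np.arange(n_time)),
    },
    coords={"time": np.arange(n_time)},
).chunk({"time": 1, "lat": -1, "lon": -1})


def func(obj):
    # Same function as in the example above: only "foo" is used.
    return obj[["foo"]]


def map_blocks_on_subset(dataset, function, variables):
    """Hypothetical helper: keep only `variables` before calling map_blocks."""
    return dataset[list(variables)].map_blocks(function)


# Only the selected variables are handed to map_blocks.
result = map_blocks_on_subset(ds, func, ["foo"]).compute()
```

This only helps when the needed variables are known up front, and it was not measured here whether it accounts for the whole difference.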

Anything else we need to know?:

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.18.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.1
scipy: 1.6.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.03.0
distributed: 2021.03.0
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5071/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue
