issues: 205215815
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
205215815 | MDU6SXNzdWUyMDUyMTU4MTU= | 1247 | numpy function very slow on DataArray compared to DataArray.values | 731499 | closed | 0 | 5 | 2017-02-03T17:12:08Z | 2019-01-23T17:34:22Z | 2019-01-23T17:34:22Z | CONTRIBUTOR | First I create some fake latitude and longitude points. I stash them in a dataset, and compute a 2d histogram on those.
```python
#!/usr/bin/env python
import xarray as xr
import numpy as np

lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
d = xr.Dataset({'latitude':lat, 'longitude':lon})

latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]
h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins))
```
When I run this I get some underwhelming performance:
```
real 0m28.152s
user 0m27.201s
sys 0m0.630s
```
If I change the last line to
(i.e. I pass the numpy arrays directly to the histogram2d function), things are very different: ```
real 0m0.996s
user 0m0.569s
sys 0m0.253s
```
It's ~28 times slower to call histogram2d on the DataArrays, compared to calling it on the underlying numpy arrays.
I ran into this issue while histogramming quite large lon/lat vectors from multiple netCDF files. I got tired of waiting for the computation to end, added the
It seems problematic that using xarray can slow down your code by 28 times with no real way for you to know about it... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1247/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
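The workaround described in the issue body (handing the underlying numpy arrays to `np.histogram2d` rather than the wrapped objects) can be sketched roughly as below. This is only an illustration under assumptions: the helper names `as_plain_array` and `histogram2d_unwrapped` are hypothetical and appear in neither the issue nor xarray, and plain numpy arrays stand in for the `DataArray` inputs so the snippet runs without xarray installed. Whether the slowdown still reproduces will depend on the xarray and numpy versions in use.

```python
import numpy as np


def as_plain_array(obj):
    """Hypothetical helper: unwrap objects that expose `.values`
    (e.g. an xarray DataArray) into a plain numpy array.
    Plain ndarrays have no `.values` attribute and pass through."""
    return np.asarray(getattr(obj, "values", obj))


def histogram2d_unwrapped(x, y, bins):
    """Hypothetical wrapper: call np.histogram2d on the unwrapped arrays."""
    return np.histogram2d(as_plain_array(x), as_plain_array(y), bins=bins)


# Same fake data and bin edges as in the issue, with a seeded generator
# (an assumption, added here only for reproducibility).
rng = np.random.default_rng(0)
lat = rng.random(50000) * 180 - 90
lon = rng.random(50000) * 360 - 180

latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]

# With a Dataset `d` as in the issue, the call would instead be
# histogram2d_unwrapped(d['longitude'], d['latitude'], bins=(lonbins, latbins)).
h, xx, yy = histogram2d_unwrapped(lon, lat, bins=(lonbins, latbins))
```

Unwrapping once at the call boundary keeps the rest of the code working with labelled objects while the numerical routine only ever sees plain arrays.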