
issues: 174390114


id: 174390114
node_id: MDU6SXNzdWUxNzQzOTAxMTQ=
number: 995
title: xarray slicing is very slow, and reading time differs a lot between variables.
user: 7747527
state: closed
locked: 0
comments: 2
created_at: 2016-08-31T22:02:42Z
updated_at: 2016-09-01T03:24:49Z
closed_at: 2016-09-01T03:17:52Z
author_association: NONE
body:

Hi everyone,

I've been working on calculating ocean heat transport across a prescribed section. To do that, I need to calculate V*T (velocity times temperature) at the grid points along that section.

Xarray has been my favorite tool ever since the days when it was still called xray. In my code, there is a loop over the grid points along the section that determines whether I need to use U (zonal velocity) or V (meridional velocity). It then multiplies that velocity by the temperature at that location over the full depth.

VVEL, UVEL and TEMP all have dimensions (time: 1, z_t: 42, lat: 1800, lon: 3600).

ds = xarray.open_dataset(filename, decode_times=False)
vvel0 = ds.VVEL.sel(lat=slice(-60, -20), lon=slice(0, 40)) / 100
uvel0 = ds.UVEL.sel(lat=slice(-60, -20), lon=slice(0, 40)) / 100
temp0 = ds.TEMP.sel(lat=slice(-60, -20), lon=slice(0, 40))

The weird thing is that VVEL and UVEL each take about 5 s with slicing and about 2 s without slicing, while TEMP only needs 6 ms.
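For anyone who wants to reproduce the setup without the original file, here is a minimal sketch on a small synthetic dataset with the same dimension order. The grid sizes, the random values, and the helper name `make_synthetic_dataset` are assumptions for illustration only. One possible reading of the timings above, which I have not confirmed, is that `open_dataset` reads lazily, so the `/ 100` arithmetic is what forces the velocity data to be read from disk, while the plain `TEMP` selection stays lazy. Calling `.load()` makes that read explicit; on in-memory data like this it changes nothing.

```python
import numpy as np
import xarray as xr


def make_synthetic_dataset(nz=4, nlat=90, nlon=180, seed=0):
    """Build a small stand-in for the model output (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dims = ("time", "z_t", "lat", "lon")
    shape = (1, nz, nlat, nlon)
    coords = {
        "lat": np.linspace(-89.0, 89.0, nlat),  # 2-degree spacing (assumed)
        "lon": np.linspace(0.0, 358.0, nlon),   # 2-degree spacing (assumed)
    }
    data_vars = {
        name: (dims, rng.standard_normal(shape))
        for name in ("UVEL", "VVEL", "TEMP")
    }
    return xr.Dataset(data_vars, coords=coords)


ds = make_synthetic_dataset()
region = dict(lat=slice(-60, -20), lon=slice(0, 40))

# Same selections and scaling as in the issue.
vvel0 = ds.VVEL.sel(**region) / 100
uvel0 = ds.UVEL.sel(**region) / 100

# Explicitly pull the temperature subset into memory as well.
temp0 = ds.TEMP.sel(**region).load()

print(vvel0.shape, uvel0.shape, temp0.shape)
```

With a file-backed dataset, timing the `temp0` line with and without `.load()` should show whether the read itself is the expensive step.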

This leads to another issue. Within the loop, I need to extract a vertical column of temperature:

tt=np.squeeze(temp[:,yidx,xidx].values)

This line drags everything down. It takes about 4 s, and it is repeated on every iteration (roughly 300 times). I found that removing .values reduces the time to 2 ms, but I need to extract the values to calculate V*T.

What's more interesting is that, in the loop, I also have this line

vv=np.squeeze(vvel[:,yidx,xidx].values)

This has exactly the same structure as the line I use to extract the temperature column, but it only needs about 1 ms, with or without .values.
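A possible workaround, sketched under the assumption that the cost comes from reading the temperature from disk again on every iteration: pull each field into a NumPy array once, before the loop, and index those arrays inside it. The section points, the `use_v` flags, and the function name `column_transport` are hypothetical; the real loop decides between U and V from the section geometry.

```python
import numpy as np
import xarray as xr


def column_transport(uvel, vvel, temp, points):
    """Return velocity * temperature for each (yidx, xidx, use_v) point.

    The inputs are DataArrays with dims (time, z_t, lat, lon) and a single
    time step. `points` is a hypothetical stand-in for the section geometry.
    """
    # Read each field into memory once; .values returns a NumPy array.
    u = uvel.isel(time=0).values
    v = vvel.isel(time=0).values
    t = temp.isel(time=0).values

    out = np.empty((len(points), t.shape[0]))
    for i, (yidx, xidx, use_v) in enumerate(points):
        vel = v if use_v else u
        # Plain NumPy indexing: one full-depth column per section point.
        out[i] = vel[:, yidx, xidx] * t[:, yidx, xidx]
    return out


# Tiny constant fields so the result is easy to check by eye.
dims = ("time", "z_t", "lat", "lon")
shape = (1, 3, 5, 6)
uvel = xr.DataArray(np.full(shape, 2.0), dims=dims)
vvel = xr.DataArray(np.full(shape, 3.0), dims=dims)
temp = xr.DataArray(np.full(shape, 10.0), dims=dims)

points = [(0, 0, True), (1, 2, False), (4, 5, True)]
vt = column_transport(uvel, vvel, temp, points)
print(vt.shape)
```

The trade-off is memory: the three selected fields must fit in RAM at once, which seems reasonable for the sliced region but is worth checking for the full grid.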


Could someone please explain how this is possible? All of these variables are from the same netCDF file, and I read them in exactly the same way. Also, why does slicing take even longer than loading the full field? Any suggestions on how I can get around this bottleneck? Thank you.

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/995/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
