xarray issue #995: xarray slicing is very slow, and reading time differs a lot between variables.

  • id: 174390114 (node_id MDU6SXNzdWUxNzQzOTAxMTQ=)
  • user: fischcheng (7747527)
  • state: closed (completed)
  • comments: 2
  • created_at: 2016-08-31T22:02:42Z
  • updated_at: 2016-09-01T03:24:49Z
  • closed_at: 2016-09-01T03:17:52Z
  • author_association: NONE
  • repo: xarray (13221727)

Hi everyone,

I've been working on calculating ocean heat transport across a prescribed section. For that, I need to compute V*T (velocity times temperature) at the grid points along the section.

Xarray has been my favorite tool ever since the days it was still called xray. In my code there is a loop (over the grid points along the line) that determines whether I need to use U (zonal velocity) or V (meridional velocity), then multiplies it by the full-depth temperature column at that location.

VVEL, UVEL and TEMP all have the same structure: (time: 1, z_t: 42, lat: 1800, lon: 3600).

ds = xarray.open_dataset(filename, decode_times=False)
vvel0 = ds.VVEL.sel(lat=slice(-60, -20), lon=slice(0, 40)) / 100
uvel0 = ds.UVEL.sel(lat=slice(-60, -20), lon=slice(0, 40)) / 100
temp0 = ds.TEMP.sel(lat=slice(-60, -20), lon=slice(0, 40))

The weird thing is that for VVEL and UVEL this takes about 5 seconds with slicing and about 2 seconds without, but for TEMP it needs only 6 ms.
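One thing worth checking (a sketch, not from the issue itself: the file and variable sizes below are a tiny synthetic stand-in for the real netCDF data) is how each variable is stored on disk. Differences in chunking, compression, or scale/offset encoding between VVEL/UVEL and TEMP could explain why apparently identical reads take very different times:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Tiny synthetic dataset standing in for the real file; the variable
# names come from the issue, the sizes and values here are made up.
ds = xr.Dataset({
    "VVEL": (("z_t", "lat", "lon"), np.random.rand(4, 5, 6)),
    "TEMP": (("z_t", "lat", "lon"), np.random.rand(4, 5, 6)),
})

path = os.path.join(tempfile.mkdtemp(), "demo.nc")
ds.to_netcdf(path)

opened = xr.open_dataset(path, decode_times=False)
for name in ("VVEL", "TEMP"):
    # On-disk storage details: chunk sizes, compression, dtype, ...
    print(name, opened[name].encoding)
```

If the encodings differ substantially between the variables, that points at the file layout rather than xarray itself.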

This leads to another issue. Within the loop, I need to extract a vertical column of temperature:

tt=np.squeeze(temp[:,yidx,xidx].values)

This line drags everything down: it takes about 4 seconds, and it is repeated on every iteration (roughly 300 times). I found that removing .values reduces the time to 2 ms, but I need the actual values to calculate V*T.
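If the cost comes from xarray's lazy loading re-reading from disk on every .values call, one way to sidestep it is to pull the sliced subset into memory once before the loop. A minimal sketch (the array here is a synthetic stand-in for the sliced TEMP subset, and the grid points are placeholders):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the sliced TEMP subset (hypothetical sizes;
# the real array would be read lazily from the netCDF file).
temp = xr.DataArray(np.random.rand(42, 10, 20), dims=("z_t", "lat", "lon"))

# Load the whole subset into memory once, before the loop, so that
# each .values inside the loop is a cheap in-memory lookup rather
# than a fresh read (and decode) from disk.
temp = temp.load()

columns = []
for yidx, xidx in [(0, 0), (1, 3)]:  # placeholder grid points
    tt = np.squeeze(temp[:, yidx, xidx].values)
    columns.append(tt)
```

The one-time cost of .load() is paid up front instead of ~300 times inside the loop.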

What's more interesting is that, in the loop, I also have this line

vv=np.squeeze(vvel[:,yidx,xidx].values)

which has exactly the same structure as the line I use to extract the temperature column. Yet this line needs only about 1 ms, with or without .values.

Could someone please explain how this is possible? All these variables come from the same netCDF file, and I access them in exactly the same way. Also, why does slicing take even longer than loading the full field? Is there any way to circumvent this bottleneck? Thank you.
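For reference, later xarray versions added pointwise (vectorized) indexing, which can replace the per-point loop entirely: passing DataArray indexers that share a new dimension to isel extracts all columns in a single call. A sketch with synthetic stand-ins for the sliced variables and hypothetical section indices:

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for the sliced TEMP/VVEL subsets.
temp = xr.DataArray(np.random.rand(42, 10, 20), dims=("z_t", "lat", "lon"))
vvel = xr.DataArray(np.random.rand(42, 10, 20), dims=("z_t", "lat", "lon"))

# Hypothetical section: the (yidx, xidx) grid points along the line,
# wrapped in DataArrays that share a new "points" dimension.
yidxs = xr.DataArray([0, 1, 2], dims="points")
xidxs = xr.DataArray([3, 4, 5], dims="points")

# Pointwise (vectorized) indexing: one call extracts every column at
# once, yielding a (z_t, points) array instead of ~300 separate reads.
t_cols = temp.isel(lat=yidxs, lon=xidxs)
v_cols = vvel.isel(lat=yidxs, lon=xidxs)

vt = v_cols * t_cols  # V*T on every section point in one shot
```

Doing one vectorized extraction also lets xarray read each variable from disk once instead of once per grid point.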

