issue #912: Speed up operations with xarray dataset
opened 2016-07-20 by user 7504461 · closed 2016-12-29 · 12 comments

Hi all,

I've recently been having a hard time manipulating an xarray dataset. I am not sure if I am making some awkward mistake, but it is taking an unacceptable amount of time to perform simple operations.

Here is a piece of my code:

```python
ncfile = glob('*conc_size_12m.nc')
ds = xray.open_dataset(ncfile[0])
ds
```

```
<xarray.Dataset>
Dimensions:          (burst: 2485, duration: 2400, z: 160)
Coordinates:
    zdist            (z) float64 0.01014 0.02027 0.03041 0.04054 0.05068 ...
    burst_nr         (burst) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
    time             (duration, burst) datetime64[ns] 2014-09-16T07:00:00 ...
  * burst            (burst) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
  * duration         (duration) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
  * z                (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    conc_profs       (duration, z, burst) float32 3.99138e-05 4.23636e-05 ...
    burst_duration   (duration) float64 0.0 0.1246 0.2493 0.3739 0.4985 ...
    grainSize_profs  (duration, z, burst) float32 200.0 200.0 200.0 200.0 ...
```

```python
ds.nbytes * (2 ** -30)
```

```
7.15415246784687
```

```python
%time conc_avg = ds.conc_profs.chunk(2400).mean(('z','duration'))
```

```
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 9.84 ms
```

```python
%time conc_avg.load()
```

```python
%time conc_avg = ds.conc_profs.isel(burst=0).mean(('z','duration'))
```

```
CPU times: user 708 ms, sys: 2.87 s, total: 3.58 s
Wall time: 1min 56s
```

If I work with chunks, it is impossible to load the array back in a reasonable amount of time (I waited for more than 30 minutes).
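For reference, here is a scaled-down, self-contained sketch of the same reduction with the chunks aligned to the `burst` dimension (the one that survives the mean). The dimension names come from the dataset printout above; the array sizes and chunk size are synthetic and purely illustrative, and dask is assumed to be installed:

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the 7+ GB dataset (dimension names from the issue).
ds = xr.Dataset(
    {"conc_profs": (("duration", "z", "burst"),
                    np.random.rand(24, 16, 25).astype("float32"))}
)

# Chunk along `burst`, the dimension kept by the reduction, so each chunk
# contains complete (duration, z) slabs; the chunk size is illustrative.
conc_avg = ds.conc_profs.chunk({"burst": 5}).mean(("z", "duration")).load()
print(conc_avg.shape)  # one mean per burst
```

Passing a dict to `chunk` makes it explicit which dimension the chunks run along, instead of relying on positional chunk sizes.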

Looping over the burst dimension takes about 2 minutes per iteration, which is also quite unreasonable.
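For comparison, a per-burst loop over `isel()` and a single vectorized `mean` over `('z', 'duration')` produce the same values; this is a minimal sketch on synthetic data (the sizes are illustrative, not the real 7+ GB file):

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the real dataset (dimension names from the issue).
ds = xr.Dataset(
    {"conc_profs": (("duration", "z", "burst"),
                    np.random.rand(24, 16, 25).astype("float32"))}
)

# Looping over `burst` with isel(), one reduction per iteration:
loop_result = np.stack(
    [ds.conc_profs.isel(burst=b).mean(("z", "duration")).values
     for b in range(ds.sizes["burst"])]
)

# The same reduction in a single vectorized call:
vectorized = ds.conc_profs.mean(("z", "duration")).values

assert np.allclose(loop_result, vectorized)
```

The vectorized form lets xarray reduce the whole array in one pass rather than re-selecting a slice on every iteration.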

I was wondering if the problem could stem from the way I created the dataset, which I saved into this 7+ GB netCDF file. Could that be the case?

I am working on a Linux machine with an Intel Core i5, which should handle these manipulations with no hiccups. I use the IOOS environment to run xarray (version '0.7.1').

Can someone provide me some advice on how to optimize my script?

I am happy to supply more details if needed.

Cheers,
