issue_comments: 439454213

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1385#issuecomment-439454213 https://api.github.com/repos/pydata/xarray/issues/1385 439454213 MDEyOklzc3VlQ29tbWVudDQzOTQ1NDIxMw== 1217238 2018-11-16T16:46:55Z 2018-11-16T16:46:55Z MEMBER

Does it take 10 seconds even to open a single file? The big mystery is what that top line ("_operator.getitem") is, but my guess is that it's netCDF4-python. h5netcdf might also give different results...

On Fri, Nov 16, 2018 at 8:20 AM chuaxr notifications@github.com wrote:

Sorry, I think the speedup had to do with accessing a file that had previously been loaded rather than due to decode_cf. Here's the output of prun using two different files of approximately the same size (~75 GB), run from a notebook without using distributed (which doesn't lead to any speedup):

Output of %prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.1999010100-2000123123.sphum.nc',chunks={'lat':20,'time':50,'lon':12,'pfull':11})

      780980 function calls (780741 primitive calls) in 55.374 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     7   54.448    7.778   54.448    7.778 {built-in method _operator.getitem}
764838    0.473    0.000    0.473    0.000 core.py:169(<genexpr>)
     3    0.285    0.095    0.758    0.253 core.py:169(<listcomp>)
     2    0.041    0.020    0.041    0.020 {cftime._cftime.num2date}
     3    0.040    0.013    0.821    0.274 core.py:173(getem)
     1    0.027    0.027   55.374   55.374 <string>:1(<module>)

Output of %prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.2001010100-2002123123.temp.nc',chunks={'lat':20,'time':50,'lon':12,'pfull':11}, decode_cf=False)

      772212 function calls (772026 primitive calls) in 56.000 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5   55.213   11.043   55.214   11.043 {built-in method _operator.getitem}
764838    0.486    0.000    0.486    0.000 core.py:169(<genexpr>)
     3    0.185    0.062    0.671    0.224 core.py:169(<listcomp>)
     3    0.041    0.014    0.735    0.245 core.py:173(getem)
     1    0.027    0.027   56.001   56.001 <string>:1(<module>)
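
For anyone wanting to reproduce a table like the ones above outside a notebook, here is a minimal sketch using only the standard library's cProfile and pstats. The helper name `profile_by_internal_time` is made up for illustration and is not part of xarray or IPython; it just sorts by internal time ("tottime"), which is the same ordering shown in the %prun output.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, *args, top=10, **kwargs):
    """Profile one call and return (result, report sorted by internal time).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    # runcall profiles exactly one invocation of func and returns its result.
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is time spent inside a function, excluding its callees.
    stats.sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()
```

Something like `ds, report = profile_by_internal_time(xr.open_mfdataset, path, chunks=chunks)` followed by `print(report)` should then give a comparable listing, assuming `xr`, `path` and `chunks` are defined as in the commands above.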

/work isn't a remote archive, so it surprises me that this should happen.
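
One way to check the two suggestions at the top of this comment (open a single file, and try h5netcdf) is to time the same open with each backend. The sketch below is only an assumption about how one might do that: `time_open` is a hypothetical helper, the `opener` argument exists only so the function can be tried without real data, and by default it falls back to `xr.open_dataset` with its `engine` keyword.

```python
import time


def time_open(path, engines=("netcdf4", "h5netcdf"), opener=None, **open_kwargs):
    """Time opening one file with each engine (hypothetical helper).

    Returns a dict mapping engine name to elapsed seconds, or to the
    exception raised if that engine could not open the file.
    """
    if opener is None:
        # Imported lazily so the helper can be defined without xarray installed.
        import xarray as xr
        opener = xr.open_dataset

    timings = {}
    for engine in engines:
        start = time.perf_counter()
        try:
            ds = opener(path, engine=engine, **open_kwargs)
        except Exception as exc:  # e.g. backend not installed
            timings[engine] = exc
            continue
        timings[engine] = time.perf_counter() - start
        ds.close()
    return timings
```

Calling `time_open(path, chunks={'lat': 20, 'time': 50, 'lon': 12, 'pfull': 11})` on one of the files above would show whether the slowdown depends on the backend. Note that the second engine may benefit from the file already being in the OS cache, which is the effect described earlier in the quoted message, so it is worth running the comparison in both orders.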

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  224553135