issues: 446933504

id: 446933504
node_id: MDU6SXNzdWU0NDY5MzM1MDQ=
number: 2979
title: Reading single grid cells from a multi-file netcdf dataset?
user: 167164
state: open
locked: 0
comments: 1
created_at: 2019-05-22T05:01:50Z
updated_at: 2019-05-23T16:15:54Z
author_association: NONE
(empty: assignee, milestone, closed_at, active_lock_reason, draft, pull_request, performed_via_github_app, state_reason)

I have a multi-file dataset made up of month-long, 8-hourly netcdf files covering nearly 30 years. The files are available from ftp://ftp.ifremer.fr/ifremer/ww3/HINDCAST/GLOBAL/, and I'm specifically looking at e.g. 1990_CFSR/hs/ww3.199001_hs.nc for each year and month. Each file is about 45 MB, for about 15 GB in total.

I want to calculate some lognormal distribution parameters of the Hs variable at each grid point (actually, only a smallish subset of points, selected with a mask). However, if I load the data with open_mfdataset and try to read a single lat/lon grid cell, my computer grinds to a halt and Python gets killed for running out of memory. I have 16 GB, but even if I only open 1 year of data (~500 MB), Python ends up using 27% of my memory.
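For context, the per-cell calculation itself is cheap once a single time series is in memory. The following is a minimal sketch, assuming the parameters wanted are the maximum-likelihood estimates of a lognormal fit (the mean and standard deviation of log Hs). The function name `lognormal_params` and the sample values are hypothetical and are not taken from the dataset.

```python
import math
import statistics


def lognormal_params(hs_series):
    """Return (mu, sigma) for a lognormal fitted by maximum likelihood.

    mu and sigma are the mean and population standard deviation of log(Hs).
    Values that are not strictly positive (including NaN, e.g. masked or
    land cells) are skipped, because their logarithm is undefined.
    """
    logs = [math.log(h) for h in hs_series if h > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive samples")
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


# Hypothetical significant wave heights (metres) for one grid cell.
sample_hs = [1.2, 0.8, 2.5, 1.9, 3.1, 1.4]
mu, sigma = lognormal_params(sample_hs)
print(f"mu={mu:.3f}, sigma={sigma:.3f}")
```

The expensive part is therefore not the fit but getting one cell's full time series out of many files.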

Is there a way in xarray/dask to force dask to only read single sub-arrays at a time? I have tried using lat/lon chunking, e.g.

```python
import xarray as xr

# All monthly files for 1990
mfdata_glob = '/home/nedcr/cr/data/wave/*1990*.nc'
global_ds = xr.open_mfdataset(
    mfdata_glob,
    chunks={'latitude': 1, 'longitude': 1})
```

but that doesn't seem to improve things.

Is there any way around this problem? I guess I could try using preprocess= to sub-select grid cells, and loop over that, but that seems like it would require opening and reading each file 317*720 times, which sounds like a recipe for a long wait.
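To put rough numbers on both approaches, here is a back-of-the-envelope sketch. It assumes the 317 x 720 grid mentioned above, one file per month, 30 years of data ("nearly 30 years", rounded up), and that per-cell chunking yields one dask chunk per grid cell per file with the time axis left unchunked. The helper name `chunks_per_file` is hypothetical.

```python
import math

# Grid size from the 317*720 figure above; the file counts are assumptions.
N_LAT, N_LON = 317, 720
FILES_PER_YEAR = 12   # one month-long file per month
N_YEARS = 30          # "nearly 30 years", rounded up


def chunks_per_file(n_lat, n_lon, lat_chunk=1, lon_chunk=1):
    """Chunks per variable in one file, with the time axis left unchunked."""
    return math.ceil(n_lat / lat_chunk) * math.ceil(n_lon / lon_chunk)


per_file = chunks_per_file(N_LAT, N_LON)   # chunks={'latitude': 1, 'longitude': 1}
per_year = per_file * FILES_PER_YEAR
all_years = per_year * N_YEARS

# A preprocess= loop over every cell would open each file once per cell,
# so the number of file opens matches the number of per-cell chunks.
print(f"chunks per file:       {per_file:,}")
print(f"chunks for one year:   {per_year:,}")
print(f"chunks for {N_YEARS} years:   {all_years:,}")

# Coarser spatial chunks shrink the count quickly, e.g. 100 x 100 blocks:
print(f"100x100 chunks per file: {chunks_per_file(N_LAT, N_LON, 100, 100)}")
```

Under these assumptions, both the per-cell chunking and the per-cell preprocess= loop work with millions of tiny pieces for a single year, which may be part of why neither looks attractive.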

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
