issue_comments: 328724595

html_url: https://github.com/pydata/xarray/issues/1279#issuecomment-328724595
issue_url: https://api.github.com/repos/pydata/xarray/issues/1279
id: 328724595
node_id: MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ==
user: 4992424
created_at: 2017-09-12T03:29:29Z
updated_at: 2017-09-12T03:29:29Z
author_association: NONE

body:

@shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they'll have shape (8736 x 96 x 144). I haven't played much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose.
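For reference, here's a minimal sketch of that workflow (the file names, year range, and chunk sizes are made up):

```python
import xarray as xr

# One file per year, each with hourly output of shape (8736, 96, 144).
# File names and chunk sizes here are hypothetical.
years = range(1968, 2018)  # ~50 years, one "ensemble member" each
datasets = [
    xr.open_dataset(f"output_{year}.nc", chunks={"time": 24})
    for year in years
]

# Stack along a new "ensemble" dimension -> (50, 8736, 96, 144),
# staying in dask-array land throughout.
ens = xr.concat(datasets, dim="ensemble")
```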

@jhamman - Wow, that worked pretty much perfectly! There are a handful of typos (you switch from "a" to "x" halfway through), and there's a lot of room for optimization via chunk size. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset through my cluster (~50 cores) and it screamed through the calculation.
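Continuing the sketch above, rechunking before the computation is the obvious knob to turn (the sizes here are made up; the right ones depend on the calculation and the cluster):

```python
# Hypothetical rechunking: one chunk per ensemble member, with the
# time axis split into larger blocks to reduce scheduler overhead.
ens = ens.chunk({"ensemble": 1, "time": 2184})
```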

Is there any way this could be pushed out before 0.10.0? It's a killer enhancement.

reactions: none (total_count 0)
issue: 208903781