issue_comments: 328724595
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/1279#issuecomment-328724595 | https://api.github.com/repos/pydata/xarray/issues/1279 | 328724595 | MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ== | 4992424 | 2017-09-12T03:29:29Z | 2017-09-12T03:29:29Z | NONE | @shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they have a shape of (8736 x 96 x 144). I haven't played much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose. @jhamman - Wow, that worked pretty much perfectly! There's a handful of typos (you switch from `a` to `x` halfway through), and there's a lot of room for optimization by chunk size. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset through my cluster with ~50 cores and it screamed through the calculation. Is there any way this could be pushed before 0.10.0? It's a killer enhancement. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | 208903781 |
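
A minimal sketch of the ensemble-concatenation workflow described in the comment above, assuming hypothetical yearly file names and dimension names (`time`, `lat`, `lon`); the chunk sizes are only illustrative and would need tuning:

```python
import xarray as xr

# Hypothetical yearly files, each with dims (time: 8736, lat: 96, lon: 144).
yearly_files = [f"aerosol_{year}.nc" for year in range(1960, 2010)]

# Open each year lazily as a dask-backed dataset; the time chunk size here
# (~30 days of hourly output) is just an illustrative starting point.
datasets = [
    xr.open_dataset(path, chunks={"time": 24 * 30})
    for path in yearly_files
]

# Stack the years along a new "ensemble" dimension, giving arrays of shape
# (ensemble: 50, time: 8736, lat: 96, lon: 144). In practice the per-year
# time coordinates may need to be replaced by a common hour-of-year index
# before concatenating so they line up across members.
ens = xr.concat(datasets, dim="ensemble")

# Everything stays lazy until .compute()/.load() is called, so the full
# ~200 GB never has to fit in memory at once.
```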