html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/463#issuecomment-224049602,https://api.github.com/repos/pydata/xarray/issues/463,224049602,MDEyOklzc3VlQ29tbWVudDIyNDA0OTYwMg==,4992424,2016-06-06T18:42:06Z,2016-06-06T18:42:06Z,NONE,"@mangecoeur, although it's not an xarray-based solution, I've found that by far the best fix for this problem is to transform your dataset from the ""timeslice"" format (which is convenient for models to write out - all the data at a given point in time, often in a separate file for each time step) to ""timeseries"" format - a contiguous format where all the data for a single variable lives in a single file (or a much smaller collection of files).
NCAR published a great utility for converting batches of NetCDF output from timeslice to timeseries format [here](https://github.com/NCAR/PyReshaper); it's significantly faster than any shell-script/CDO/NCO solution I've ever encountered, and it parallelizes extremely easily.
Adding a simple post-processing step to convert my simulation output to timeseries format dramatically reduced my overall working time. Before, I had a separate handler that re-implemented open_mfdataset(), performed an intermediate reduction (usually extracting a variable), and then concatenated within xarray. That could get around the open-file limit, but it wasn't fast. My pre-processed data is often still big - barely fitting within memory - but it's far easier to handle, and you can throw dask at it with no problem to get huge speedups in analysis.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498