Comment on https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 (user 4992424, 2020-07-21):

Hi @andreall,

I'll leave @dcherian or another maintainer to comment on the `xarray` internals that might be pertinent for optimization here. However, just to throw it out there: for workflows like this, it can sometimes be a bit easier to process each NetCDF file individually (subsetting your locations and whatnot) and convert it to CSV, then merge/concatenate those CSV files together at the end. This sort of workflow can be parallelized in a few different ways, and it is nice because you can parallelize across the number of files you need to process.

A simple example based on your MRE:

```python
import pandas as pd
import xarray as xr
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = sorted(dir_input.glob('**/WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_*.nc'))

# Helper function to convert one NetCDF file to CSV, with our subsetting applied
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)

    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()

    out_fn = fn.with_suffix('.csv')  # fn is a Path, so swap the extension here
    df_ww3.to_csv(out_fn)

    return out_fn

# Use joblib.Parallel to distribute the per-file work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # n_jobs=-1 -> use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the per-file CSVs back and merge them into one DataFrame
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```

YMMV, but this pattern often works for many types of processing applications.
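As a follow-up illustration (not part of the original comment): the same fan-out/fan-in pattern can also be written with only the standard library if joblib is unavailable. This is a minimal sketch that assumes the `_nc_to_csv` helper and `fns` list from the example above; `convert_all` is a hypothetical wrapper name introduced here for illustration.

```python
# Illustrative alternative (assumption: reuses _nc_to_csv and fns from the
# example above). ProcessPoolExecutor runs one NetCDF-to-CSV conversion per
# worker process, so the parallelism is across files, just as with joblib.
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def convert_all(fns, max_workers=None):
    # executor.map preserves input order, so the concatenated DataFrame
    # follows the sorted file list.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        out_fns = list(executor.map(_nc_to_csv, fns))
    dfs = [pd.read_csv(fn) for fn in out_fns]
    return pd.concat(dfs, ignore_index=True)


if __name__ == '__main__':  # guard required for process-based pools on some platforms
    df_ww3_all = convert_all(fns)
```

The trade-off is the same as with the joblib version: each worker writes an intermediate CSV, which keeps memory use per process small and lets the merge step happen once at the end.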