
issue_comments: 661953980


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 https://api.github.com/repos/pydata/xarray/issues/1086 661953980 MDEyOklzc3VlQ29tbWVudDY2MTk1Mzk4MA== 4992424 2020-07-21T16:09:25Z 2020-07-21T16:09:52Z NONE

Hi @andreall, I'll leave @dcherian or another maintainer to comment on internals of xarray that might be pertinent for optimization here. However, just to throw it out there, for workflows like this, it can sometimes be a bit easier to process each NetCDF file (subsetting your locations and whatnot) and convert it to CSV individually, then merge/concatenate those CSV files together at the end. This sort of workflow can be parallelized a few different ways, but is nice because you can parallelize across the number of files you need to process. A simple example based on your MRE:

```python
import pandas as pd
import xarray as xr
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = list(sorted(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc')))

# Helper function to convert NetCDF to CSV with our processing
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)
    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()

    # fn is a pathlib.Path, so swap the extension with with_suffix
    out_fn = fn.with_suffix(".csv")
    df_ww3.to_csv(out_fn)

    return out_fn

# Using joblib.Parallel to distribute my work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # Use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the CSV files and merge them
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```

YMMV but this pattern often works for many types of processing applications.
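
As one of the other ways this could be parallelized, here is a minimal sketch of the same "process each file, then merge" pattern using only the standard library's `concurrent.futures`. It uses toy CSV inputs rather than NetCDF files, and `summarize_file`, `process_all`, and the `value` column are hypothetical names made up for illustration, not anything from xarray or the MRE. Threads are used here on the assumption that the per-file work is mostly I/O; for CPU-heavy steps a `ProcessPoolExecutor` would likely be the better fit.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def summarize_file(path):
    """Hypothetical per-file step: reduce one input CSV to a one-row summary CSV."""
    path = Path(path)
    with path.open(newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]

    # Write the per-file result next to the input, e.g. a.csv -> a_summary.csv
    out_path = path.with_name(path.stem + "_summary.csv")
    mean = sum(values) / len(values) if values else float("nan")
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "n", "mean"])
        writer.writerow([path.name, len(values), mean])
    return out_path


def process_all(paths, worker=summarize_file, max_workers=4):
    """Run `worker` on every path concurrently, then merge the per-file outputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs
        out_paths = list(pool.map(worker, paths))

    merged = []
    for out_path in out_paths:
        with out_path.open(newline="") as f:
            merged.extend(csv.DictReader(f))
    return merged
```

Collect the input paths before calling `process_all` (for example `sorted(Path("data").glob("*.csv"))`, where `data` is a placeholder directory) so the summary files written along the way are not picked up as inputs.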

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187608079