issue_comments: 431439592


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2494#issuecomment-431439592 https://api.github.com/repos/pydata/xarray/issues/2494 431439592 MDEyOklzc3VlQ29tbWVudDQzMTQzOTU5Mg== 2443309 2018-10-19T17:34:25Z 2018-10-19T17:34:25Z MEMBER

To clear a few things up, the parallel option in netCDF4.Dataset is not the same as the parallel option in xarray.open_mfdataset. In xarray, that option is meant to speed up opening many files at once. If you are using dask distributed, the opening should be done through that scheduler.

If you are only seeing thread parallelism in the open_mfdataset(..., parallel=True) call, I would start by looking at your dask distributed setup.

Can you try this workflow with and without the parallel option and report back:

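    from dask.distributed import Client
    import xarray as xr

    client = Client(...)
    # run once with parallel=True and once with parallel=False
    ds = xr.open_mfdataset(myfiles_path, concat_dim='t', engine='h5netcdf', parallel=...)
    x = ds['x'].load().data
    y = ds['y'].load().data
    ds.close()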

Provided that you are setting up distributed to use multiple processes, you should get parallelism from multiple processes in this case.
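
One way to check whether distributed is set up with multiple processes is to ask each worker for its process ID. The following is a minimal sketch, not part of the original comment. The helper names `count_worker_processes` and `check_cluster` are hypothetical, and the local cluster settings (two single-threaded workers, `processes=True`) are assumptions chosen only for illustration.

```python
import os


def count_worker_processes(pids_by_worker):
    """Return the number of distinct OS processes backing the dask workers.

    `pids_by_worker` is the dict returned by `client.run(os.getpid)`,
    which maps each worker address to the PID that worker runs in.
    """
    return len(set(pids_by_worker.values()))


def check_cluster(n_workers=2):
    """Start a local cluster and report how many processes its workers use."""
    # Imported lazily so the helper above works without dask installed.
    from dask.distributed import Client

    # Assumed setup: a local cluster with one single-threaded process per worker.
    with Client(n_workers=n_workers, threads_per_worker=1, processes=True) as client:
        pids = client.run(os.getpid)  # {worker_address: pid}
    return count_worker_processes(pids)


# Usage, from a script (keep it under the main guard, since worker
# processes may re-import the main module when they start):
#
#     if __name__ == "__main__":
#         print(check_cluster())
```

If the count is 1 while several workers are running, the workers are threads in a single process, which would be consistent with only seeing thread parallelism.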

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  371906566