issues: 371906566


  • id: 371906566
  • node_id: MDU6SXNzdWUzNzE5MDY1NjY=
  • number: 2494
  • title: Concurrent acces with multiple processes using open_mfdataset
  • user: 32069530
  • state: closed
  • locked: 0
  • comments: 4
  • created_at: 2018-10-19T10:52:46Z
  • updated_at: 2018-10-26T12:37:30Z
  • closed_at: 2018-10-26T12:37:30Z
  • author_association: NONE

Hi everyone,

First, thanks to the developers for this amazing xarray library! Great piece of work! Here is my problem: I run many (about 500) independent processes (dask distributed) that all need simultaneous, read-only access to the same netCDF file (or group of files). I pass only the file-path strings to the processes, to avoid pickling a netCDF Python object (see issue).
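The path-passing approach described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: concurrent.futures.ProcessPoolExecutor stands in for dask.distributed, and a raw byte read stands in for opening a netCDF file with xarray. The names read_header, read_headers_in_processes and is_picklable are hypothetical. It shows that an open file handle cannot be pickled while a path string can, and that each worker opens, reads, and closes its own handle before returning plain data.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor

# Sketch only: ProcessPoolExecutor stands in for dask.distributed, and
# read_header is a hypothetical stand-in for opening a netCDF file.


def read_header(path: str, nbytes: int = 8) -> bytes:
    """Worker: open the file inside this process, read-only, return plain data.

    The handle is opened and closed here, so nothing tied to an open file
    is sent back to the caller.
    """
    with open(path, "rb") as f:
        return f.read(nbytes)


def read_headers_in_processes(paths, max_workers: int = 4) -> list:
    """Send only path strings to worker processes; each opens its own handle."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_header, [str(p) for p in paths]))


def is_picklable(obj) -> bool:
    """Return True if obj can be serialized for transfer to another process."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True
```

With the spawn start method (the default on Windows and macOS), calls to read_headers_in_processes should sit under an `if __name__ == "__main__":` guard so that worker processes do not re-run them on import.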

In each process, I run

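```python
import xarray as xr

# myfiles_path: the list (or glob string) of netCDF files shared by all processes
with xr.open_mfdataset(myfiles_path, concat_dim='t', engine='h5netcdf') as myfile:
    x = myfile['x'].data  # dask-backed, so this is a lazy array
    y = myfile['y'].data
```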

but many of these concurrent accesses fail, with typical errors such as `Invalid id` or `Exception: CancelledError("('mul-484a58bf5830233021e08456b45eb60d', 0, 0)",)`, ...

I was using the netCDF4 module with the parallel option set to True when working with a single netCDF file, and it ran fine:

```python
from netCDF4 import Dataset

# seedsurf_path: path to the single netCDF file
myfile = Dataset(seedsurf_path, 'r', parallel=True)
x = myfile['x'][:]  # read the values before the file is closed
y = myfile['y'][:]
myfile.close()
```

The parallel option of open_mfdataset() seems to be intended for multithreaded access only. Is there something that can be done for multi-process access? Thanks

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.12.1
scipy: 0.19.1
netCDF4: 1.2.4
h5netcdf: 0.6.2
h5py: 2.7.0
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.0
distributed: 1.23.0
matplotlib: 2.2.3
cartopy: 0.16.0
seaborn: None
setuptools: 40.2.0
pip: 18.0
conda: None
pytest: None
IPython: 6.5.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2494/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed
  • repo: 13221727
  • type: issue
