html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2494#issuecomment-433393215,https://api.github.com/repos/pydata/xarray/issues/2494,433393215,MDEyOklzc3VlQ29tbWVudDQzMzM5MzIxNQ==,32069530,2018-10-26T12:37:30Z,2018-10-26T12:37:30Z,NONE,"Hi all, I finally figured out my problem. On each independent process, xr.open_mfdataset() seems to naturally try to do some multi-threaded access (even without the parallel option?). Each node of my cluster was configured in such a way that multi-threading was possible (my mistake). Here is the yaml config file used by PBSCluster():
```
jobqueue:
  pbs:
    name: dask-worker
    # Dask worker options
    cores: 56
    processes: 28
```
I thought that the parallel=True option enabled parallelized access across my independent processes. It actually enables parallelized access for the possible threads of each process. Now I have removed parallel=True from the xr.open_mfdataset() call and ensured 1 thread per process by changing my config file:
```
cores: 28
processes: 28
```
Thanks again for your help ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,371906566
https://github.com/pydata/xarray/issues/2494#issuecomment-431796693,https://api.github.com/repos/pydata/xarray/issues/2494,431796693,MDEyOklzc3VlQ29tbWVudDQzMTc5NjY5Mw==,32069530,2018-10-22T10:27:04Z,2018-10-22T10:27:04Z,NONE,"@jhamman I was aware of the difference between the two parallel options. I was thus wondering if I could pass a parallel option to the netcdf4 library via the open_mfdataset() call. I tried changing the engine to netcdf4 and adding the backend_kwargs argument `backend_kwargs={'parallel':True}`, but I get the same error. I'll try Stephan's suggestion to see how it behaves and I will report back. 
Thanks","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,371906566
https://github.com/pydata/xarray/issues/2494#issuecomment-431439592,https://api.github.com/repos/pydata/xarray/issues/2494,431439592,MDEyOklzc3VlQ29tbWVudDQzMTQzOTU5Mg==,2443309,2018-10-19T17:34:25Z,2018-10-19T17:34:25Z,MEMBER,"To clear a few things up, the parallel option in `netCDF4.Dataset` is not the same as the parallel option in `xarray.open_mfdataset`. In xarray, that option is meant to reduce the time it takes to open *many* files at once. If you are using dask distributed, this should be done using that scheduler. If you are only seeing thread parallelism in the `open_mfdataset(..., parallel=True)` call, I would start by looking at your dask distributed setup. Can you try this workflow with and without the parallel option and report back:
```Python
client = Client(...)
ds = xr.open_mfdataset(myfiles_path, concat_dim='t', engine='h5netcdf', parallel=...)
x = ds['x'].load().data
y = ds['y'].load().data
ds.close()
```
Provided that you are setting up distributed to use multiple processes, you should get parallelism from multiple processes in this case. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,371906566
https://github.com/pydata/xarray/issues/2494#issuecomment-431435999,https://api.github.com/repos/pydata/xarray/issues/2494,431435999,MDEyOklzc3VlQ29tbWVudDQzMTQzNTk5OQ==,1217238,2018-10-19T17:21:45Z,2018-10-19T17:21:45Z,MEMBER,"This *may* be fixed if you try the development version of xarray -- we did a major refactor of xarray's handling of netCDF files, e.g., try `pip install https://github.com/pydata/xarray/archive/master.zip`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,371906566