issue_comments

4 rows where issue = 371906566 sorted by updated_at descending

user 3

  • lanougue 2
  • shoyer 1
  • jhamman 1

author_association 2

  • MEMBER 2
  • NONE 2

issue 1

  • Concurrent acces with multiple processes using open_mfdataset · 4
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
433393215 https://github.com/pydata/xarray/issues/2494#issuecomment-433393215 https://api.github.com/repos/pydata/xarray/issues/2494 MDEyOklzc3VlQ29tbWVudDQzMzM5MzIxNQ== lanougue 32069530 2018-10-26T12:37:30Z 2018-10-26T12:37:30Z NONE

Hi all, I finally figured out my problem. On each independent process, xr.open_mfdataset() seems to try multi-threaded access on its own (even without the parallel option?). Each node of my cluster was configured in such a way that multi-threading was possible (my mistake). Here is the YAML config file used by PBSCluster():

    jobqueue:
      pbs:
        name: dask-worker
        # Dask worker options
        cores: 56
        processes: 28

I thought that the parallel=True option enabled parallelized access across my independent processes. It actually enables parallelized access for the possible threads of each process. I have now removed parallel=True from the xr.open_mfdataset() call and ensured 1 thread per process by changing my config file:

        cores: 28
        processes: 28

Thanks again for your help

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concurrent acces with multiple processes using open_mfdataset 371906566
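
The fix in the comment above comes down to arithmetic on the two config values. The following is a minimal sketch, assuming that dask-jobqueue splits `cores` evenly among `processes` to get the thread count of each worker process. The helper name `threads_per_process` is hypothetical and not part of any library.

```python
def threads_per_process(cores: int, processes: int) -> int:
    """Threads each worker process gets, assuming cores are split evenly.

    Assumption: this mirrors how the job's cores are divided among its
    worker processes; it is an illustration, not library code.
    """
    if processes < 1 or cores < processes:
        raise ValueError("need at least one core per process")
    return cores // processes


# Values taken from the two versions of the config file in the comment.
original = threads_per_process(cores=56, processes=28)
fixed = threads_per_process(cores=28, processes=28)
print(original, fixed)
```

Under that assumption, the original config gives each process more than one thread, while the corrected config gives exactly one, which matches the "1 thread per process" goal stated in the comment.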
431796693 https://github.com/pydata/xarray/issues/2494#issuecomment-431796693 https://api.github.com/repos/pydata/xarray/issues/2494 MDEyOklzc3VlQ29tbWVudDQzMTc5NjY5Mw== lanougue 32069530 2018-10-22T10:27:04Z 2018-10-22T10:27:04Z NONE

@jhamman I was aware of the difference between the two parallel options. I was thus wondering whether I could pass a parallel option to the netCDF4 library via the open_mfdataset() call. I tried changing the engine to netcdf4 and adding backend_kwargs={'parallel': True}, but I get the same error. I'll try Stephan's suggestion to see how it behaves and will report back. Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concurrent acces with multiple processes using open_mfdataset 371906566
431439592 https://github.com/pydata/xarray/issues/2494#issuecomment-431439592 https://api.github.com/repos/pydata/xarray/issues/2494 MDEyOklzc3VlQ29tbWVudDQzMTQzOTU5Mg== jhamman 2443309 2018-10-19T17:34:25Z 2018-10-19T17:34:25Z MEMBER

To clear a few things up, the parallel option in netCDF4.Dataset is not the same as the parallel option in xarray.open_mfdataset. In xarray, that option is meant to help speed up the time it takes to open many files at once. If you are using dask distributed, this should be done using that scheduler.

If you are only seeing thread parallelism in the open_mfdataset(..., parallel=True) call, I would start by looking at your dask distributed setup.

Can you try this workflow with and without the parallel option and report back:

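    from dask.distributed import Client
    import xarray as xr

    # Connect to the distributed scheduler (arguments elided in the original).
    client = Client(...)
    # Run once with and once without the parallel option.
    ds = xr.open_mfdataset(myfiles_path, concat_dim='t', engine='h5netcdf', parallel=...)
    x = ds['x'].load().data
    y = ds['y'].load().data
    ds.close()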

Provided that you are setting up distributed to use multiple processes, you should get parallelism from multiple processes in this case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concurrent acces with multiple processes using open_mfdataset 371906566
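
One way to run the "with and without the parallel option" comparison requested in the comment above is a small timing harness. This is only a sketch: `time_workflow` and `fake_workflow` are hypothetical names, and the stand-in workflow does no real I/O so that the block runs without xarray or a cluster.

```python
import time
from typing import Callable, Dict, Iterable


def time_workflow(
    run: Callable[[bool], object],
    parallel_options: Iterable[bool] = (False, True),
) -> Dict[bool, float]:
    """Call run(parallel) once per option and record wall-clock seconds."""
    timings: Dict[bool, float] = {}
    for parallel in parallel_options:
        start = time.perf_counter()
        run(parallel)
        timings[parallel] = time.perf_counter() - start
    return timings


calls = []


def fake_workflow(parallel: bool) -> None:
    """Hypothetical stand-in for the open / load / close steps."""
    calls.append(parallel)


timings = time_workflow(fake_workflow)
for parallel, seconds in timings.items():
    print(f"parallel={parallel}: {seconds:.6f} s")
```

In real use, `run` would wrap the workflow from the comment: open the files with `xr.open_mfdataset(..., parallel=parallel)`, load `ds['x']` and `ds['y']`, then close the dataset.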
431435999 https://github.com/pydata/xarray/issues/2494#issuecomment-431435999 https://api.github.com/repos/pydata/xarray/issues/2494 MDEyOklzc3VlQ29tbWVudDQzMTQzNTk5OQ== shoyer 1217238 2018-10-19T17:21:45Z 2018-10-19T17:21:45Z MEMBER

This may be fixed if you try the development version of xarray -- we did a major refactor of xarray's handling of netCDF files, e.g., try pip install https://github.com/pydata/xarray/archive/master.zip.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concurrent acces with multiple processes using open_mfdataset 371906566

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
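
The filter and sort described at the top of this page ("rows where issue = 371906566 sorted by updated_at descending") can be reproduced against this schema. The sketch below uses Python's built-in sqlite3 with a trimmed copy of the table: the foreign-key clauses and most text columns are left out so that no other tables are needed. The exact SQL the page runs is not shown, so the query here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copy of the schema: REFERENCES clauses omitted for this sketch.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])"
)

# ids, user ids, and timestamps copied from the four rows listed above.
rows = [
    (431435999, 1217238, "2018-10-19T17:21:45Z", "MEMBER", 371906566),
    (431439592, 2443309, "2018-10-19T17:34:25Z", "MEMBER", 371906566),
    (431796693, 32069530, "2018-10-22T10:27:04Z", "NONE", 371906566),
    (433393215, 32069530, "2018-10-26T12:37:30Z", "NONE", 371906566),
]
conn.executemany(
    "INSERT INTO issue_comments "
    "(id, user, created_at, updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(i, u, ts, ts, assoc, issue) for i, u, ts, assoc, issue in rows],
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? "
        "ORDER BY updated_at DESC",
        (371906566,),
    )
]
print(ids)
```

Because the timestamps are ISO 8601 strings in the same time zone, ordering the TEXT column sorts them chronologically without any date parsing.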
Powered by Datasette · About: xarray-datasette