
issue_comments


2 rows where author_association = "CONTRIBUTOR" and issue = 361016974 sorted by updated_at descending


id: 462422387
html_url: https://github.com/pydata/xarray/issues/2417#issuecomment-462422387
issue_url: https://api.github.com/repos/pydata/xarray/issues/2417
node_id: MDEyOklzc3VlQ29tbWVudDQ2MjQyMjM4Nw==
user: Zeitsperre (10819524)
created_at: 2019-02-11T17:41:47Z
updated_at: 2019-02-11T17:41:47Z
author_association: CONTRIBUTOR
body:

Hi @jhamman, please excuse the lateness of this reply. It turned out that in the end all I needed to do was set OMP_NUM_THREADS to a value based on the number of cores I want to use (2 threads/core) before launching my processes. Thanks for the help and for keeping this open. Feel free to close this thread.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Limiting threads/cores used by xarray(/dask?) (361016974)
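
As an editorial aside, here is a minimal sketch of the fix the comment above describes. The key point is that the OMP_NUM_THREADS cap must be in place before the numerical libraries initialise their thread pools, so it is set (or exported in the shell) before the imports. The value "4" and the imported modules are illustrative assumptions, not the commenter's exact setup.

import os

# Cap OpenMP-based thread pools (e.g. MKL/OpenBLAS inside numpy) BEFORE
# importing the libraries that create them; once a pool exists, changing
# the variable has no effect. "4" is a placeholder value.
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np   # picks up the cap at import time
import xarray as xr  # likewise for xarray's numeric backends

The same cap can be applied without touching the script by exporting the variable in the shell that launches the processes.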
id: 422445732
html_url: https://github.com/pydata/xarray/issues/2417#issuecomment-422445732
issue_url: https://api.github.com/repos/pydata/xarray/issues/2417
node_id: MDEyOklzc3VlQ29tbWVudDQyMjQ0NTczMg==
user: Zeitsperre (10819524)
created_at: 2018-09-18T15:44:03Z
updated_at: 2018-09-18T15:44:03Z
author_association: CONTRIBUTOR
body:

As per your suggestion, I retried with chunking and found a new error (because my data has rotated poles, dask demanded that I save it with astype(); that isn't my main concern, so I'll deal with it elsewhere).

What I did notice was that when chunking was specified (ds = xr.open_dataset(ncfile).chunk({'time': 10})), I lost all parallelism: although I had specified different thread counts, CPU usage never crossed 110% (I imagine the extra 10% was due to I/O).

This is really a mystery, and unfortunately I haven't a clue how this behaviour is possible if parallel processing is disabled by default. The speed of my results when dask multiprocessing isn't specified suggests that it must be using more processing power:

  • using Multiprocessing calls to CDO with 5 ForkPoolWorkers = ~2h/5 files (100% x 5 CPUs)
  • xarray without dask multiprocessing specifications = ~3min/5 files (spikes of 3500% on one CPU)

Could these spikes in CPU usage be due to other processes (e.g. memory usage, I/O)?

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Limiting threads/cores used by xarray(/dask?) (361016974)
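
An editorial sketch related to the chunking experiment in the comment above, assuming a placeholder file "data.nc" and a toy reduction: passing chunks= to open_dataset gives a dask-backed dataset, and the threaded scheduler can be handed an explicit pool (the mechanism described in the dask docs) so the computation cannot fan out beyond a chosen thread count. This is an illustration under those assumptions, not the commenter's script.

from multiprocessing.pool import ThreadPool

import dask
import xarray as xr

# Open lazily with dask chunks; passing chunks= here is equivalent to
# calling .chunk({"time": 10}) afterwards. "data.nc" is a placeholder.
ds = xr.open_dataset("data.nc", chunks={"time": 10})

# Give the threaded scheduler an explicit 2-thread pool so compute()
# runs on at most 2 threads.
with dask.config.set(scheduler="threads", pool=ThreadPool(2)):
    result = ds.mean("time").compute()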
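
On the closing question about the 3500% spikes: one plausible source (an editorial assumption, not something established in this thread) is BLAS/OpenMP thread pools inside numpy rather than dask itself, which would be consistent with OMP_NUM_THREADS resolving the issue in the later comment. The third-party threadpoolctl package can inspect or cap those pools at runtime; a sketch:

from threadpoolctl import threadpool_info, threadpool_limits

# List the native thread pools (OpenBLAS, MKL, OpenMP runtimes) loaded
# into this process, with their current thread counts.
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"])

# Temporarily cap those pools, e.g. around the xarray computation.
with threadpool_limits(limits=2):
    pass  # run the computation here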

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
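
Given the schema above, the query behind this page ("2 rows where author_association = 'CONTRIBUTOR' and issue = 361016974 sorted by updated_at descending") can be reproduced directly against the underlying SQLite file. A sketch, assuming a placeholder database filename "github.db":

import sqlite3

conn = sqlite3.connect("github.db")  # placeholder filename
rows = conn.execute(
    """
    SELECT id, [user], created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR'
      AND issue = ?
    ORDER BY updated_at DESC
    """,
    (361016974,),
).fetchall()
print(len(rows))  # 2 for this page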