issue_comments


5 comments where issue = 120681918 (Making xray use multiple cores, pydata/xarray#672), sorted by updated_at descending


comment 162463237 · JoyMonteiro (NONE) · created 2015-12-07T09:33:17Z
https://github.com/pydata/xarray/issues/672#issuecomment-162463237

You were right, my chunk sizes were too large. It did not matter how many threads dask used either (4 vs. 8). The I/O component is still high, but that is also because I'm writing the final computed DataArray to disk.

Thanks!

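For reference, chunk sizes are chosen when the file is opened; below is a minimal sketch, with a purely illustrative file name and chunk value rather than the poster's actual setup:

import xarray as xr

# Smaller chunks give dask more tasks to spread across cores,
# at the cost of extra scheduler overhead; the numbers here are illustrative.
ds = xr.open_dataset("era_interim.nc", chunks={"time": 100})
print(ds.chunks)
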
comment 162426520 · shoyer (MEMBER) · created 2015-12-07T06:46:40Z
https://github.com/pydata/xarray/issues/672#issuecomment-162426520

Those sorts of operations should be easily parallelized, although depending on what you're doing with the data they might also be IO bound. It's worth experimenting with chunk sizes. For control over the number of threads, see this page: http://dask.pydata.org/en/latest/scheduler-overview.html#configuring-the-schedulers

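The linked page describes dask's 2015-era dask.set_options interface; as a minimal sketch, capping the threaded scheduler with the modern dask configuration API (an assumption about current usage, not the API referenced in the comment) might look like:

import dask

# Limit the threaded scheduler to 4 workers, e.g. the physical core count.
dask.config.set(scheduler="threads", num_workers=4)

# 2015-era equivalent from the page linked above:
#   from multiprocessing.pool import ThreadPool
#   dask.set_options(pool=ThreadPool(4))
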
comment 162419595 · JoyMonteiro (NONE) · created 2015-12-07T06:11:39Z
https://github.com/pydata/xarray/issues/672#issuecomment-162419595

Hello, I ran it with the dask profiler and looked at the top output disaggregated by core. It does seem to use multiple cores, but prof.visualize() shows it using 8 threads (hyperthreading :P), and I suspect this is killing performance.

How can I control how many threads to use?

Thanks, Joy

comment 162417283 · JoyMonteiro (NONE) · created 2015-12-07T05:46:15Z
https://github.com/pydata/xarray/issues/672#issuecomment-162417283

I was trying to read ERA-Interim data, calculate anomalies using ds = ds - ds.mean(dim='longitude'), and do similar operations along the time axis. Are such operations restricted to a single core?

Just multiplying two datasets (u*v) seems to be faster, though top shows two cores being used (I have 4 physical cores).

TIA, Joy

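For concreteness, a minimal sketch of the anomaly computation described above; the file name and chunk sizes are illustrative assumptions, not values taken from the thread:

import xarray as xr

# Open lazily with dask chunks so the arithmetic can run in parallel.
ds = xr.open_dataset("era_interim.nc", chunks={"time": 120})  # illustrative

# Zonal anomaly: subtract the mean over longitude. This stays lazy
# (it only builds a dask graph) until .compute() is called.
anom = (ds - ds.mean(dim="longitude")).compute()
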
comment 162416658 · shoyer (MEMBER) · created 2015-12-07T05:38:39Z
https://github.com/pydata/xarray/issues/672#issuecomment-162416658

What sort of computation are you doing? Some tasks are limited to a single core, notably reading netCDF4 files with in-file compression. Dask's profiler may be helpful here.

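A minimal sketch of the profiler suggestion, matching the prof.visualize() usage mentioned later in the thread; the file and computation are illustrative, and visualize() requires bokeh to be installed:

import xarray as xr
from dask.diagnostics import Profiler

ds = xr.open_dataset("era_interim.nc", chunks={"time": 120})  # illustrative

# Record which task ran on which worker thread, and for how long.
with Profiler() as prof:
    (ds - ds.mean(dim="longitude")).compute()

prof.visualize()  # task-stream plot; requires bokeh
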


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette