issue_comments: 422445732

html_url: https://github.com/pydata/xarray/issues/2417#issuecomment-422445732
issue_url: https://api.github.com/repos/pydata/xarray/issues/2417
id: 422445732
node_id: MDEyOklzc3VlQ29tbWVudDQyMjQ0NTczMg==
user: 10819524
created_at: 2018-09-18T15:44:03Z
updated_at: 2018-09-18T15:44:03Z
author_association: CONTRIBUTOR

As per your suggestion, I retried with chunking and hit a new error (because my data has rotated poles, dask demanded that I convert it with `astype()` before saving; this isn't my main concern, so I'll deal with it elsewhere).

What I did notice is that when chunking was specified (`ds = xr.open_dataset(ncfile).chunk({'time': 10})`), I lost all parallelism: no matter how many threads I specified, CPU usage never crossed 110% (I imagine the extra 10% was I/O).
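For reference, here is a minimal sketch of what chunking along `time` does, using `dask.array` directly on random data rather than my netCDF files (the array shape and chunk size are just placeholders):

```python
import numpy as np
import dask.array as da

# Stand-in for a (time, y, x) variable; chunking only along time,
# as in chunks={'time': 10}, keeps full spatial slabs in each task.
data = da.from_array(np.random.rand(100, 50, 50), chunks=(10, 50, 50))

print(data.numblocks)                    # (10, 1, 1): ten chunks along time
time_mean = data.mean(axis=0).compute()  # each chunk is reduced independently
print(time_mean.shape)                   # (50, 50)
```

With xarray, `xr.open_dataset(ncfile, chunks={'time': 10})` produces dask-backed variables with the same block structure.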

This is really a mystery, and unfortunately I haven't a clue how this behaviour is possible if parallel processing is disabled by default. The speed of my results when no dask multiprocessing options are specified suggests that it must be using more processing power:

  • using Multiprocessing calls to CDO with 5 ForkPoolWorkers = ~2h/5 files (100% x 5 CPUs)
  • xarray without dask multiprocessing specifications = ~3min/5 files (spikes of 3500% on one CPU)

Could these spikes in CPU usage be due to other processes (e.g. memory usage, I/O)?
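For what it's worth, the kind of minimal test I could run to pin this down looks like the sketch below, again with plain dask/numpy instead of my netCDF workflow; `scheduler` and `num_workers` are the documented knobs on `compute()`:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.random.rand(1000, 1000), chunks=(100, 1000))

# The default scheduler for dask.array is a thread pool, which can show
# >100% CPU on a single process (like the spikes described above).
threaded = x.sum().compute(scheduler="threads", num_workers=4)

# Single-threaded scheduler, useful as a baseline when profiling.
serial = x.sum().compute(scheduler="synchronous")

assert np.isclose(threaded, serial)
```

Comparing CPU usage between the two runs would show whether the spikes come from dask's threaded scheduler rather than from other processes.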
