issue_comments

9 rows where issue = 361016974 (Limiting threads/cores used by xarray(/dask?)), sorted by updated_at descending

462422387 · Zeitsperre (CONTRIBUTOR) · 2019-02-11T17:41:47Z · https://github.com/pydata/xarray/issues/2417#issuecomment-462422387

Hi @jhamman, please excuse the lateness of this reply. It turned out that in the end all I needed to do was set OMP_NUM_THREADS to the number of threads I wanted based on my cores (2 threads/core) before launching my processes. Thanks for the help and for keeping this open. Feel free to close this thread.
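A minimal sketch of that fix (the thread count here is illustrative, not from the original comment); the variable must be set before the worker processes import numpy/dask, so the OpenMP runtime reads it at load time:

```python
import os

# 8 cores x 2 threads/core = 16, for example.
# Must be set before importing numpy/dask in the worker processes.
os.environ["OMP_NUM_THREADS"] = "16"

import numpy  # OpenMP/BLAS now respect the limit
```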

460393715 · jhamman (MEMBER) · 2019-02-04T20:07:56Z · https://github.com/pydata/xarray/issues/2417#issuecomment-460393715

@Zeitsperre - are you still having problems in this area? If not, is it okay if we close this issue?

460325261 · andytraumueller (NONE) · 2019-02-04T16:57:27Z, edited 2019-02-04T20:07:09Z · https://github.com/pydata/xarray/issues/2417#issuecomment-460325261

Hi, my test code is now running properly on 5 threads. Thanks for the help.

```python
import xarray as xr
import os
import numpy
import sys
import dask
from multiprocessing.pool import ThreadPool

# dask-worker --nthreads 1

with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
    dset = xr.open_mfdataset("/data/Environmental_Data/Sea_Surface_Height//.nc",
                             engine='netcdf4', concat_dim='time',
                             chunks={"latitude": 180, "longitude": 360})
    dset1 = dset["adt"] - dset["sla"]
    dset1.to_dataset(name='ssh_mean')
    dset["ssh_mean"] = dset1
    dset = dset.drop("crs")
    dset = dset.drop("lat_bnds")
    dset = dset.drop("lon_bnds")
    dset = dset.drop("xarray_dataarray_variable")
    dset = dset.drop("nv")
    dset_all_over_monthly_mean = dset.groupby("time.month").mean(dim="time", skipna=True)
    dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1, 2, 3])
    dset_all_over_season1_mean.mean(dim="month", skipna=True)
    dset_all_over_season1_mean.to_netcdf("/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc")
```

Reactions: 👍 2
460298993 · jhamman (MEMBER) · 2019-02-04T15:50:09Z, edited 2019-02-04T15:51:43Z · https://github.com/pydata/xarray/issues/2417#issuecomment-460298993

On a few systems, I've noticed that I need to set the environment variable OMP_NUM_THREADS to 1 to limit parallel evaluation within dask threads. I wonder if something like this is happening here?

xref: https://stackoverflow.com/questions/39422092/error-with-omp-num-threads-when-using-dask-distributed
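The environment variable above limits OpenMP parallelism inside each dask thread; the number of dask threads itself can be capped separately through dask's configuration. A small sketch, using dask's `num_workers` option on the threaded scheduler:

```python
import dask
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))

# Cap the threaded scheduler at 4 worker threads for this computation
with dask.config.set(scheduler="threads", num_workers=4):
    total = x.sum().compute()

print(total)  # sum of a 1000x1000 array of ones
```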

460292772 · andytraumueller (NONE) · 2019-02-04T15:34:04Z · https://github.com/pydata/xarray/issues/2417#issuecomment-460292772

I am also interested; I am running a lot of critical processes and I want to keep at least 5 cores idle.

460020879 · jhamman (MEMBER) · 2019-02-03T03:54:59Z · https://github.com/pydata/xarray/issues/2417#issuecomment-460020879

@Zeitsperre - this issue has been inactive for a while. Did you find a solution to your problem?

422461245 · shoyer (MEMBER) · 2018-09-18T16:31:03Z · https://github.com/pydata/xarray/issues/2417#issuecomment-422461245

If your data uses in-file HDF5 chunks/compression, it's possible that HDF5 is uncompressing the data in parallel, though I haven't personally seen that before.

422445732 · Zeitsperre (CONTRIBUTOR) · 2018-09-18T15:44:03Z · https://github.com/pydata/xarray/issues/2417#issuecomment-422445732

As per your suggestion, I retried with chunking and found a new error (due to the nature of my data having rotated poles, dask demanded that I save my data with astype(); this isn't my major concern, so I'll deal with it elsewhere).

What I did notice was that when chunking was specified (ds = xr.open_dataset(ncfile).chunk({'time': 10})), I lost all parallelism: although I had specified different thread counts, the performance never crossed 110% (I imagine the extra 10% was due to I/O).

This is really a mystery, and unfortunately I haven't a clue how this behaviour is possible if parallel processing is disabled by default. The speed of my results when dask multiprocessing isn't specified suggests that it must be using more processing power:

  • using Multiprocessing calls to CDO with 5 ForkPoolWorkers = ~2 h / 5 files (100% x 5 CPUs)
  • xarray without dask multiprocessing specifications = ~3 min / 5 files (spikes of 3500% on one CPU)

Could these spikes in CPU usage be due to other processes (e.g. memory usage, I/O)?

422206083 · shoyer (MEMBER) · 2018-09-17T23:40:52Z · https://github.com/pydata/xarray/issues/2417#issuecomment-422206083

Step 1 would be making sure that you're actually using dask :). Xarray only uses dask with open_dataset() if you supply the chunks keyword argument.

That said, xarray's only built-in support for parallelism is through Dask, so I'm not sure what is using all your CPU.
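The NumPy-vs-dask distinction above can be sketched with a toy in-memory dataset rather than a real file (`.chunk()` has the same effect as passing `chunks=` to `open_dataset()`):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"t": ("time", np.arange(10))})
print(ds["t"].chunks)  # None: plain NumPy-backed, no dask involved

# Chunking converts the variable into a lazy dask array
dsc = ds.chunk({"time": 5})
print(dsc["t"].chunks)  # chunk sizes along each dimension
```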


```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```