html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2417#issuecomment-462422387,https://api.github.com/repos/pydata/xarray/issues/2417,462422387,MDEyOklzc3VlQ29tbWVudDQ2MjQyMjM4Nw==,10819524,2019-02-11T17:41:47Z,2019-02-11T17:41:47Z,CONTRIBUTOR,"Hi @jhamman, please excuse the lateness of this reply. It turned out that in the end, all I needed to do was set `OMP_NUM_THREADS` based on the number of cores I want to use (2 threads/core) before launching my processes. Thanks for the help and for keeping this open. Feel free to close this thread.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460393715,https://api.github.com/repos/pydata/xarray/issues/2417,460393715,MDEyOklzc3VlQ29tbWVudDQ2MDM5MzcxNQ==,2443309,2019-02-04T20:07:56Z,2019-02-04T20:07:56Z,MEMBER,"@Zeitsperre - are you still having problems in this area?
If not, is it okay if we close this issue?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460325261,https://api.github.com/repos/pydata/xarray/issues/2417,460325261,MDEyOklzc3VlQ29tbWVudDQ2MDMyNTI2MQ==,10809480,2019-02-04T16:57:27Z,2019-02-04T20:07:09Z,NONE,"Hi, my test code is now running properly on 5 threads, thanks for the help:

```python
import xarray as xr
import os
import numpy
import sys
import dask
from multiprocessing.pool import ThreadPool

# dask-worker = --nthreads 1
with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
    dset = xr.open_mfdataset(""/data/Environmental_Data/Sea_Surface_Height/*/*.nc"", engine='netcdf4', concat_dim='time', chunks={""latitude"": 180, ""longitude"": 360})
    dset1 = dset[""adt""] - dset[""sla""]
    dset1.to_dataset(name='ssh_mean')
    dset[""ssh_mean""] = dset1
    dset = dset.drop(""crs"")
    dset = dset.drop(""lat_bnds"")
    dset = dset.drop(""lon_bnds"")
    dset = dset.drop(""__xarray_dataarray_variable__"")
    dset = dset.drop(""nv"")
    dset_all_over_monthly_mean = dset.groupby(""time.month"").mean(dim=""time"", skipna=True)
    dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1, 2, 3])
    dset_all_over_season1_mean.mean(dim=""month"", skipna=True)
    dset_all_over_season1_mean.to_netcdf(""/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc"")
```","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460298993,https://api.github.com/repos/pydata/xarray/issues/2417,460298993,MDEyOklzc3VlQ29tbWVudDQ2MDI5ODk5Mw==,2443309,2019-02-04T15:50:09Z,2019-02-04T15:51:43Z,MEMBER,"On a few systems, I've noticed that I need to set the environment variable `OMP_NUM_THREADS` to `1` to limit parallel evaluation within
dask threads. I wonder if something like this is happening here? xref: https://stackoverflow.com/questions/39422092/error-with-omp-num-threads-when-using-dask-distributed","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460292772,https://api.github.com/repos/pydata/xarray/issues/2417,460292772,MDEyOklzc3VlQ29tbWVudDQ2MDI5Mjc3Mg==,10809480,2019-02-04T15:34:04Z,2019-02-04T15:34:04Z,NONE,"I am also interested; I am running a lot of critical processes and I want to have at least 5 cores idling.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460020879,https://api.github.com/repos/pydata/xarray/issues/2417,460020879,MDEyOklzc3VlQ29tbWVudDQ2MDAyMDg3OQ==,2443309,2019-02-03T03:54:59Z,2019-02-03T03:54:59Z,MEMBER,"@Zeitsperre - this issue has been inactive for a while. Did you find a solution to your problem?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422461245,https://api.github.com/repos/pydata/xarray/issues/2417,422461245,MDEyOklzc3VlQ29tbWVudDQyMjQ2MTI0NQ==,1217238,2018-09-18T16:31:03Z,2018-09-18T16:31:03Z,MEMBER,"If your data is using in-file HDF5 chunks/compression, it's *possible* that HDF5 is uncompressing the data in parallel, though I haven't seen that before personally.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422445732,https://api.github.com/repos/pydata/xarray/issues/2417,422445732,MDEyOklzc3VlQ29tbWVudDQyMjQ0NTczMg==,10819524,2018-09-18T15:44:03Z,2018-09-18T15:44:03Z,CONTRIBUTOR,"As per your suggestion, I retried with chunking and found a new error (due to the nature of my data having rotated poles, dask demanded that I save my data with astype(); this isn't my major concern, so I'll deal with that somewhere else). What I did notice was that when chunking was specified (`ds = xr.open_dataset(ncfile).chunk({'time': 10})`), I lost all parallelism, and although I had specified different thread counts, the performance never crossed 110% (I imagine the extra 10% was due to I/O). This is really a mystery and, unfortunately, I haven't a clue how this behaviour is possible if parallel processing is disabled by default. The speed of my results when dask multiprocessing isn't specified suggests that it must be using more processing power:

- using multiprocessing calls to CDO with 5 ForkPoolWorkers = ~2h/5 files (100% x 5 CPUs)
- xarray without dask multiprocessing specifications = ~3min/5 files (spikes of 3500% on one CPU)

Could these spikes in CPU usage be due to other processes (e.g. memory usage, I/O)?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422206083,https://api.github.com/repos/pydata/xarray/issues/2417,422206083,MDEyOklzc3VlQ29tbWVudDQyMjIwNjA4Mw==,1217238,2018-09-17T23:40:52Z,2018-09-17T23:40:52Z,MEMBER,"Step 1 would be making sure that you're actually using dask :). Xarray only uses dask with `open_dataset()` if you supply the `chunks` keyword argument. That said, xarray's only built-in support for parallelism is through Dask, so I'm not sure what is using all your CPU.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974