html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2417#issuecomment-462422387,https://api.github.com/repos/pydata/xarray/issues/2417,462422387,MDEyOklzc3VlQ29tbWVudDQ2MjQyMjM4Nw==,10819524,2019-02-11T17:41:47Z,2019-02-11T17:41:47Z,CONTRIBUTOR,"Hi @jhamman, please excuse the lateness of this reply. It turned out that in the end, all I needed to do was set `OMP_NUM_THREADS` based on the number of cores I want to use (2 threads/core) before launching my processes. Thanks for the help and for keeping this open. Feel free to close this thread.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460393715,https://api.github.com/repos/pydata/xarray/issues/2417,460393715,MDEyOklzc3VlQ29tbWVudDQ2MDM5MzcxNQ==,2443309,2019-02-04T20:07:56Z,2019-02-04T20:07:56Z,MEMBER,"@Zeitsperre - are you still having problems in this area?
If not, is it okay if we close this issue?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460325261,https://api.github.com/repos/pydata/xarray/issues/2417,460325261,MDEyOklzc3VlQ29tbWVudDQ2MDMyNTI2MQ==,10809480,2019-02-04T16:57:27Z,2019-02-04T20:07:09Z,NONE,"Hi, my test code is now running properly on 5 threads, thanks for the help:

```python
import xarray as xr
import os
import numpy
import sys
import dask
from multiprocessing.pool import ThreadPool

# dask-worker = --nthreads 1
with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
    dset = xr.open_mfdataset(""/data/Environmental_Data/Sea_Surface_Height/*/*.nc"", engine='netcdf4', concat_dim='time', chunks={""latitude"": 180, ""longitude"": 360})
    dset1 = dset[""adt""] - dset[""sla""]
    dset1.to_dataset(name='ssh_mean')
    dset[""ssh_mean""] = dset1
    dset = dset.drop(""crs"")
    dset = dset.drop(""lat_bnds"")
    dset = dset.drop(""lon_bnds"")
    dset = dset.drop(""__xarray_dataarray_variable__"")
    dset = dset.drop(""nv"")
    dset_all_over_monthly_mean = dset.groupby(""time.month"").mean(dim=""time"", skipna=True)
    dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1, 2, 3])
    dset_all_over_season1_mean.mean(dim=""month"", skipna=True)
    dset_all_over_season1_mean.to_netcdf(""/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc"")
```","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460298993,https://api.github.com/repos/pydata/xarray/issues/2417,460298993,MDEyOklzc3VlQ29tbWVudDQ2MDI5ODk5Mw==,2443309,2019-02-04T15:50:09Z,2019-02-04T15:51:43Z,MEMBER,"On a few systems, I've noticed that I need to set the environment variable `OMP_NUM_THREADS` to `1` to limit parallel evaluation within
dask threads. I wonder if something like this is happening here? xref: https://stackoverflow.com/questions/39422092/error-with-omp-num-threads-when-using-dask-distributed","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460292772,https://api.github.com/repos/pydata/xarray/issues/2417,460292772,MDEyOklzc3VlQ29tbWVudDQ2MDI5Mjc3Mg==,10809480,2019-02-04T15:34:04Z,2019-02-04T15:34:04Z,NONE,"I am also interested; I am running a lot of critical processes and I want to have at least 5 cores idling.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460020879,https://api.github.com/repos/pydata/xarray/issues/2417,460020879,MDEyOklzc3VlQ29tbWVudDQ2MDAyMDg3OQ==,2443309,2019-02-03T03:54:59Z,2019-02-03T03:54:59Z,MEMBER,"@Zeitsperre - this issue has been inactive for a while. Did you find a solution to your problem?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422461245,https://api.github.com/repos/pydata/xarray/issues/2417,422461245,MDEyOklzc3VlQ29tbWVudDQyMjQ2MTI0NQ==,1217238,2018-09-18T16:31:03Z,2018-09-18T16:31:03Z,MEMBER,"If your data is using in-file HDF5 chunks/compression, it's *possible* that HDF5 is uncompressing the data in parallel, though I haven't seen that before personally.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422445732,https://api.github.com/repos/pydata/xarray/issues/2417,422445732,MDEyOklzc3VlQ29tbWVudDQyMjQ0NTczMg==,10819524,2018-09-18T15:44:03Z,2018-09-18T15:44:03Z,CONTRIBUTOR,"As per your suggestion, I retried with chunking and found a new error (due to the nature of my data having rotated poles, dask demanded that I save my data with astype(); this isn't my major concern, so I'll deal with that somewhere else). What I did notice was that when chunking was specified (`ds = xr.open_dataset(ncfile).chunk({'time': 10})`), I lost all parallelism, and although I had specified different thread counts, the performance never crossed 110% (I imagine the extra 10% was due to I/O). This is really a mystery and, unfortunately, I haven't a clue how this behaviour is possible if parallel processing is disabled by default. The speed of my results when dask multiprocessing isn't specified suggests that it must be using more processing power:

- using multiprocessing calls to CDO with 5 ForkPoolWorkers = ~2h/5 files (100% x 5 CPUs)
- xarray without dask multiprocessing specifications = ~3min/5 files (spikes of 3500% on one CPU)

Could these spikes in CPU usage be due to other processes (e.g. memory usage, I/O)?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-422206083,https://api.github.com/repos/pydata/xarray/issues/2417,422206083,MDEyOklzc3VlQ29tbWVudDQyMjIwNjA4Mw==,1217238,2018-09-17T23:40:52Z,2018-09-17T23:40:52Z,MEMBER,"Step 1 would be making sure that you're actually using dask :). Xarray only uses dask with `open_dataset()` if you supply the `chunks` keyword argument. That said, xarray's only built-in support for parallelism is through Dask, so I'm not sure what is using all your CPU.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974