html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2946#issuecomment-490421774,https://api.github.com/repos/pydata/xarray/issues/2946,490421774,MDEyOklzc3VlQ29tbWVudDQ5MDQyMTc3NA==,10809480,2019-05-08T09:44:25Z,2019-05-08T09:49:02Z,NONE,"interesting fact i just learned. when you have to process over a huge dataset, first export it as a complete single netcdf file, then calculate its aggregation function. Its a workaround, i suppose bottleneck or dask needs to have its complete set first. For mean it just simply works because of the easy calculation method, for std i think dask or bottleneck assume a nan as a zero for calculation purposes. ```python data = xr.open_mfdataset(list_to_input_files, parallel=True, concat_dim=""time"") (...) data.to_netcdf(""help_netcdf_file.nc"") data.close() data = xr.open_dataset(""help_netcdf_file.nc"") data.mean(...).to_netcdf(""mean_netcdf_file.nc"") data.std(...).to_netcdf(""mean_netcdf_file.nc"") ``` It could be problematic by huuuuge datasets in the tb size. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,441222339 https://github.com/pydata/xarray/issues/2946#issuecomment-490394601,https://api.github.com/repos/pydata/xarray/issues/2946,490394601,MDEyOklzc3VlQ29tbWVudDQ5MDM5NDYwMQ==,10809480,2019-05-08T08:18:21Z,2019-05-08T09:01:56Z,NONE,"fixed: synthetic dataset of the polar region -60 - -90, in the mean calculation everything is proper and nans are ignored. std still looks suspicious. ```python import xarray as xr import glob import numpy as np data = xr.open_dataset(r""test.nc"") data.mean(dim=""time"", skipna=True).to_netcdf(r""mean_test.nc"") ``` ```python-traceback C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\numpy_compat.py:28: RuntimeWarning: invalid value encountered in true_divide x = np.divide(x1, x2, out) ``` ```python data.std(dim=""time"", skipna=True,ddof=1).astype(np.float64).to_netcdf(r""std_test.nc"") ``` ```python-traceback C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\reductions.py:386: RuntimeWarning: invalid value encountered in true_divide u = total / n ``` Dropbox to files: https://www.dropbox.com/sh/yuf114u143mj2l3/AABuQfC5wu4nrWDH4GsGgFyJa?dl=0 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,441222339 https://github.com/pydata/xarray/issues/2417#issuecomment-460325261,https://api.github.com/repos/pydata/xarray/issues/2417,460325261,MDEyOklzc3VlQ29tbWVudDQ2MDMyNTI2MQ==,10809480,2019-02-04T16:57:27Z,2019-02-04T20:07:09Z,NONE,"hi, my testcode is running properly on 5 threads thanks for the help ```python import xarray as xr import os import numpy import sys import dask from multiprocessing.pool import ThreadPool #dask-worker = --nthreads 1 with dask.config.set(schedular='threads', pool=ThreadPool(5)): dset = xr.open_mfdataset(""/data/Environmental_Data/Sea_Surface_Height/*/*.nc"", engine='netcdf4', concat_dim='time', chunks={""latitude"":180,""longitude"":360}) dset1 = dset[""adt""]-dset[""sla""] dset1.to_dataset(name = 'ssh_mean') dset[""ssh_mean""] = dset1 dset = dset.drop(""crs"") dset = dset.drop(""lat_bnds"") dset = dset.drop(""lon_bnds"") dset = dset.drop(""__xarray_dataarray_variable__"") dset = dset.drop(""nv"") dset_all_over_monthly_mean = dset.groupby(""time.month"").mean(dim=""time"", skipna=True) dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1,2,3]) dset_all_over_season1_mean.mean(dim=""month"",skipna=True) dset_all_over_season1_mean.to_netcdf(""/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc"") ```","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974 https://github.com/pydata/xarray/issues/2417#issuecomment-460292772,https://api.github.com/repos/pydata/xarray/issues/2417,460292772,MDEyOklzc3VlQ29tbWVudDQ2MDI5Mjc3Mg==,10809480,2019-02-04T15:34:04Z,2019-02-04T15:34:04Z,NONE,"i am also interest, I am running a lot of critical processes and I want to at least have 5 cores idleing.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974