html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2946#issuecomment-490421774,https://api.github.com/repos/pydata/xarray/issues/2946,490421774,MDEyOklzc3VlQ29tbWVudDQ5MDQyMTc3NA==,10809480,2019-05-08T09:44:25Z,2019-05-08T09:49:02Z,NONE,"Interesting fact I just learned:
when you have to process a huge dataset, first export it as a single complete netCDF file, then calculate the aggregation function on that file.
It's a workaround; I suppose bottleneck or dask needs the complete set first. For mean it simply works because the calculation is straightforward, but for std I think dask or bottleneck treats a NaN as a zero for calculation purposes.
```python
data = xr.open_mfdataset(list_to_input_files, parallel=True, concat_dim=""time"")
(...)
data.to_netcdf(""help_netcdf_file.nc"")
data.close()
data = xr.open_dataset(""help_netcdf_file.nc"")
data.mean(...).to_netcdf(""mean_netcdf_file.nc"")
data.std(...).to_netcdf(""std_netcdf_file.nc"")
```
This could be problematic for huge datasets in the TB range, though.
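As a sanity check (my addition, not part of the workaround itself), one could compare the std computed on the chunked multi-file dataset against the std computed on the merged single file; ""some_var"" below is a placeholder for one of the dataset's variables:
```python
import numpy as np
import xarray as xr

# compare the lazy (dask-backed) std with the std on the merged single file
multi = xr.open_mfdataset(list_to_input_files, parallel=True, concat_dim=""time"")
single = xr.open_dataset(""help_netcdf_file.nc"")

multi_std = multi.std(dim=""time"", skipna=True).compute()
single_std = single.std(dim=""time"", skipna=True)

# ""some_var"" is a placeholder for an actual variable name in the dataset
print(np.allclose(multi_std[""some_var""], single_std[""some_var""], equal_nan=True))
```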
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,441222339
https://github.com/pydata/xarray/issues/2946#issuecomment-490394601,https://api.github.com/repos/pydata/xarray/issues/2946,490394601,MDEyOklzc3VlQ29tbWVudDQ5MDM5NDYwMQ==,10809480,2019-05-08T08:18:21Z,2019-05-08T09:01:56Z,NONE,"Fixed:
with a synthetic dataset of the polar region (-60 to -90), everything in the mean calculation is correct and NaNs are ignored; the std still looks suspicious.
```python
import xarray as xr
import numpy as np
data = xr.open_dataset(r""test.nc"")
data.mean(dim=""time"", skipna=True).to_netcdf(r""mean_test.nc"")
```
```python-traceback
C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\numpy_compat.py:28: RuntimeWarning: invalid value encountered in true_divide
x = np.divide(x1, x2, out)
```
```python
data.std(dim=""time"", skipna=True, ddof=1).astype(np.float64).to_netcdf(r""std_test.nc"")
```
```python-traceback
C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\reductions.py:386: RuntimeWarning: invalid value encountered in true_divide
u = total / n
```
Dropbox link to the files:
https://www.dropbox.com/sh/yuf114u143mj2l3/AABuQfC5wu4nrWDH4GsGgFyJa?dl=0
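For context, here is a minimal standalone sketch (my assumption, not taken from the files above) of how an all-NaN slice, e.g. a land point over the whole time axis, yields NaN and runtime warnings in both NumPy's and dask's nan-aware std:
```python
import numpy as np
import dask.array as da

# tiny array where the second column is entirely NaN,
# mimicking a grid cell that is NaN for every time step
arr = np.array([[1.0, np.nan],
                [3.0, np.nan]])

# NumPy's nan-aware std: the all-NaN column comes out as NaN (with a warning)
print(np.nanstd(arr, axis=0, ddof=1))

# dask's nan-aware std on the chunked array behaves the same way
print(da.nanstd(da.from_array(arr, chunks=1), axis=0, ddof=1).compute())
```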
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,441222339
https://github.com/pydata/xarray/issues/2417#issuecomment-460325261,https://api.github.com/repos/pydata/xarray/issues/2417,460325261,MDEyOklzc3VlQ29tbWVudDQ2MDMyNTI2MQ==,10809480,2019-02-04T16:57:27Z,2019-02-04T20:07:09Z,NONE,"Hi, my test code is now running properly on 5 threads.
Thanks for the help.
```python
import xarray as xr
import dask
from multiprocessing.pool import ThreadPool

# limit the dask threaded scheduler to a pool of 5 threads
# (with a distributed worker, the thread count would be set via dask-worker --nthreads instead)
with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
    dset = xr.open_mfdataset(""/data/Environmental_Data/Sea_Surface_Height/*/*.nc"", engine='netcdf4', concat_dim='time', chunks={""latitude"": 180, ""longitude"": 360})

    # derive the mean sea surface height and add it as a new variable
    dset[""ssh_mean""] = dset[""adt""] - dset[""sla""]

    # drop auxiliary variables that are not needed in the output
    dset = dset.drop([""crs"", ""lat_bnds"", ""lon_bnds"", ""__xarray_dataarray_variable__"", ""nv""])

    # monthly climatology, then average months 1-3 for the first season
    dset_all_over_monthly_mean = dset.groupby(""time.month"").mean(dim=""time"", skipna=True)
    dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1, 2, 3])
    dset_all_over_season1_mean = dset_all_over_season1_mean.mean(dim=""month"", skipna=True)
    dset_all_over_season1_mean.to_netcdf(""/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc"")
```","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974
https://github.com/pydata/xarray/issues/2417#issuecomment-460292772,https://api.github.com/repos/pydata/xarray/issues/2417,460292772,MDEyOklzc3VlQ29tbWVudDQ2MDI5Mjc3Mg==,10809480,2019-02-04T15:34:04Z,2019-02-04T15:34:04Z,NONE,"I am also interested; I am running a lot of critical processes and I want to keep at least 5 cores idle.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,361016974