issue_comments

4 rows where author_association = "NONE" and user = 10809480 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
490421774 https://github.com/pydata/xarray/issues/2946#issuecomment-490421774 https://api.github.com/repos/pydata/xarray/issues/2946 MDEyOklzc3VlQ29tbWVudDQ5MDQyMTc3NA== andytraumueller 10809480 2019-05-08T09:44:25Z 2019-05-08T09:49:02Z NONE

An interesting thing I just learned: when you have to process a huge dataset, first export it as a single complete netCDF file, then compute the aggregation on that file.

It's a workaround. I suppose bottleneck or dask needs the complete dataset first. Mean simply works because the calculation is simple; for std, I think dask or bottleneck treats NaN as zero during the calculation.

```python
data = xr.open_mfdataset(list_to_input_files, parallel=True, concat_dim="time")
(...)
data.to_netcdf("help_netcdf_file.nc")
data.close()
data = xr.open_dataset("help_netcdf_file.nc")
data.mean(...).to_netcdf("mean_netcdf_file.nc")
# Write std to its own file so it does not overwrite the mean output.
data.std(...).to_netcdf("std_netcdf_file.nc")
```

This could be problematic for very large datasets in the TB range.
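
To make the suspicion about std concrete, here is a minimal sketch of how much the result changes when a missing value is counted as zero instead of being skipped. It uses only the Python standard library and made-up values for a single grid cell; it is not what xarray, dask, or bottleneck do internally, only an illustration of the difference one would look for in the output.

```python
import math
import statistics

# Hypothetical time series for one grid cell: two valid values, one missing.
series = [2.0, 4.0, math.nan]

# Skipping NaN: sample standard deviation over the valid values only.
valid = [v for v in series if not math.isnan(v)]
std_skipna = statistics.stdev(valid)

# Treating NaN as zero: the missing value is counted as a real observation.
zero_filled = [0.0 if math.isnan(v) else v for v in series]
std_nan_as_zero = statistics.stdev(zero_filled)

print(std_skipna, std_nan_as_zero)
```

If the std output over masked cells looks like the second number rather than the first, that would be consistent with NaN being handled as zero somewhere in the pipeline.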

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  std interprets continents as zero not nan 441222339
490394601 https://github.com/pydata/xarray/issues/2946#issuecomment-490394601 https://api.github.com/repos/pydata/xarray/issues/2946 MDEyOklzc3VlQ29tbWVudDQ5MDM5NDYwMQ== andytraumueller 10809480 2019-05-08T08:18:21Z 2019-05-08T09:01:56Z NONE

Fixed: I used a synthetic dataset of the polar region (-60 to -90). In the mean calculation everything is correct and NaNs are ignored. The std still looks suspicious.

```python
import xarray as xr
import glob
import numpy as np

data = xr.open_dataset(r"test.nc")
data.mean(dim="time", skipna=True).to_netcdf(r"mean_test.nc")
```

```python-traceback
C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\numpy_compat.py:28: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
```

```python
data.std(dim="time", skipna=True, ddof=1).astype(np.float64).to_netcdf(r"std_test.nc")
```

```python-traceback
C:\Users\atraumue\AppData\Local\Continuum\anaconda3\lib\site-packages\dask\array\reductions.py:386: RuntimeWarning: invalid value encountered in true_divide
  u = total / n
```

Dropbox to files: https://www.dropbox.com/sh/yuf114u143mj2l3/AABuQfC5wu4nrWDH4GsGgFyJa?dl=0

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  std interprets continents as zero not nan 441222339
460325261 https://github.com/pydata/xarray/issues/2417#issuecomment-460325261 https://api.github.com/repos/pydata/xarray/issues/2417 MDEyOklzc3VlQ29tbWVudDQ2MDMyNTI2MQ== andytraumueller 10809480 2019-02-04T16:57:27Z 2019-02-04T20:07:09Z NONE

Hi, my test code is running properly on 5 threads. Thanks for the help.

```python
import xarray as xr
import os
import numpy
import sys
import dask
from multiprocessing.pool import ThreadPool

# dask-worker = --nthreads 1

# Run dask's threaded scheduler on a pool limited to 5 threads.
with dask.config.set(scheduler='threads', pool=ThreadPool(5)):
    dset = xr.open_mfdataset("/data/Environmental_Data/Sea_Surface_Height//.nc", engine='netcdf4', concat_dim='time', chunks={"latitude":180,"longitude":360})
    dset1 = dset["adt"]-dset["sla"]
    dset1.to_dataset(name = 'ssh_mean')
    dset["ssh_mean"] = dset1
    dset = dset.drop("crs")
    dset = dset.drop("lat_bnds")
    dset = dset.drop("lon_bnds")
    dset = dset.drop("xarray_dataarray_variable")
    dset = dset.drop("nv")
    dset_all_over_monthly_mean = dset.groupby("time.month").mean(dim="time", skipna=True)
    dset_all_over_season1_mean = dset_all_over_monthly_mean.sel(month=[1,2,3])
    # Assign the result: mean() returns a new object and does not modify in place.
    dset_all_over_season1_mean = dset_all_over_season1_mean.mean(dim="month",skipna=True)
    dset_all_over_season1_mean.to_netcdf("/data/Environmental_Data/dump/mean/all_over_season1_mean_ssh_copernicus_0.25deg_season1_data_mean.nc")
```
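
The thread limit in the code above comes from handing the scheduler a fixed-size `ThreadPool`. As a rough sketch of why that caps concurrency, the standard-library example below runs 50 small tasks on a pool of 5 and records which threads did the work. The task function `record_thread` is a hypothetical stand-in, and no dask or xarray code is involved.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def record_thread(_):
    """Hypothetical task: sleep briefly, then report the worker thread's name."""
    time.sleep(0.01)
    return threading.current_thread().name


# A pool of 5 workers: however many tasks are queued, at most 5 run at once.
with ThreadPool(5) as pool:
    names = pool.map(record_thread, range(50))

distinct_threads = set(names)
print(len(names), "tasks ran on", len(distinct_threads), "threads")
```

The number of distinct worker threads never exceeds the pool size, which is the property that keeps the remaining cores free for other processes.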

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Limiting threads/cores used by xarray(/dask?) 361016974
460292772 https://github.com/pydata/xarray/issues/2417#issuecomment-460292772 https://api.github.com/repos/pydata/xarray/issues/2417 MDEyOklzc3VlQ29tbWVudDQ2MDI5Mjc3Mg== andytraumueller 10809480 2019-02-04T15:34:04Z 2019-02-04T15:34:04Z NONE

I am also interested in this. I am running a lot of critical processes and I want to keep at least 5 cores idle.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Limiting threads/cores used by xarray(/dask?) 361016974

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
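
For readers who want to reproduce the filter shown at the top of this page against a local copy, the following is a minimal sketch using Python's built-in `sqlite3`. It assumes an in-memory database with only the four columns the filter needs (a cut-down version of the schema above) and loads the four comment ids and `updated_at` values listed on this page; it is not how Datasette itself executes the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down version of the issue_comments schema: only the columns the filter uses.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [author_association] TEXT
    )
    """
)

# The four rows shown on this page (id, user, updated_at, author_association).
rows = [
    (460292772, 10809480, "2019-02-04T15:34:04Z", "NONE"),
    (490394601, 10809480, "2019-05-08T09:01:56Z", "NONE"),
    (460325261, 10809480, "2019-02-04T20:07:09Z", "NONE"),
    (490421774, 10809480, "2019-05-08T09:49:02Z", "NONE"),
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?)", rows)

# Same filter and ordering as the page: author_association = "NONE",
# user = 10809480, sorted by updated_at descending.
query = """
    SELECT [id] FROM [issue_comments]
    WHERE [author_association] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
ids = [row[0] for row in conn.execute(query, ("NONE", 10809480))]
print(ids)
```

Because the timestamps are stored as ISO 8601 text, ordering them as strings gives the same result as ordering them chronologically.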
Powered by Datasette · Queries took 20.038ms · About: xarray-datasette