

issue_comments


1 row where issue = 567678992 and user = 7933853 sorted by updated_at descending

id: 702348129
html_url: https://github.com/pydata/xarray/issues/3781#issuecomment-702348129
issue_url: https://api.github.com/repos/pydata/xarray/issues/3781
node_id: MDEyOklzc3VlQ29tbWVudDcwMjM0ODEyOQ==
user: lvankampenhout (7933853)
created_at: 2020-10-01T19:24:48Z
updated_at: 2020-10-01T20:00:27Z
author_association: NONE
body:

I think I ran into a similar problem when combining dask-chunked Datasets (originating from open_mfdataset) with Python's native multiprocessing package. There is no error message, and the headers of the files are created, but then the script hangs indefinitely. The use case is combining and resampling variables into ~1000 different NetCDF files, work I want to distribute over multiple processes using multiprocessing.

MCVE Code Sample

```python
import xarray as xr
from multiprocessing import Pool
import os

if (False):
    """ Load data without using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc")
else:
    """ Load data using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc", chunks={})

print(ds.nbytes / 1e6, 'MB')

print('chunks', ds.air.chunks)  # chunks is empty without dask

outdir = '/glade/scratch/lvank'  # change this to some temporary directory on your system

def do_work(n):
    print(n)
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))

tasks = range(10)

with Pool(processes=2) as pool:
    pool.map(do_work, tasks)

print('done')
```

Expected Output

The NetCDF copies in outdir, named 0.nc to 9.nc, should be created in both cases (with and without Dask).

Problem Description

In the case with Dask (when the if-statement evaluates to False), the files are not created and the program hangs.
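
A workaround that is sometimes suggested for this kind of hang, shown here only as a sketch (it assumes the dataset fits in memory and reuses the URL and outdir from the MCVE above; the report itself does not verify it), is to load the dask-backed dataset into memory in the parent process before the pool is created, so the workers write plain in-memory data instead of executing a dask graph inside a forked child:

```python
import os
from multiprocessing import Pool

import xarray as xr

# Same OPeNDAP URL and output directory as the MCVE above.
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc"
outdir = "/glade/scratch/lvank"  # change this to some temporary directory on your system

ds = xr.open_dataset(url, chunks={})
ds = ds.load()  # pull the dask chunks into NumPy arrays in the parent process

def do_work(n):
    # The dataset is already NumPy-backed here, so to_netcdf() does not have
    # to execute a dask graph inside the forked worker.
    ds.to_netcdf(os.path.join(outdir, f"{n}.nc"))

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        pool.map(do_work, range(10))
```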

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.27.0
distributed: 2.28.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200925
pip: 20.2.2
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
```
reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf() doesn't work with multiprocessing scheduler (567678992)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
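
The filtered view at the top of this page ("1 row where issue = 567678992 and user = 7933853 sorted by updated_at descending") corresponds to a query along these lines; this is a sketch, and the exact SQL Datasette generates may differ. The two indexes above cover the issue and user filters.

```sql
select *
from issue_comments
where [issue] = 567678992
  and [user] = 7933853
order by [updated_at] desc;
```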