
issue_comments


3 rows where author_association = "NONE" and issue = 567678992 sorted by updated_at descending




id: 776160305
html_url: https://github.com/pydata/xarray/issues/3781#issuecomment-776160305
issue_url: https://api.github.com/repos/pydata/xarray/issues/3781
node_id: MDEyOklzc3VlQ29tbWVudDc3NjE2MDMwNQ==
user: tsupinie (885575)
created_at: 2021-02-09T18:51:13Z
updated_at: 2021-02-09T18:51:13Z
author_association: NONE
body:

@lvankampenhout, I ran into your problem. The OP's issue seems to actually be in to_netcdf(), but I think yours (ours) is in Dask's lazy loading and therefore unrelated.

In short, ds will have some Dask arrays whose contents don't actually get loaded until you call to_netcdf(). By default, Dask loads in parallel, and the default Dask parallel scheduler chokes when you do your own parallelism on top. In my case, I was able to get around it by doing

```python
ds.load(scheduler='sync')
```

at some point. If it's outside do_work(), I think you can skip the scheduler='sync' part, but inside do_work(), it's required. This bypasses the parallelism in Dask, which is probably what you want if you're doing your own parallelism.
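
For concreteness, here is a minimal sketch of the workaround (the file names are placeholders, not from the thread; Dataset.load() passes its keyword arguments on to dask.compute, so scheduler='sync' selects Dask's single-threaded scheduler):

```python
import xarray as xr

# Placeholder input file; chunks={} opens the variables as lazy dask arrays.
ds = xr.open_dataset("air.sig995.1960.nc", chunks={})

# Materialize the lazy arrays in the calling thread only: scheduler='sync'
# is forwarded to dask.compute and bypasses the default parallel scheduler.
ds.load(scheduler='sync')

# The data is now in memory, so to_netcdf() no longer invokes Dask.
ds.to_netcdf("copy.nc")
```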

reactions:
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf() doesn't work with multiprocessing scheduler (567678992)

id: 713165400
html_url: https://github.com/pydata/xarray/issues/3781#issuecomment-713165400
issue_url: https://api.github.com/repos/pydata/xarray/issues/3781
node_id: MDEyOklzc3VlQ29tbWVudDcxMzE2NTQwMA==
user: Chrismarsh (630436)
created_at: 2020-10-20T22:01:24Z
updated_at: 2020-10-20T22:01:24Z
author_association: NONE
body:

I am also hitting the problem described by @bcbnz.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf() doesn't work with multiprocessing scheduler (567678992)

id: 702348129
html_url: https://github.com/pydata/xarray/issues/3781#issuecomment-702348129
issue_url: https://api.github.com/repos/pydata/xarray/issues/3781
node_id: MDEyOklzc3VlQ29tbWVudDcwMjM0ODEyOQ==
user: lvankampenhout (7933853)
created_at: 2020-10-01T19:24:48Z
updated_at: 2020-10-01T20:00:27Z
author_association: NONE
body:

I think I ran into a similar problem when combining dask-chunked Datasets (originating from open_mfdataset) with Python's native multiprocessing package. I get no error message; the headers of the files are created, but then the script hangs indefinitely. The use case is combining and resampling variables into ~1000 different NetCDF files, work that I want to distribute over different processes using multiprocessing.

MCVE Code Sample

```python
import xarray as xr
from multiprocessing import Pool
import os

if (False):
    """ Load data without using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc")
else:
    """ Load data using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc", chunks={})

print(ds.nbytes / 1e6, 'MB')

print('chunks', ds.air.chunks)  # chunks is empty without dask

outdir = '/glade/scratch/lvank'  # change this to some temporary directory on your system

def do_work(n):
    print(n)
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))

tasks = range(10)

with Pool(processes=2) as pool:
    pool.map(do_work, tasks)

print('done')
```

Expected Output

The NetCDF copies in outdir, named 0.nc to 9.nc, should be created in both cases (with and without Dask).

Problem Description

In the case with Dask, i.e. when the if-statement evaluates to False, the files are not created and the program hangs.
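
For illustration, a hypothetical patched do_work applying the workaround from the newer comment above (this page is sorted newest-first); ds and outdir refer to the globals defined in the MCVE:

```python
import os

def do_work(n):
    # Workaround sketch: load the dask-backed dataset synchronously in the
    # forked worker, so Dask's default parallel scheduler (apparently the
    # source of the hang) is never entered.
    ds.load(scheduler='sync')
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))
```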

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.27.0
distributed: 2.28.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200925
pip: 20.2.2
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
```
reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() doesn't work with multiprocessing scheduler 567678992

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
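
As a usage sketch, the filter at the top of this page ("3 rows where author_association = "NONE" and issue = 567678992 sorted by updated_at descending") corresponds to a query along these lines; the database file name is an assumption:

```python
import sqlite3

# Hypothetical local copy of the database behind this Datasette instance.
conn = sqlite3.connect("github.db")

# Reproduce the page's filter and sort order against the schema above.
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at
    FROM issue_comments
    WHERE author_association = 'NONE' AND issue = 567678992
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, user_id, created_at, updated_at in rows:
    print(comment_id, user_id, created_at, updated_at)
```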