
issue_comments


1 row where issue = 289342234 and user = 2443309 sorted by updated_at descending


id: 361466652
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-361466652
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: MDEyOklzc3VlQ29tbWVudDM2MTQ2NjY1Mg==
user: jhamman (2443309)
created_at: 2018-01-30T03:35:07Z
updated_at: 2018-01-30T03:35:07Z
author_association: MEMBER
body:

I tried the above example with the multiprocessing and distributed schedulers. With the multiprocessing scheduler, I can reproduce the error described above. With the distributed scheduler, no error is encountered.

```python
import xarray as xr
import numpy as np
import dask.multiprocessing

from dask.distributed import Client

client = Client()
print(client)

# Generate dummy data and build xarray dataset
mat = np.random.rand(10, 90, 90)
ds = xr.Dataset(data_vars={'foo': (('time', 'x', 'y'), mat)})

# Write dataset to netcdf without compression
ds.to_netcdf('dummy_data_3d.nc')
# Write with zlib compression
ds.to_netcdf('dummy_data_3d_with_compression.nc',
             encoding={'foo': {'zlib': True}})
# Write data as int16 with scale factor applied
ds.to_netcdf('dummy_data_3d_with_scale_factor.nc',
             encoding={'foo': {'dtype': 'int16',
                               'scale_factor': 0.01,
                               '_FillValue': -9999}})

# Load data from netCDF files
ds_vanilla = xr.open_dataset('dummy_data_3d.nc', chunks={'time': 1})
ds_scaled = xr.open_dataset('dummy_data_3d_with_scale_factor.nc', chunks={'time': 1})
ds_compressed = xr.open_dataset('dummy_data_3d_with_compression.nc', chunks={'time': 1})

# Do computation using dask's multiprocessing scheduler
foo = ds_vanilla.foo.mean(dim=['x', 'y']).compute()
foo = ds_scaled.foo.mean(dim=['x', 'y']).compute()
foo = ds_compressed.foo.mean(dim=['x', 'y']).compute()
```


I personally don't have any use cases that would prefer the multiprocessing scheduler over the distributed scheduler, but I have been working on improving I/O performance and stability with xarray and dask lately. If anyone would like to work on this, I'd gladly help get it cleaned up, or help reach a more definitive answer on whether or not this can/should work.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
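As a sketch of how the page's filter maps onto this schema, the query behind "1 row where issue = 289342234 and user = 2443309 sorted by updated_at descending" can be reproduced with Python's stdlib `sqlite3`. The table definition is copied from above (foreign-key targets omitted since the referenced tables aren't created here), and the inserted row uses only the column values visible on this page:

```python
import sqlite3

# Build the issue_comments table in memory using the schema above
# (REFERENCES clauses dropped: users/issues tables are not created here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# Insert the single row shown on this page (body omitted for brevity).
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, created_at, updated_at, author_association) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (361466652, 2443309, 289342234,
     "2018-01-30T03:35:07Z", "2018-01-30T03:35:07Z", "MEMBER"),
)

# The filter and sort the page describes:
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
    (289342234, 2443309),
).fetchall()
print(rows)  # [(361466652,)]
```

The `idx_issue_comments_issue` and `idx_issue_comments_user` indexes exist precisely so that this kind of per-issue or per-user filter avoids a full table scan.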
Powered by Datasette · About: xarray-datasette