
issue_comments


3 rows where author_association = "NONE", issue = 286542795 and user = 1872600 sorted by updated_at descending




id: 382466626
html_url: https://github.com/pydata/xarray/pull/1811#issuecomment-382466626
issue_url: https://api.github.com/repos/pydata/xarray/issues/1811
node_id: MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg==
user: rsignell-usgs (1872600)
created_at: 2018-04-18T17:30:25Z
updated_at: 2018-04-18T17:32:21Z
author_association: NONE

body:

@jhamman, I was just using client = Client(). Should I be using LocalCluster instead?
(There is no Kubernetes on this JupyterHub.)
Also, is there a better place to have this sort of discussion, or is it okay here? (See the sketch after this record.)

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: WIP: Compute==False for to_zarr and to_netcdf (286542795)
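For readers following the Client-versus-LocalCluster question above: Client() with no arguments already spins up a LocalCluster behind the scenes, so the explicit form mainly adds control over worker count, threads, and memory. A minimal sketch; the specific limits below are illustrative assumptions, not values from this thread:

```python
from dask.distributed import Client, LocalCluster

# Client() with no arguments implicitly creates a LocalCluster,
# so these two setups are equivalent apart from the explicit limits
# (the numbers here are illustrative, not from the thread).
cluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit='4GB')
client = Client(cluster)

print(client)  # scheduler address plus worker/thread/memory summary
```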
id: 382421609
html_url: https://github.com/pydata/xarray/pull/1811#issuecomment-382421609
issue_url: https://api.github.com/repos/pydata/xarray/issues/1811
node_id: MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ==
user: rsignell-usgs (1872600)
created_at: 2018-04-18T15:11:02Z
updated_at: 2018-04-18T15:14:12Z
author_association: NONE

body:

@jhamman, I tried the same code with a single-threaded scheduler:

```python
...
delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync)
```

and it ran to completion with no errors (taking 2 hours for 100 GB to Zarr). What should I try next? (See the sketch after this record.)

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: WIP: Compute==False for to_zarr and to_netcdf (286542795)
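The get=dask.local.get_sync keyword used above was the dask API of the time (pre-0.18) for forcing single-threaded execution; current dask spells the same debugging trick with the scheduler= option. A minimal, self-contained sketch, with a toy graph standing in for the to_zarr graph from the comment:

```python
import dask

@dask.delayed
def double(x):
    return 2 * x

# A small task graph standing in for the delayed zarr store above.
total = dask.delayed(sum)([double(i) for i in range(5)])

# Run the whole graph in the calling thread; failures then surface as
# ordinary tracebacks, which is why this is a useful debugging step.
with dask.config.set(scheduler='synchronous'):
    print(total.compute())  # 20
```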
id: 381969631
html_url: https://github.com/pydata/xarray/pull/1811#issuecomment-381969631
issue_url: https://api.github.com/repos/pydata/xarray/issues/1811
node_id: MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ==
user: rsignell-usgs (1872600)
created_at: 2018-04-17T12:12:15Z
updated_at: 2018-04-17T12:15:19Z
author_association: NONE

body:

@jhamman, I'm trying out compute=False with this code:

```python
# Write National Water Model data to Zarr

from dask.distributed import Client
import pandas as pd
import xarray as xr
import s3fs
import zarr

if __name__ == '__main__':

    client = Client()

    # root = '/projects/water/nwm/data/forcing_short_range/'                  # Local Files
    root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/' # OPeNDAP

    # bucket_endpoint = 'https://s3.us-west-1.amazonaws.com/'
    bucket_endpoint = 'https://iu.jetstream-cloud.org:8080'

    f_zarr = 'rsignell/nwm/test_week'

    dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H')
    urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(
        root, a.strftime('%Y%m%d'), a.strftime('%H')) for a in dates]

    ds = xr.open_mfdataset(urls, concat_dim='time', lock=True)
    ds = ds.drop(['ProjectionCoordinateSystem'])

    fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint))
    d = s3fs.S3Map(f_zarr, s3=fs)

    compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
    encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}

    delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
    persist_store = delayed_store.persist(retries=100)
```

and after 20 seconds or so, the process dies with this error:

```python-traceback
/home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742: UserWarning: Large object of size 1.23 MB detected in task graph:
  (<xarray.backends.zarr.ZarrStore object at 0x7f5d8 ... deedecefab224')
Consider scattering large objects ahead of time with client.scatter to reduce scheduler burden and keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good

  % (format_bytes(len(b)), s))
```

Do you have suggestions on how to modify my code? (See the sketch after this record.)

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: WIP: Compute==False for to_zarr and to_netcdf (286542795)
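The UserWarning above already names its preferred fix: scatter the large object to the workers once and pass the resulting future into submissions, instead of embedding the object in every task graph. A self-contained sketch of that pattern; the payload and function are placeholders, not the ZarrStore from the traceback:

```python
from dask.distributed import Client

client = Client()  # local cluster, as in the comments above

big_data = bytes(10_000_000)  # ~10 MB stand-in for the large object

def func(data):
    return len(data)

# Bad: big_data would be serialized into the task graph on every submit.
# future = client.submit(func, big_data)

# Good: ship it to the workers once and reuse the future.
big_future = client.scatter(big_data)
future = client.submit(func, big_future)
print(future.result())  # 10000000
```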


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
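Since the schema above is ordinary SQLite, the filtered view on this page can be reproduced outside Datasette. A minimal sketch, assuming the database file is named github.db (the filename is a guess, not stated on this page):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # assumed filename for this database

# Reproduce this page's filter: author_association = 'NONE',
# issue = 286542795, user = 1872600, newest update first.
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'NONE'
      AND issue = 286542795
      AND "user" = 1872600
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created, updated, body in rows:
    print(comment_id, updated)
```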