html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1811#issuecomment-389553054,https://api.github.com/repos/pydata/xarray/issues/1811,389553054,MDEyOklzc3VlQ29tbWVudDM4OTU1MzA1NA==,2443309,2018-05-16T15:06:51Z,2018-05-16T15:06:51Z,MEMBER,Thanks all for the input/reviews on this PR. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-389296294,https://api.github.com/repos/pydata/xarray/issues/1811,389296294,MDEyOklzc3VlQ29tbWVudDM4OTI5NjI5NA==,1217238,2018-05-15T20:06:58Z,2018-05-15T20:06:58Z,MEMBER,(assuming tests pass),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-389296253,https://api.github.com/repos/pydata/xarray/issues/1811,389296253,MDEyOklzc3VlQ29tbWVudDM4OTI5NjI1Mw==,1217238,2018-05-15T20:06:49Z,2018-05-15T20:06:49Z,MEMBER,"> Yes, just tried again. I'm open to ideas but would also like to move this issue along first, if possible.
Sounds good, let's go ahead and merge this!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-389294623,https://api.github.com/repos/pydata/xarray/issues/1811,389294623,MDEyOklzc3VlQ29tbWVudDM4OTI5NDYyMw==,2443309,2018-05-15T20:00:58Z,2018-05-15T20:00:58Z,MEMBER,"> I'm a little surprised it doesn't just work with scipy and h5netcdf -- have you tried them again recently?
Yes, just tried again. I'm open to ideas but would also like to move this issue along first, if possible.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-388984522,https://api.github.com/repos/pydata/xarray/issues/1811,388984522,MDEyOklzc3VlQ29tbWVudDM4ODk4NDUyMg==,2443309,2018-05-14T22:34:48Z,2018-05-14T22:34:48Z,MEMBER,"@shoyer / @mrocklin - I think this is ready for another review. Since I last asked for reviews, I have:
- reworked the task graph to write and close each file individually, rather than writing all files before closing any of them.
- moved the tests to the `TestDask` class, which seemed more appropriate.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-388649669,https://api.github.com/repos/pydata/xarray/issues/1811,388649669,MDEyOklzc3VlQ29tbWVudDM4ODY0OTY2OQ==,2443309,2018-05-13T19:20:25Z,2018-05-13T19:20:25Z,MEMBER,"Actually, scratch that - I just found https://github.com/pydata/xarray/pull/1811/files#r183091638, which explains what is going on. Sorry for the noise.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-388649561,https://api.github.com/repos/pydata/xarray/issues/1811,388649561,MDEyOklzc3VlQ29tbWVudDM4ODY0OTU2MQ==,2443309,2018-05-13T19:18:37Z,2018-05-13T19:18:47Z,MEMBER,@shoyer - I'm getting a [test failure](https://travis-ci.org/pydata/xarray/jobs/378274606#L5269-L5326) in the h5netcdf backend that seems unrelated. Do you know if something has changed in the string handling of h5netcdf recently?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-383077508,https://api.github.com/repos/pydata/xarray/issues/1811,383077508,MDEyOklzc3VlQ29tbWVudDM4MzA3NzUwOA==,2443309,2018-04-20T12:16:43Z,2018-04-20T12:16:43Z,MEMBER,The test failures in the [latest build](https://travis-ci.org/pydata/xarray/builds/368657931?utm_source=github_status&utm_medium=notification) appear to be unrelated to this PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-382488891,https://api.github.com/repos/pydata/xarray/issues/1811,382488891,MDEyOklzc3VlQ29tbWVudDM4MjQ4ODg5MQ==,2443309,2018-04-18T18:43:23Z,2018-04-18T18:43:23Z,MEMBER,"I see you were already using the LocalCluster (Client). Disregard my comment on switching clusters. I seem to be getting my GitHub issues mixed up.
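
For reference, `Client()` with no arguments spins up a `LocalCluster` behind the scenes, so the two are equivalent; a minimal sketch of the explicit form (the worker counts here are purely illustrative):

```python
from dask.distributed import Client, LocalCluster

# Client() with no arguments creates a LocalCluster implicitly;
# the explicit form makes the worker layout visible and tunable.
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)
```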
It may be good to take this offline to one of the pangeo/zarr/dask/xarray issues (there are a few). ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-382466626,https://api.github.com/repos/pydata/xarray/issues/1811,382466626,MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg==,1872600,2018-04-18T17:30:25Z,2018-04-18T17:32:21Z,NONE,"@jhamman, I was just using `client = Client()`. Should I be using `LocalCluster` instead?
(there is no Kubernetes on this JupyterHub).
Also, is there a better place to have this sort of discussion or is it okay here?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-382450276,https://api.github.com/repos/pydata/xarray/issues/1811,382450276,MDEyOklzc3VlQ29tbWVudDM4MjQ1MDI3Ng==,2443309,2018-04-18T16:37:12Z,2018-04-18T16:37:12Z,MEMBER,"@rsignell-usgs - This is going to require some debugging on your part but here are a few suggestions:
- try a smaller chunk size
- use a LocalCluster before moving to the KubeCluster
- turn up dask's logging level for the scheduler/workers (see the sketch below)
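
For the logging suggestion, a minimal sketch using the standard library (`distributed.scheduler` and `distributed.worker` are the logger names the `distributed` package uses):

```python
import logging

# Raise verbosity for the scheduler and worker loggers used by distributed
for name in ('distributed.scheduler', 'distributed.worker'):
    logging.getLogger(name).setLevel(logging.DEBUG)
```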
> and after 20 seconds or so, the process dies with this error:
Looking above, I don't see an actual error; your next step should be to find it. If workers are dying, they should report that in the worker logs. If your cluster is dying, that should be reported in the notebook.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-382421609,https://api.github.com/repos/pydata/xarray/issues/1811,382421609,MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ==,1872600,2018-04-18T15:11:02Z,2018-04-18T15:14:12Z,NONE,"@jhamman, I tried the same code with a single-threaded scheduler:
```python
...
delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
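# get=dask.local.get_sync forces dask's synchronous, single-threaded scheduler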
persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync)
```
and it ran to completion with no errors (taking 2 hours to write 100 GB to Zarr). What should I try next?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-382169654,https://api.github.com/repos/pydata/xarray/issues/1811,382169654,MDEyOklzc3VlQ29tbWVudDM4MjE2OTY1NA==,2443309,2018-04-17T22:04:31Z,2018-04-17T22:04:31Z,MEMBER,@rsignell-usgs - can you repeat this example with a single-threaded scheduler? It will be slow but should work (or return a more informative error).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-381969631,https://api.github.com/repos/pydata/xarray/issues/1811,381969631,MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ==,1872600,2018-04-17T12:12:15Z,2018-04-17T12:15:19Z,NONE,"@jhamman, I'm trying to test `compute=False` with this code:
```python
# Write National Water Model data to Zarr
from dask.distributed import Client
import pandas as pd
import xarray as xr
import s3fs
import zarr
if __name__ == '__main__':
    client = Client()
    root = '/projects/water/nwm/data/forcing_short_range/'  # Local Files
    # root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/'  # OPeNDAP
    bucket_endpoint = 'https://s3.us-west-1.amazonaws.com/'
    # bucket_endpoint = 'https://iu.jetstream-cloud.org:8080'
    f_zarr = 'rsignell/nwm/test_week'
    dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H')
    urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(root, a.strftime('%Y%m%d'), a.strftime('%H')) for a in dates]
    ds = xr.open_mfdataset(urls, concat_dim='time', lock=True)
    ds = ds.drop(['ProjectionCoordinateSystem'])
    fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint))
    d = s3fs.S3Map(f_zarr, s3=fs)
    compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
    encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}
    delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
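    # persist() starts executing the write graph on the cluster in the
    # background; retries=100 lets each failed task be re-run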
    persist_store = delayed_store.persist(retries=100)
```
and after 20 seconds or so, the process dies with this error:
```python-traceback
/home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742:
UserWarning: Large object of size 1.23 MB detected in task graph:
(..., key = 'foo'
    def __getitem__(self, key):
>       if self._objects[key] is not None:
E       KeyError: 'foo'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-373092099,https://api.github.com/repos/pydata/xarray/issues/1811,373092099,MDEyOklzc3VlQ29tbWVudDM3MzA5MjA5OQ==,1217238,2018-03-14T16:45:25Z,2018-03-14T16:45:25Z,MEMBER,"To elaborate a little bit on my last comment (which I submitted very quickly when my bus was arriving), the way to make dependent tasks with dask.delayed is to add dummy function arguments, e.g.,
```python
import dask
import dask.array

def finalize_store(store, write):
    del write  # unused; it only makes this task depend on the write tasks
    store.sync()
    store.close()

write = dask.array.store(..., compute=False)
write_and_close = dask.delayed(finalize_store)(store, write)
write_and_close.compute()  # writes, syncs, and closes
```
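
The same dummy-argument trick extends to multiple datastores, roughly like this (`stores` and `writes` are placeholder names for the per-file datastores and their write graphs):

```python
# One finalize task per datastore, then a single barrier task that
# depends on all of them -- the object save_mfdataset would return.
finalized = [dask.delayed(finalize_store)(s, w) for s, w in zip(stores, writes)]
barrier = dask.delayed(lambda *args: None)(*finalized)
barrier.compute()
```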
Potentially some of this logic could get moved into `ArrayWriter.sync()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795
https://github.com/pydata/xarray/pull/1811#issuecomment-373072825,https://api.github.com/repos/pydata/xarray/issues/1811,373072825,MDEyOklzc3VlQ29tbWVudDM3MzA3MjgyNQ==,1217238,2018-03-14T15:54:43Z,2018-03-14T15:54:43Z,MEMBER,"One potential issue here is the lack of clean-up (which may be unnecessary if `autoclose=True`). You want to construct a single dask graph with a structure like the following:
- Tasks for writing all array data (i.e., from `ArrayWriter`).
- Tasks for calling `sync()` and `close()` on each datastore object. These should depend on the appropriate write tasks.
- A single task that depends on writing all datastores. This is the delayed object that `save_mfdataset` should return.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795