
Problems with distributed and opendap netCDF endpoint

pydata/xarray issue #2503 · state: closed (completed) · opened 2018-10-23T17:48:20Z · closed 2019-04-09T12:02:01Z · author association: MEMBER · 26 comments

Code Sample

I am trying to load a dataset from an opendap endpoint using xarray, netCDF4, and distributed. I am having a problem only with non-local distributed schedulers (KubeCluster specifically). This could plausibly be an xarray, dask, or pangeo issue, but I have decided to post it here.

```python
import xarray as xr
import dask

# create dataset from Unidata's test opendap endpoint, chunked in time
url = 'http://remotetest.unidata.ucar.edu/thredds/dodsC/testdods/coads_climatology.nc'
ds = xr.open_dataset(url, decode_times=False, chunks={'TIME': 1})

# all these work
with dask.config.set(scheduler='synchronous'):
    ds.SST.compute()
with dask.config.set(scheduler='processes'):
    ds.SST.compute()
with dask.config.set(scheduler='threads'):
    ds.SST.compute()

# this works too
from dask.distributed import Client
local_client = Client()
with dask.config.set(get=local_client):
    ds.SST.compute()

# but this does not
from dask_kubernetes import KubeCluster  # import implied in the original snippet
cluster = KubeCluster(n_workers=2)
kube_client = Client(cluster)
with dask.config.set(get=kube_client):
    ds.SST.compute()
```
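One check that might separate network reachability from serialization (a sketch I have not run; `check_remote_open` is a hypothetical helper, not part of the failing code above):

```python
def check_remote_open(url=url):
    # runs on each worker: open the opendap endpoint directly, bypassing the
    # graph that was built on the client, and read back a single value
    import xarray as xr
    ds = xr.open_dataset(url, decode_times=False)
    return float(ds['SST'][0, 0, 0].values)

# returns a dict keyed by worker address; if this succeeds while
# ds.SST.compute() fails, the endpoint itself is reachable from the pods
kube_client.run(check_remote_open)
```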

In the worker log, I see the following sort of errors:

```
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 5, 0, 0)
distributed.worker - INFO - Dependent not found: open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf 0 . Asking scheduler
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 3, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 0, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 1, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 7, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 6, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 2, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 9, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 8, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 11, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 10, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('open_dataset-4a0403564ad0e45788e42887b9bc0997SST-9fd3e5906a2a54cb28f48a7f2d46e4bf', 4, 0, 0)
distributed.worker - WARNING - Compute Failed
Function:  getter
args:      (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7f45d6fcbb38>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x7f45d70507b8>, encoded_fill_values={-1e+34}, decoded_fill_value=nan, dtype=dtype('float32')), dtype=dtype('float32')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(3, 4, None), slice(0, 90, None), slice(0, 180, None)))
kwargs:    {}
Exception: RuntimeError('NetCDF: Not a valid ID',)
```

Ultimately, the error comes from the netCDF library: `RuntimeError('NetCDF: Not a valid ID',)`.

This seems like something to do with serialization of the netCDF store. The worker images have the same netCDF version as the client (and the same versions of all other packages). I am at a loss for how to debug further.
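If the serialization hypothesis is right, it might be reproducible without a cluster by round-tripping the dataset through pickle by hand, since distributed ships the graph through (cloud)pickle. A minimal sketch under that assumption:

```python
import pickle

import xarray as xr

url = 'http://remotetest.unidata.ucar.edu/thredds/dodsC/testdods/coads_climatology.nc'
ds = xr.open_dataset(url, decode_times=False, chunks={'TIME': 1})

# if the netCDF store does not reopen its underlying handle after
# unpickling, this may raise the same "NetCDF: Not a valid ID" error
ds2 = pickle.loads(pickle.dumps(ds))
print(ds2.SST.compute())
```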

Output of xr.show_versions()

`xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.111+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.2
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.3
cartopy: None
seaborn: None
setuptools: 39.2.0
pip: 18.0
conda: 4.5.4
pytest: 3.8.0
IPython: 6.4.0
sphinx: None
```

`kube_client.get_versions(check=True)`

```
{'scheduler': {'host': (('python', '3.6.3.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.4.111+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')),
               'packages': {'required': (('dask', '0.18.2'), ('distributed', '1.22.1'), ('msgpack', '0.5.6'), ('cloudpickle', '0.5.5'), ('tornado', '5.0.2'), ('toolz', '0.9.0')),
                            'optional': (('numpy', '1.15.1'), ('pandas', '0.23.2'), ('bokeh', '0.12.16'), ('lz4', '1.1.0'), ('blosc', '1.5.1'))}},
 'workers': {'tcp://10.20.8.4:36940': {'host': (('python', '3.6.3.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.4.111+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')),
                                       'packages': {'required': (('dask', '0.18.2'), ('distributed', '1.22.1'), ('msgpack', '0.5.6'), ('cloudpickle', '0.5.5'), ('tornado', '5.0.2'), ('toolz', '0.9.0')),
                                                    'optional': (('numpy', '1.15.1'), ('pandas', '0.23.2'), ('bokeh', '0.12.16'), ('lz4', '1.1.0'), ('blosc', '1.5.1'))}},
             'tcp://10.21.177.254:42939': {'host': (('python', '3.6.3.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.4.111+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')),
                                           'packages': {'required': (('dask', '0.18.2'), ('distributed', '1.22.1'), ('msgpack', '0.5.6'), ('cloudpickle', '0.5.5'), ('tornado', '5.0.2'), ('toolz', '0.9.0')),
                                                        'optional': (('numpy', '1.15.1'), ('pandas', '0.23.2'), ('bokeh', '0.12.16'), ('lz4', '1.1.0'), ('blosc', '1.5.1'))}}},
 'client': {'host': [('python', '3.6.3.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.4.111+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')],
            'packages': {'required': [('dask', '0.18.2'), ('distributed', '1.22.1'), ('msgpack', '0.5.6'), ('cloudpickle', '0.5.5'), ('tornado', '5.0.2'), ('toolz', '0.9.0')],
                         'optional': [('numpy', '1.15.1'), ('pandas', '0.23.2'), ('bokeh', '0.12.16'), ('lz4', '1.1.0'), ('blosc', '1.5.1')]}}}
```