

issues: 332762756


id: 332762756
node_id: MDU6SXNzdWUzMzI3NjI3NTY=
number: 2234
title: fillna error with distributed
user: 1197350
state: closed
locked: 0
comments: 3
created_at: 2018-06-15T12:54:54Z
updated_at: 2018-06-15T13:13:54Z
closed_at: 2018-06-15T13:13:54Z
author_association: MEMBER

Code Sample, a copy-pastable example if possible

The following code works with the default dask threaded scheduler:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 1, 1, np.nan]).chunk()
da.fillna(0.).mean().load()
```

It fails with distributed. I see the following error on the client side:

```
KilledWorker                              Traceback (most recent call last)
<ipython-input-7-5ed3c292af2e> in <module>()
----> 1 da.fillna(0.).mean().load()

/opt/conda/lib/python3.6/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
    631         dask.array.compute
    632         """
--> 633         ds = self._to_temp_dataset().load(**kwargs)
    634         new = self._from_temp_dataset(ds)
    635         self._variable = new._variable

/opt/conda/lib/python3.6/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    489 
    490             # evaluate all the dask arrays simultaneously
--> 491             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    492 
    493             for k, data in zip(lazy_data, evaluated_data):

/opt/conda/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    398     keys = [x.__dask_keys__() for x in collections]
    399     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 400     results = schedule(dsk, keys, **kwargs)
    401     return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)])
    402 

/opt/conda/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, **kwargs)
   2157             try:
   2158                 results = self.gather(packed, asynchronous=asynchronous,
-> 2159                                       direct=direct)
   2160             finally:
   2161                 for f in futures.values():

/opt/conda/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
   1560             return self.sync(self._gather, futures, errors=errors,
   1561                              direct=direct, local_worker=local_worker,
-> 1562                              asynchronous=asynchronous)
   1563 
   1564     @gen.coroutine

/opt/conda/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    650             return future
    651         else:
--> 652             return sync(self.loop, func, *args, **kwargs)
    653 
    654     def __repr__(self):

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    273             e.wait(10)
    274     if error[0]:
--> 275         six.reraise(*error[0])
    276     else:
    277         return result[0]

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in f()
    258             yield gen.moment
    259             thread_state.asynchronous = True
--> 260             result[0] = yield make_coro()
    261         except Exception as exc:
    262             error[0] = sys.exc_info()

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1097 
   1098                     try:
-> 1099                         value = future.result()
   1100                     except Exception:
   1101                         self.had_exception = True

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1105                     if exc_info is not None:
   1106                         try:
-> 1107                             yielded = self.gen.throw(*exc_info)
   1108                         finally:
   1109                             # Break up a reference to itself

/opt/conda/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1437                             six.reraise(type(exception),
   1438                                         exception,
-> 1439                                         traceback)
   1440                     if errors == 'skip':
   1441                         bad_keys.add(key)

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

KilledWorker: ("('isna-mean_chunk-where-mean_agg-aggregate-74ec0f30171c1c667640f1f18df5f84b',)", 'tcp://10.20.197.7:43357')
```

While the worker logs show this:

```
distributed.worker - ERROR - Can't get attribute 'isna' on <module 'pandas.core.dtypes.missing' from '/opt/conda/lib/python3.6/site-packages/pandas/core/dtypes/missing.py'>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/distributed/worker.py", line 346, in handle_scheduler
    self.ensure_computing])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/core.py", line 361, in handle_stream
    msgs = yield comm.read()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/tcp.py", line 203, in read
    deserializers=deserializers)
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 79, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.6/site-packages/distributed/comm/utils.py", line 65, in _from_frames
    deserializers=deserializers)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/core.py", line 122, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 236, in deserialize
    return loads(header, frames)
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/serialize.py", line 58, in pickle_loads
    return pickle.loads(b''.join(frames))
  File "/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
AttributeError: Can't get attribute 'isna' on <module 'pandas.core.dtypes.missing' from '/opt/conda/lib/python3.6/site-packages/pandas/core/dtypes/missing.py'>
```
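The worker-side `AttributeError` has the shape pickle produces when a function was serialized by reference (module path plus qualified name, not its code) and the receiving process's copy of that module has no attribute with that name. The stdlib sketch below reproduces that shape with a stand-in module. The module name `fake_missing` and the two "environments" are hypothetical. Applying this to the report assumes the workers' pandas lacks `isna` (pandas added `isna` in 0.21), which this issue does not itself confirm.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_missing"  # hypothetical stand-in for pandas.core.dtypes.missing


def install_module(define_isna):
    """Register a fresh module under MODULE_NAME, optionally defining isna()."""
    mod = types.ModuleType(MODULE_NAME)
    if define_isna:
        # exec in the module's namespace so that isna.__module__ == MODULE_NAME
        exec("def isna(x):\n    return x != x\n", mod.__dict__)
    sys.modules[MODULE_NAME] = mod
    return mod


# "Client" environment: the module defines isna, so pickling succeeds.
client_mod = install_module(define_isna=True)
payload = pickle.dumps(client_mod.isna)  # stores only the module name and "isna"

# "Worker" environment: same module name, but no isna attribute.
install_module(define_isna=False)
try:
    pickle.loads(payload)
except AttributeError as exc:
    error_message = str(exc)
else:
    error_message = None
finally:
    sys.modules.pop(MODULE_NAME, None)  # clean up the stand-in module

print(error_message)
```

The payload never contains the function body, which is why the same bytes load fine on one machine and fail on another whose installed library differs.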

This could very well be a distributed issue, or a pandas issue; I'm not sure what is going on. Why is pandas involved at all?
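One plausible reading, offered as an assumption rather than something this report establishes: the task key `isna-mean_chunk-where-...` in the `KilledWorker` message suggests the graph built by `fillna` contains a pandas null-check call. In pandas 0.23, `isnull` is an alias bound to the function `isna`, and pickle records a function's own `__qualname__`, not the alias it was called by. So even code that calls `isnull` would ship the task as `...missing.isna`. The stdlib sketch below shows that behaviour with a hypothetical stand-in module (`fake_missing_alias`); the functions in it are not pandas' own.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical stand-in module mimicking the assumed pandas pattern
# "def isna(...)" followed by the alias "isnull = isna".
MODULE_SOURCE = '''
def isna(x):
    """NaN is the only float that is not equal to itself."""
    return x != x

isnull = isna
'''

mod = types.ModuleType("fake_missing_alias")
exec(MODULE_SOURCE, mod.__dict__)
sys.modules[mod.__name__] = mod  # pickle resolves globals through sys.modules


def pickled_strings(obj):
    """Return the string operands pickle writes for obj (module and name)."""
    data = pickle.dumps(obj, protocol=4)
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE")]


# We pickle the alias, but pickle records the function's own name.
strings = pickled_strings(mod.isnull)
print(strings)  # ['fake_missing_alias', 'isna']

del sys.modules[mod.__name__]
```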

Problem description

This should not raise an error. It worked fine in previous versions, but something in our latest environment has caused it to break.

Expected Output

```
<xarray.DataArray ()>
array(0.75)
```
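The expected value follows from filling the single NaN with 0 and averaging four values: (1 + 1 + 1 + 0) / 4 = 0.75. A minimal eager check with the standard library, with no dask or xarray involved, so it only confirms the arithmetic and says nothing about the scheduler:

```python
import math

values = [1.0, 1.0, 1.0, float("nan")]

# fillna(0.): replace NaN entries with 0.0
filled = [0.0 if math.isnan(v) else v for v in values]

# mean(): arithmetic mean of the filled values
result = sum(filled) / len(filled)
print(result)  # 0.75
```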

Output of xr.show_versions()

This is running in the latest pangeo.pydata.org environment (https://github.com/pangeo-data/helm-chart/pull/29). @mrocklin picked a custom set of dask / distributed commits to install.

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.111+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.7
pandas: 0.23.1
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: None
dask: 0.17.4+51.g0a7fe8de
distributed: 1.21.8+54.g7909f27d
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.4
pytest: 3.6.1
IPython: 6.4.0
sphinx: None
```
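The versions above describe only the client side. If the `isna` error does come from a client/worker version mismatch (an assumption, not something this report establishes), comparing package versions on both sides is a quick check. Recent `distributed` releases offer `Client.get_versions(check=True)` for this purpose. The helper below, `version_mismatches`, is a hypothetical, library-free sketch that diffs two `{package: version}` mappings however they were collected, for example by running `xr.show_versions()` in each environment.

```python
def version_mismatches(client_versions, worker_versions):
    """Return {package: (client_version, worker_version)} for differing packages.

    A package missing on one side is reported with None for that side.
    """
    mismatches = {}
    for package in sorted(set(client_versions) | set(worker_versions)):
        ours = client_versions.get(package)
        theirs = worker_versions.get(package)
        if ours != theirs:
            mismatches[package] = (ours, theirs)
    return mismatches


# Client-side versions come from the report above; the worker-side pandas
# version is HYPOTHETICAL, chosen only to illustrate the output.
client = {"pandas": "0.23.1",
          "dask": "0.17.4+51.g0a7fe8de",
          "distributed": "1.21.8+54.g7909f27d"}
worker = {"pandas": "0.20.3",
          "dask": "0.17.4+51.g0a7fe8de",
          "distributed": "1.21.8+54.g7909f27d"}

print(version_mismatches(client, worker))  # {'pandas': ('0.23.1', '0.20.3')}
```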
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2234/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
