issues: 327064908

  • number: 2190
  • title: Parallel non-locked read using dask.Client crashes
  • user: 6404167
  • state: closed
  • locked: 0
  • comments: 5
  • created_at: 2018-05-28T15:42:40Z
  • updated_at: 2019-01-14T21:09:04Z
  • closed_at: 2019-01-14T21:09:03Z
  • author_association: CONTRIBUTOR
  • node_id: MDU6SXNzdWUzMjcwNjQ5MDg=

I'm trying to parallelize my code using Dask. Using their distributed.Client() I was able to do computations in parallel. Unfortunately, it seems ~60% of the time is spent in a file lock. As I'm only reading data and doing computations in memory, I should be able to work without a lock, so I tried passing lock=False to open_dataset. Unfortunately, this crashes my code. A minimal reproducible example can be found below:

```python
import xarray as xr
import dask.array as da
from dask.distributed import Client
from IPython import embed

# First generate a file with random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = (10, 10)
dims = ['y', 'z']
x = rng.standard_normal(shape, chunks=chunks)
da = xr.DataArray(x, dims=dims, name='x')
da.to_netcdf('test.nc')

# Open file without a lock
client = Client(processes=False)
ds = xr.open_dataset('test.nc', chunks=dict(zip(dims, chunks)), lock=False)

# This will crash!
print((ds['x'] * ds['x']).compute())
```

Crashes with (sometimes)

```python
distributed.worker - WARNING - Compute Failed
Function:  getter
args:      (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7ffb69033c50>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 10, None), slice(5710, 5720, None)))
kwargs:    {}
Exception: RuntimeError('NetCDF: HDF error',)
```

And usually just with `terminated by signal SIGSEGV (Address boundary error)`
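One possible reading of the crash, offered as an assumption rather than a confirmed diagnosis: with `Client(processes=False)` the workers are threads inside one process, and a file backend that is not safe for concurrent calls relies on the lock to serialize reads. The sketch below uses only the standard library to illustrate that idea. `UnsafeReader` and `read_all` are hypothetical names standing in for the file backend and the scheduler; they are not part of xarray, dask, or netCDF4.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class UnsafeReader:
    """Hypothetical stand-in for a backend that must not be entered concurrently."""

    def __init__(self):
        self._active = 0                 # threads currently inside read()
        self._guard = threading.Lock()   # protects the bookkeeping counters only
        self.overlaps = 0                # reads that started while another was running

    def read(self, index):
        with self._guard:
            self._active += 1
            if self._active > 1:
                self.overlaps += 1
        time.sleep(0.001)  # simulate I/O so concurrent entry is easy to observe
        with self._guard:
            self._active -= 1
        return index * index


def read_all(reader, n_reads, lock=None, workers=8):
    """Run n_reads reads on a thread pool, optionally serialized by `lock`."""

    def task(i):
        if lock is None:
            return reader.read(i)
        with lock:
            return reader.read(i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, range(n_reads)))


# With a shared lock, no two reads are ever inside the reader at once.
locked = UnsafeReader()
results = read_all(locked, 50, lock=threading.Lock())
print("overlapping reads with a lock:", locked.overlaps)

# Without a lock, reads can overlap; the count varies from run to run.
unlocked = UnsafeReader()
read_all(unlocked, 50)
print("overlapping reads without a lock:", unlocked.overlaps)
```

In this toy model an overlap is only counted. In a real library that is not thread-safe, the same overlap could corrupt internal state, which would be consistent with the intermittent `HDF error` and segfault reported above.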

Output of xr.show_versions()

```python
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.20.3
numpy: 1.14.0
scipy: 0.19.1
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 10.0.1
conda: None
pytest: 3.4.0
IPython: 6.3.1
sphinx: 1.6.4
```

  • reactions: total_count 0 (https://api.github.com/repos/pydata/xarray/issues/2190/reactions)
  • state_reason: completed
  • repo: 13221727
  • type: issue
