issues: 1033142897
This data as json
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1033142897 | I_kwDOAMm_X849lIJx | 5883 | Failing parallel writes to_zarr with regions parameter? | 4666753 | closed | 0 | 1 | 2021-10-22T03:33:02Z | 2021-10-22T18:37:06Z | 2021-10-22T18:37:06Z | CONTRIBUTOR | What happened: Following guidance on how to use regions keyword in What you expected to happen: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not). Minimal Complete Verifiable Example: ```python path = "tmp.zarr" NTHREADS = 4 # when 1, things work as expected import multiprocessing.dummy as mp # threads, instead of processes import numpy as np import dask.array as da import xarray as xr dummy values for metadataxr.Dataset( {"x": (("a", "b"), -da.ones((10, 7), chunks=(None, 1)))}, {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))}, ).to_zarr(path, mode="w", compute=False) actual values to saveds = xr.Dataset( {"x": (("a", "b"), np.random.uniform(size=(10, 7)))}, {"apple": ("a", np.arange(10))}, ) save them using NTHREADSwith mp.Pool(NTHREADS) as p: p.map( lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(path, mode="r+", region=dict(a=slice(idx, 1 + idx))), range(10) ) ds_roundtrip = xr.open_zarr(path).load() # open what we just saved over multiple threads perfect match for x on some slices of a, but when NTHREADS > 1, x has very different value or NaN on other slices of axr.testing.assert_allclose(ds, ds_roundtrip) # fails when NTHREADS > 1. ``` Anything else we need to know?:
Environment: Output of <tt>xr.show_versions()</tt>``` INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.19.0 pandas: 1.3.3 numpy: 1.21.2 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.10.1 cftime: 1.5.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.08.1 distributed: 2021.08.1 matplotlib: 3.4.1 cartopy: None seaborn: 0.11.2 numbagg: None pint: None setuptools: 58.2.0 pip: 21.3 conda: None pytest: None IPython: 7.28.0 sphinx: None ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5883/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |