id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
481866516,MDU6SXNzdWU0ODE4NjY1MTY=,3225,xr.DataArray.where sets valid points to nan when using several dask chunks,20225454,closed,0,,,3,2019-08-17T09:09:59Z,2022-04-18T15:58:40Z,2022-04-18T15:58:40Z,NONE,,,,"#### MCVE Code Sample

I am trying to randomly delete a fraction of a xr.DataArray ([see identical StackOverflow question](https://stackoverflow.com/questions/56686562/xr-dataarray-where-sets-valid-points-to-nan-when-using-several-dask-chunks)) and subsequently access only the values from the original dataset data that were deleted.

This works fine as long as the data is not stored in dask arrays or in only one dask array. As soon as I define chunks smaller than the total size of the data, the original values are set to nan.

```python
data = xr.DataArray(np.arange(5*5*5.).reshape(5,5,5), dims=('time','latitude','longitude'))
data.to_netcdf('/path/to/file.nc')
#data = xr.open_dataarray('/path/to/file.nc', chunks={'time':5}) # produces expected output
data = xr.open_dataarray('/path/to/file.nc', chunks={'time':2}) # produces observed output

def set_fraction_randomly_to_nan(data, frac_missing):
    np.random.seed(0)
    data[np.random.rand(*data.shape) < frac_missing] = np.nan
    return data

data_lost = xr.apply_ufunc(set_fraction_randomly_to_nan, data.copy(deep=True),
                           output_core_dims=[['latitude','longitude']],
                           dask='parallelized',
                           input_core_dims=[['latitude','longitude']],
                           output_dtypes=[data.dtype],
                           kwargs={'frac_missing': 0.5})

print(data[0,-4:,-4:].values)
# >>
# [[ 6.  7.  8.  9.]
#  [11. 12. 13. 14.]
#  [16. 17. 18. 19.]
#  [21. 22. 23. 24.]]
print(data.where(np.isnan(data_lost),0)[0,-4:,-4:].values)
```

#### Expected Output

expected output of the last line: keep all values where `np.isnan(data_lost)` is True and set rest to zero

```python
[[ 6.  0.  0.  9.]
 [ 0.  0.  0. 14.]
 [16.  0.  0.  0.]
 [ 0. 22.  0. 24.]]
```

#### Problem Description

observed output of the last line: set all values where `np.isnan(data_lost)` is True to nan and set rest to zero

```python
[[nan  0.  0. nan]
 [ 0.  0.  0. nan]
 [nan  0.  0.  0.]
 [ 0. nan  0. nan]]
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.138-59-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.17.2
distributed: 1.21.5
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: None
pytest: 3.5.0
IPython: 6.3.1
sphinx: 1.7.2
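
A possible workaround, sketched under the assumption that the in-place assignment inside `set_fraction_randomly_to_nan` is what overwrites the original values: build the masked result with `np.where`, which returns a new array and leaves the input block untouched. The `seed` parameter and the use of `np.random.RandomState` are illustrative choices that are not part of the code above, and this variant has only been checked here on a plain NumPy array, not on the dask-backed case.

```python
import numpy as np

def set_fraction_randomly_to_nan(block, frac_missing, seed=0):
    # Hypothetical non-mutating variant of the function in the report.
    # A local RandomState keeps the global NumPy random state unchanged.
    rng = np.random.RandomState(seed)
    mask = rng.rand(*block.shape) < frac_missing
    # np.where allocates a new array, so the input block is never modified.
    return np.where(mask, np.nan, block)

original = np.arange(5 * 5 * 5.).reshape(5, 5, 5)
lost = set_fraction_randomly_to_nan(original, frac_missing=0.5)

# Keep the original values where data was deleted and zero everywhere else.
recovered = np.where(np.isnan(lost), original, 0)
```

Passed to `xr.apply_ufunc` in place of the original function, this keeps every block of `data` read-only from the function's point of view.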
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3225/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue