

issues: 481866516


| field | value |
|---|---|
| id | 481866516 |
| node_id | MDU6SXNzdWU0ODE4NjY1MTY= |
| number | 3225 |
| title | xr.DataArray.where sets valid points to nan when using several dask chunks |
| user | 20225454 |
| state | closed |
| locked | 0 |
| comments | 3 |
| created_at | 2019-08-17T09:09:59Z |
| updated_at | 2022-04-18T15:58:40Z |
| closed_at | 2022-04-18T15:58:40Z |
| author_association | NONE |

MCVE Code Sample

I am trying to randomly delete a fraction of an `xr.DataArray` (see the identical StackOverflow question) and then access only those values of the original `data` that were deleted.

This works fine as long as the data is not stored in dask arrays, or is stored in a single dask chunk. As soon as I define chunks smaller than the total size of the data, the original values are set to nan.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.arange(5*5*5.).reshape(5, 5, 5),
                    dims=('time', 'latitude', 'longitude'))
data.to_netcdf('/path/to/file.nc')

# Pick one of the following two lines:
# one dask chunk along time, produces the expected output
data = xr.open_dataarray('/path/to/file.nc', chunks={'time': 5})
# several dask chunks along time, produces the observed output
data = xr.open_dataarray('/path/to/file.nc', chunks={'time': 2})


def set_fraction_randomly_to_nan(data, frac_missing):
    np.random.seed(0)
    data[np.random.rand(*data.shape) < frac_missing] = np.nan
    return data


data_lost = xr.apply_ufunc(set_fraction_randomly_to_nan, data.copy(deep=True),
                           output_core_dims=[['latitude', 'longitude']],
                           dask='parallelized',
                           input_core_dims=[['latitude', 'longitude']],
                           output_dtypes=[data.dtype],
                           kwargs={'frac_missing': 0.5})

print(data[0, -4:, -4:].values)
# [[ 6.  7.  8.  9.]
#  [11. 12. 13. 14.]
#  [16. 17. 18. 19.]
#  [21. 22. 23. 24.]]

print(data.where(np.isnan(data_lost), 0)[0, -4:, -4:].values)
```
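The helper in the sample above assigns NaN directly into the array it receives. Dask generally assumes that the functions it runs leave their inputs unmodified, so a natural variant to compare against is one that copies first. The sketch below is an assumption for illustration, not a confirmed fix; it uses plain NumPy so the copy behaviour can be checked without dask or a netCDF file, and `chunk` is a hypothetical stand-in for one dask chunk.

```python
import numpy as np


def set_fraction_randomly_to_nan(data, frac_missing):
    """Return a copy of `data` with roughly `frac_missing` of its entries set to NaN.

    Same logic as the helper in the sample above, except that the input
    array is copied first and never written to.
    """
    np.random.seed(0)  # fixed seed, as in the original helper
    out = np.array(data, copy=True)  # work on a private copy
    out[np.random.rand(*out.shape) < frac_missing] = np.nan
    return out


# Hypothetical stand-in for one dask chunk of the (float) sample data
chunk = np.arange(2 * 5 * 5.).reshape(2, 5, 5)
lost = set_fraction_randomly_to_nan(chunk, frac_missing=0.5)

print(np.isnan(chunk).any())  # False: the input chunk is untouched
print(np.isnan(lost).any())   # True: NaNs appear only in the returned copy
```

With `dask='parallelized'`, `xr.apply_ufunc` would call this variant once per chunk, just as it calls the original; whether swapping it in removes the NaNs in the multi-chunk case has not been verified here.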

Expected Output

Expected output of the last line: keep all values where `np.isnan(data_lost)` is True and set the rest to zero:

```python
[[ 6.  0.  0.  9.]
 [ 0.  0.  0. 14.]
 [16.  0.  0.  0.]
 [ 0. 22.  0. 24.]]
```
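`DataArray.where(cond, other)` keeps the original values where `cond` is True and uses `other` elsewhere, the same element-wise rule as `np.where(cond, x, other)`. The expected result can therefore be reproduced with plain NumPy, assuming the random mask is applied to a separate copy so the original values stay intact. The names `values`, `lost` and `result` are illustrative only.

```python
import numpy as np

# Same values as the sample: a 5x5x5 array holding 0..124
values = np.arange(5 * 5 * 5.).reshape(5, 5, 5)

# Apply the random mask to a copy, leaving `values` untouched
np.random.seed(0)
lost = values.copy()
lost[np.random.rand(*lost.shape) < 0.5] = np.nan

# Keep `values` where `lost` is NaN, otherwise 0 (the rule used by DataArray.where)
result = np.where(np.isnan(lost), values, 0)
print(result[0, -4:, -4:])
# [[ 6.  0.  0.  9.]
#  [ 0.  0.  0. 14.]
#  [16.  0.  0.  0.]
#  [ 0. 22.  0. 24.]]
```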

Problem Description

Observed output of the last line: all values where `np.isnan(data_lost)` is True are set to nan, and the rest to zero:

```python
[[nan  0.  0. nan]
 [ 0.  0.  0. nan]
 [nan  0.  0.  0.]
 [ 0. nan  0. nan]]
```
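The NaNs in the observed output sit exactly where `data_lost` is NaN, which is what one would see if `data` itself had been overwritten by the helper. As a hypothesis, not a confirmed diagnosis, the plain-NumPy sketch below reproduces the observed array by letting the in-place helper write into the same array that is later passed to the `where` step. The sharing of one array between `values` and `lost` is an assumption made for illustration.

```python
import numpy as np

values = np.arange(5 * 5 * 5.).reshape(5, 5, 5)


def set_fraction_randomly_to_nan(data, frac_missing):
    # Original helper: writes NaN straight into its argument
    np.random.seed(0)
    data[np.random.rand(*data.shape) < frac_missing] = np.nan
    return data


# Hypothetical sharing: the helper receives the very array that `values` refers to,
# so `lost` and `values` end up being the same object
lost = set_fraction_randomly_to_nan(values, frac_missing=0.5)

# Keep `values` where `lost` is NaN, otherwise 0 (the rule used by DataArray.where)
result = np.where(np.isnan(lost), values, 0)
print(result[0, -4:, -4:])
# [[nan  0.  0. nan]
#  [ 0.  0.  0. nan]
#  [nan  0.  0.  0.]
#  [ 0. nan  0. nan]]
```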

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.138-59-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.17.2
distributed: 1.21.5
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: None
pytest: 3.5.0
IPython: 6.3.1
sphinx: 1.7.2
```
State reason: completed · repo: 13221727 · type: issue
