

issues: 433916353


id: 433916353
node_id: MDU6SXNzdWU0MzM5MTYzNTM=
number: 2902
title: DataArray sum().values depends on chunk size
user: 15570875
state: closed
locked: 0
comments: 1
created_at: 2019-04-16T18:09:33Z
updated_at: 2019-04-17T02:01:55Z
closed_at: 2019-04-17T02:01:55Z
author_association: NONE
body:

Hi,

The code below creates a Dataset containing an NxNxN DataArray whose elements all equal a constant, val. For several re-chunked copies of the Dataset, the code computes the sum of the array and compares it with the exact value N*N*N*val. The printed relative differences vary with chunk size, at the round-off level.

I'm not surprised by these round-off differences, but I could not find any mention of this behavior in the xarray documentation.
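Round-off differences of this kind are expected because floating-point addition is not associative: a chunked reduction adds the same numbers but groups them differently, and each grouping can round differently. A minimal illustration in plain Python, independent of xarray and dask:

```python
# Floating-point addition is not associative: the same three numbers,
# grouped differently, round to different doubles.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```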

Is this behavior known to the xarray developers? Do they consider it a feature or a bug?

Either way, I think it would be useful if the xarray documentation mentioned that the results of some operations depend on chunk size.
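One way to obtain a reference value that does not depend on grouping, outside of xarray, is an exactly rounded summation such as Python's `math.fsum`. On IEEE-754 platforms with the usual round-half-even mode it returns the correctly rounded exact sum, whatever the order of the terms. It is far slower than a vectorized sum, so this sketch only uses it as a check, with the same N and val as the code below:

```python
import math

import numpy as np

N = 128
val = 1.9
val_array = np.full((N, N, N), val)
exact_sum = N * N * N * val

# math.fsum tracks exact partial sums, so the result is the correctly
# rounded exact sum no matter how the terms are ordered or grouped.
fsum_total = math.fsum(val_array.ravel().tolist())
print('fsum rel_diff = %e' % ((fsum_total - exact_sum) / exact_sum))

# Order independence on data that is not constant.
rng = np.random.default_rng(0)
x = rng.random(10_000).tolist()
print(math.fsum(x) == math.fsum(x[::-1]))  # True
```

Because N*N*N = 2**21 is a power of two, `exact_sum` is itself exact here, so the fsum result matches it exactly.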

code:

```python
import numpy as np
import xarray as xr

N = 128

# Constant array; N*N*N*val is the exact reference sum.
val = 1.9
val_array = np.full((N, N, N), val)
exact_sum = N * N * N * val

ds = xr.DataArray(val_array, name='val_array', dims=['x', 'y', 'z']).to_dataset()

rel_diff = (ds['val_array'].sum().values - exact_sum) / exact_sum
print('no chunking, rel_diff = %e' % rel_diff)

# Re-chunk along every combination of chunk sizes and repeat the sum.
for chunk_x in [N//16, N//4, N]:
    for chunk_y in [N//16, N//4, N]:
        for chunk_z in [N//16, N//4, N]:
            ds2 = ds.chunk({'x': chunk_x, 'y': chunk_y, 'z': chunk_z})
            rel_diff = (ds2['val_array'].sum().values - exact_sum) / exact_sum
            print('chunk_x = %3d, chunk_y = %3d, chunk_z = %3d, rel_diff = %e'
                  % (chunk_x, chunk_y, chunk_z, rel_diff))
```

results:

```
no chunking, rel_diff = -4.557758e-15
chunk_x = 8, chunk_y = 8, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 8, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 8, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 32, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 32, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 32, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 128, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 128, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 8, chunk_y = 128, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x = 32, chunk_y = 8, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 8, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 8, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 32, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 32, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 32, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x = 32, chunk_y = 128, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 32, chunk_y = 128, chunk_z = 32, rel_diff = -5.843279e-16
chunk_x = 32, chunk_y = 128, chunk_z = 128, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y = 8, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y = 8, chunk_z = 32, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y = 8, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y = 32, chunk_z = 8, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y = 32, chunk_z = 32, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y = 32, chunk_z = 128, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y = 128, chunk_z = 8, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y = 128, chunk_z = 32, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y = 128, chunk_z = 128, rel_diff = -4.557758e-15
```
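To see why the chunk size matters without involving dask, the sketch below emulates a two-level chunked reduction in NumPy: it sums each block separately, then sums the per-block partial results. The helper `blockwise_sum` is hypothetical, and combining the partials with a single `np.sum` is a simplification; dask may combine partial sums in a different (for example tree-shaped) order, so the exact digits need not match the results listed in this report.

```python
import numpy as np


def blockwise_sum(arr, chunks):
    """Hypothetical helper: sum each block of shape `chunks`, then sum the partials.

    A simplified stand-in for a chunked reduction, not dask's actual algorithm.
    Blocks at the edges may be smaller if the chunk size does not divide the shape.
    """
    cx, cy, cz = chunks
    nx, ny, nz = arr.shape
    partials = [
        arr[i:i + cx, j:j + cy, k:k + cz].sum()
        for i in range(0, nx, cx)
        for j in range(0, ny, cy)
        for k in range(0, nz, cz)
    ]
    # Second reduction step: combine the per-block partial sums.
    return float(np.sum(partials))


N = 128
val = 1.9
val_array = np.full((N, N, N), val)
exact_sum = N * N * N * val

for c in [N // 16, N // 4, N]:
    total = blockwise_sum(val_array, (c, c, c))
    rel_diff = (total - exact_sum) / exact_sum
    print('chunk = %3d, rel_diff = %e' % (c, rel_diff))
```

With a single block (chunk = 128) the sketch reduces to one NumPy sum over the whole array, which matches the report's observation that 128x128x128 chunks give the same rel_diff as no chunking. Smaller blocks change how many partial sums are formed and how they are grouped, which is a plausible mechanism for the other differences.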

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: None
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.4.0
sphinx: None
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2902/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
