issues: 992636601
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
992636601 | MDU6SXNzdWU5OTI2MzY2MDE= | 5785 | inconsistent mean computation | 30270490 | closed | 0 | 3 | 2021-09-09T21:12:52Z | 2022-01-13T03:05:25Z | 2022-01-13T03:05:25Z | NONE | What happened: I was computing an objective function between a remotely sensed snow covered area dataset and a simulated snow covered area dataset, and every time I ran the code I got a different answer. I narrowed the issue down to the step where the monthly mean is taken of an image stack (geotiffs here) and then the mean is taken of the monthly values. Maybe there is something about using geotiffs in this way that I am missing? The issue also occurs when I use the NaNs in the geotiffs to insert NaNs in the proper places in the simulation output. When I don't "cascade" the NaNs, the issue goes away in the simulated data, but it always persists in the geotiffs.

What you expected to happen: I was expecting the mean of means output to be the same every iteration, since the same image stack was used.

Minimal Complete Verifiable Example: I can't share the image stack at this time, but here is a shortened version of the code.

```python
import numpy as np
import tqdm.notebook
import xarray as xr

# files, dates, and add_time_coord are defined elsewhere (not shown)
test = []
for i in tqdm.notebook.tqdm(range(2)):
    # load fSCA and assign dims
    ls = xr.open_mfdataset(files, parallel=True, chunks=dict(time=1, band=1, x=2500, y=2500), preprocess=add_time_coord)
    ls['time'] = dates
    ls = ls.rename_vars(dict(band_data='fsca'))

# RMSE between the two iterations' means over the 'month' dimension
np.sqrt(np.mean(np.square((test[0].mean(('month')) - test[1].mean(('month'))).values)))
```

Environment: Output of <tt>xr.show_versions()</tt>

------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-18362-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.3
cfgrib: None
iris: None
bottleneck: None
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: None
pytest: None
IPython: 7.23.1
sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5785/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
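
The check described in the issue body (take monthly means of an image stack, average those monthly means, then compare two runs by RMSE) can be sketched on synthetic data. This is a NumPy-only stand-in and not the reporter's xarray/dask pipeline: the array shape, the month labels, and the helper names `mean_of_monthly_means` and `rmse` are assumptions made for illustration. It only shows the expected behaviour, an RMSE of exactly zero when the same in-memory stack is reduced twice, and does not reproduce the reported inconsistency.

```python
import numpy as np


def mean_of_monthly_means(stack, months):
    """Average a (time, y, x) stack per month, then average the monthly means."""
    monthly = np.stack([
        np.nanmean(stack[months == m], axis=0)  # NaN-aware mean within one month
        for m in np.unique(months)
    ])
    return monthly.mean(axis=0)


def rmse(a, b):
    """Root-mean-square difference between two arrays of equal shape."""
    return float(np.sqrt(np.mean(np.square(a - b))))


# Hypothetical stand-in for the image stack: 24 time steps on a 4 x 5 grid
rng = np.random.default_rng(0)
stack = rng.random((24, 4, 5)).astype(np.float32)
stack[0, 0, 0] = np.nan                 # one missing pixel, as in masked imagery
months = np.tile(np.arange(1, 13), 2)   # two years of month labels 1..12

# Run the same reduction twice and compare the results
runs = [mean_of_monthly_means(stack, months) for _ in range(2)]
print(rmse(runs[0], runs[1]))  # 0.0
```

Running the same comparison on the real pipeline's two outputs would give a non-zero value if the results differ between iterations, which is what the issue reports.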