home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 992636601

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
992636601 MDU6SXNzdWU5OTI2MzY2MDE= 5785 inconsistent mean computation 30270490 closed 0     3 2021-09-09T21:12:52Z 2022-01-13T03:05:25Z 2022-01-13T03:05:25Z NONE      

What happened: I was computing an objective function between a remotely sensed snow covered area dataset and a simulated snow covered area dataset and every time I ran the code I got a different answer. I narrowed the issue down to when the monthly mean is taken of an image stack (geotiffs here) and then the mean is taken of the monthly values.

Maybe there is something about using geotiffs in this way that I am missing? The issue also occurs when I use the NaNs in the geotiffs to insert NaNs in the proper places in the simulation output. When I don't "cascade" the NaNs, the issue goes away in the simulated data but it always persists in the geotiffs.

What you expected to happen: I was expecting the mean of means output to be the same ever iteration as the same image stack was used. Minimal Complete Verifiable Example: I can't share the image stack at this time, but here is a shortened version of the code. ```python test=[] for i in tqdm.notebook.tqdm(range(2)): # load fSCA and assign dims ls = xr.open_mfdataset(files, parallel=True, chunks=dict(time=1,band=1, x=2500, y=2500), preprocess=add_time_coord) ls['time'] = dates ls = ls.rename_vars(dict(band_data='fsca'))

# threshold fSCA to SCA and cascade NaNs
ls['sca'] = xr.where(ls.fsca >= 150, 1, 0) # compute sca from fsca

test.append(ls.sca.groupby("time.month").mean().compute()) # monthly images

np.sqrt(np.mean(np.square((test[0].mean(('month'))-test[1].mean(('month'))).values)))

0.004654084555997707 ``` I expected the value here to be zero. Anything else we need to know?: The geotiffs have a no data value of -999 which is decoded to np.NaN

Environment:

Output of <tt>xr.show_versions()</tt> ------------------ commit: None python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.4.0-18362-Microsoft machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.20.3 scipy: 1.6.3 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.3 cfgrib: None iris: None bottleneck: None dask: 2021.05.0 distributed: 2021.05.0 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: 0.11.1 numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.2 conda: None pytest: None IPython: 7.23.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5785/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 3 rows from issue in issue_comments
Powered by Datasette · Queries took 0.872ms · About: xarray-datasette