id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
548475127,MDU6SXNzdWU1NDg0NzUxMjc=,3686,Different data values from xarray open_mfdataset when using chunks ,15016780,closed,0,,,7,2020-01-11T20:15:12Z,2020-01-20T20:35:48Z,2020-01-20T20:35:47Z,NONE,,,,"#### MCVE Code Sample
 You will first need to download the data from PO.DAAC (or mount PO.DAAC Drive); credentials are required:
```bash
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/153/20020602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/153/20020602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/154/20020603090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/154/20020603090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/155/20020604090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/155/20020604090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/156/20020605090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/156/20020605090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
```
Then run the following code:
```python
from datetime import datetime
import xarray as xr
import glob
def generate_file_list(start_doy, end_doy):
    """"""
    Given a start day-of-year and an end day-of-year (exclusive), generate a
    list of file locations. Assumes 'prefix' and 'year' variables have
    already been defined: 'prefix' should be a local directory or HTTP URL
    and path, and 'year' a 4-digit year.
    """"""
    fileObjs = []
    for doy in range(start_doy, end_doy):
        doy = str(doy).zfill(3)  # zero-pad to match the 3-digit directory names
        file = glob.glob(f""{prefix}/{doy}/*.nc"")[0]
        fileObjs.append(file)
    return fileObjs
# Invariants - but could be made configurable
year = 2002
prefix = ""data/mursst_netcdf""
chunks = {'time': 1, 'lat': 1799, 'lon': 3600}
# Create a list of files
start_doy = 152
num_days = 5
end_doy = start_doy + num_days
fileObjs = generate_file_list(start_doy, end_doy)
# will use this timeslice in query later on
time_slice = slice(datetime.strptime(f""{year}-06-02"", '%Y-%m-%d'), datetime.strptime(f""{year}-06-04"", '%Y-%m-%d'))
print(""results from unchunked dataset"")
ds_unchunked = xr.open_mfdataset(fileObjs, combine='by_coords')
print(ds_unchunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(ds_unchunked.analysed_sst.sel(time=time_slice).mean().values)
print(f""results from chunked dataset using {chunks}"")
ds_chunked = xr.open_mfdataset(fileObjs, combine='by_coords', chunks=chunks)
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(ds_chunked.analysed_sst.sel(time=time_slice).mean().values)
print(""results from chunked dataset using 'auto'"")
ds_chunked = xr.open_mfdataset(fileObjs, combine='by_coords', chunks={'time': 'auto', 'lat': 'auto', 'lon': 'auto'})
print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)
print(ds_chunked.analysed_sst.sel(time=time_slice).mean().values)
```
Note: these are just a few examples; I tried a variety of other chunk options and saw similar discrepancies between the unchunked and chunked datasets.
Output:
```
results from unchunked dataset
290.13754
286.7869
results from chunked dataset using {'time': 1, 'lat': 1799, 'lon': 3600}
290.13757
286.81107
results from chunked dataset using 'auto'
290.1377
286.8118
```
#### Expected Output
Values output from queries of the chunked and unchunked xarray datasets should be equal.
#### Problem Description
I want to understand how to chunk or query the data so that a dataset opened with chunks produces the same output as one opened without chunking. I would ultimately like to store the data in Zarr, but verifying data integrity first is critical.
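As an aside, one common source of such discrepancies (not necessarily the only one here) is that dask computes one partial result per chunk and then combines them, so a float32 reduction runs in a different order than the single-pass unchunked reduction, and float32 rounding makes the result order-dependent. A minimal NumPy sketch with synthetic data standing in for the SST values:

```python
import numpy as np

# Synthetic float32 data standing in for analysed_sst.
rng = np.random.default_rng(0)
x = rng.random(1_000_000).astype(np.float32)

serial = x.mean()  # single-pass reduction over the whole array
# Per-chunk partial sums, combined afterwards (roughly what dask does).
chunked = np.float32(sum(c.sum() for c in np.split(x, 10)) / x.size)

# The two agree to float32 precision, but not necessarily bit-for-bit.
print(serial, chunked, np.isclose(serial, chunked, rtol=1e-5))
```

Differences of this size are float32 rounding rather than data corruption; casting with `.astype('float64')` before the reduction typically shrinks them by several orders of magnitude.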
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan  5 2020, 20:58:18) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.154-99.181.amzn1.x86_64
machine: x86_64
processor: 
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200102
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3686/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
539821504,MDExOlB1bGxSZXF1ZXN0MzU0NzMwNzI5,3642,Make datetime_to_numeric more robust to overflow errors,81219,closed,0,,,1,2019-12-18T17:34:41Z,2020-01-20T19:21:49Z,2020-01-20T19:21:49Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3642,"
 - [x] Closes #3641 
 - [x] Tests added
 - [x] Passes `black . && mypy . && flake8`
 - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
This is likely only safe with NumPy>=1.17 though. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3642/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
550964139,MDExOlB1bGxSZXF1ZXN0MzYzNzcyNzE3,3699,Feature/align in dot,10194086,closed,0,,,4,2020-01-16T17:55:38Z,2020-01-20T12:55:51Z,2020-01-20T12:09:27Z,MEMBER,,0,pydata/xarray/pulls/3699,"
 - [x] Closes #3694
 - [x] Tests added
 - [x] Passes `black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
Happy to get feedback @fujiisoup @shoyer ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3699/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
549679475,MDU6SXNzdWU1NDk2Nzk0NzU=,3694,"xr.dot requires equal indexes (join=""exact"")",10194086,closed,0,,,5,2020-01-14T16:28:15Z,2020-01-20T12:09:27Z,2020-01-20T12:09:27Z,MEMBER,,,,"#### MCVE Code Sample
```python
import xarray as xr
import numpy as np
d1 = xr.DataArray(np.arange(4), dims=[""a""], coords=dict(a=[0, 1, 2, 3]))
d2 = xr.DataArray(np.arange(4), dims=[""a""], coords=dict(a=[0, 1, 2, 3]))
# note: different coords
d3 = xr.DataArray(np.arange(4), dims=[""a""], coords=dict(a=[1, 2, 3, 4]))
(d1 * d2).sum() # -> array(14)
xr.dot(d1, d2) # -> array(14)
(d2 * d3).sum() # -> array(8)
xr.dot(d2, d3) # -> ValueError
```
#### Expected Output
```python
array(8)
```
#### Problem Description
The last statement results in an 
```python
ValueError: indexes along dimension 'a' are not equal
```
because `xr.apply_ufunc` defaults to `join='exact'`. However, I think this should work; or is there a good reason for it to fail?
This is a problem for #2922 (weighted operations) - I think it is fine for the weights and data to not align. 
Fixing this may be as easy as specifying `join='inner'` in
https://github.com/pydata/xarray/blob/e0fd48052dbda34ee35d2491e4fe856495c9621b/xarray/core/computation.py#L1181-L1187
@fujiisoup
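A workaround in the meantime is to align explicitly before calling `xr.dot`; a sketch reusing `d2` and `d3` from the example above:

```python
import numpy as np
import xarray as xr

d2 = xr.DataArray(np.arange(4), dims=['a'], coords=dict(a=[0, 1, 2, 3]))
d3 = xr.DataArray(np.arange(4), dims=['a'], coords=dict(a=[1, 2, 3, 4]))

# Restrict both arrays to their common labels, then take the dot product.
d2a, d3a = xr.align(d2, d3, join='inner')
print(int(xr.dot(d2a, d3a)))  # 8, matching (d2 * d3).sum()
```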
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: 5afc6f32b18f5dbb9a89e30f156b626b0a83597d
python: 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp151.28.36-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.14.0+164.g5afc6f32.dirty
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.0
cfgrib: 0.9.7.2
iris: 2.2.0
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: installed
setuptools: 41.6.0.post20191029
pip: 19.3.1
conda: None
pytest: 5.2.2
IPython: 7.9.0
sphinx: None
 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3694/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue