issues: 329575874

id: 329575874
node_id: MDU6SXNzdWUzMjk1NzU4NzQ=
number: 2217
title: tolerance for alignment
user: 31460695
state: open
locked: 0
comments: 23
created_at: 2018-06-05T18:34:45Z
updated_at: 2021-07-08T17:42:52Z
author_association: NONE

When using open_mfdataset on files which 'should' share a grid, there is often a small mismatch in the coordinate values, so the grids do not align properly. This happens frequently when reading output from large climate models that is split across multiple files containing the same variable on the same lon/lat grid but covering different time intervals. Because the misalignment is silent, I always have to check the sizes of the lon/lat grids whenever I rely on open_mfdataset to concatenate the data in time.

Here is an example in which I create two 1d DataArrays which have slightly different coordinates:

```python
import xarray as xr
import numpy as np
from glob import glob

tol = 1e-14
x1 = np.arange(1, 6) + tol * np.random.rand(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1})

x2 = np.arange(1, 6) + tol * np.random.rand(5)
da2 = da1.copy()
da2['x'] = x2

print(da1.x, '\n', da2.x)
```

```
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
```

First I save both DataArrays as netcdf files and then use open_mfdataset to load them:

```python
da1.to_netcdf('da1.nc', encoding={'x': {'dtype': 'float64'}})
da2.to_netcdf('da2.nc', encoding={'x': {'dtype': 'float64'}})

db = xr.open_mfdataset(glob('da?.nc'))

db
```

```
<xarray.Dataset>
Dimensions:                    (x: 10)
Coordinates:
  * x                          (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ...
Data variables:
    xarray_dataarray_variable  (x) int64 dask.array<shape=(10,), chunksize=(5,)>
```

So the x grid is now twice the size. This behavior is the same if I just use align with join='outer':

```python
xr.align(da1, da2, join='outer')
```

```
(<xarray.DataArray (x: 10)>
 array([nan,  9., nan,  0.,  2., nan, nan,  1.,  0., nan])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0,
 <xarray.DataArray (x: 10)>
 array([ 9., nan,  0., nan, nan,  2.,  1., nan, nan,  0.])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0)
```

Request / suggestion

What is needed is a user-specified tolerance that can be given to open_mfdataset and passed on to align, so that grids which differ by less than the tolerance are accepted as the same.
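Until something like that exists, a manual workaround may help. The sketch below is only an illustration, not an xarray feature: `snap_to_reference` is a hypothetical helper, and the `1e-12` tolerance is an assumed value chosen to be larger than the `1e-14` jitter in the example above. It overwrites one array's coordinate with the reference coordinate when the two agree within the tolerance, and it also shows `reindex_like` with `method='nearest'` and `tolerance`, which is the closest existing option I am aware of.

```python
import numpy as np
import xarray as xr


def snap_to_reference(da, ref, dim="x", tol=1e-12):
    """Hypothetical helper: give `da` the coordinate of `ref` along `dim`
    if the two coordinates agree to within the absolute tolerance `tol`."""
    if da.sizes[dim] != ref.sizes[dim]:
        raise ValueError(f"{dim!r} has different lengths in the two arrays")
    if not np.allclose(da[dim].values, ref[dim].values, rtol=0.0, atol=tol):
        raise ValueError(f"{dim!r} differs by more than {tol}")
    # Overwrite the slightly-off coordinate with the reference values.
    return da.assign_coords(**{dim: ref[dim].values})


# Two arrays on "the same" grid, with jitter as in the example above.
rng = np.random.default_rng(0)
jitter = 1e-14
x1 = np.arange(1, 6) + jitter * rng.random(5)
x2 = np.arange(1, 6) + jitter * rng.random(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=["x"], coords={"x": x1})
da2 = xr.DataArray([9, 0, 2, 1, 0], dims=["x"], coords={"x": x2})

# Option 1: snap da2 onto da1's grid, then align as usual.
da2_snapped = snap_to_reference(da2, da1)
a, b = xr.align(da1, da2_snapped, join="outer")

# Option 2: nearest-neighbour reindexing with an explicit tolerance.
da2_reindexed = da2.reindex_like(da1, method="nearest", tolerance=1e-12)
```

With either option the aligned `x` dimension keeps its original length of 5 instead of doubling. A function like `snap_to_reference`, bound to a fixed reference grid, could presumably be supplied through the `preprocess` argument of `open_mfdataset`, but that is not tested here.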

Possibly related to https://github.com/pydata/xarray/issues/2215

xr.__version__ '0.10.4'

thanks, Naomi

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2217/reactions",
    "total_count": 10,
    "+1": 10,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
