issue_comments


28 rows where issue = 340486433 sorted by updated_at descending


user 9

  • fspaolo 8
  • shoyer 6
  • crusaderky 6
  • JiaweiZhuang 3
  • djhoese 1
  • kmuehlbauer 1
  • fujiisoup 1
  • tollers 1
  • lgramer 1

author_association 3

  • MEMBER 14
  • NONE 13
  • CONTRIBUTOR 1

issue 1

  • Does interp() work on curvilinear grids (2D coordinates) ? · 28
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
687336494 https://github.com/pydata/xarray/issues/2281#issuecomment-687336494 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDY4NzMzNjQ5NA== lgramer 49291260 2020-09-04T19:24:32Z 2020-09-04T19:24:32Z NONE

I find xarray so useful in many ways, for which I'm grateful. But there are some current limitations that force me to hesitate before recommending it to colleagues. One is this issue - lack of support (or rather, I suspect simply no "clearly stated support"?) for curvilinear coordinate systems, which are pretty much ubiquitous in the work I do. The other issue which causes me to pause before recommending xarray wholeheartedly is the complexity (and frequent slowness and errors, still - all previously reported) in dealing with GRIB2 file formats that include multiple vertical coordinate systems (e.g., products from the NCEP Unified Post Processing System - UPP). But that's an issue for another thread... Any movement on wrapping scipy griddata (or some suitably more sophisticated scipy tool) within xarray's interface?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
553920017 https://github.com/pydata/xarray/issues/2281#issuecomment-553920017 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDU1MzkyMDAxNw== tollers 42300091 2019-11-14T14:47:52Z 2019-11-14T14:47:52Z NONE

Have there been any updates on the handling of multi-dimensional co-ordinates in xarray, in particular interpolation/indexing using such co-ordinates, as discussed here? For geographical data with curvilinear grids (using latitudes and longitudes) this issue can become a real headache to deal with.

{
    "total_count": 12,
    "+1": 12,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
508078893 https://github.com/pydata/xarray/issues/2281#issuecomment-508078893 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDUwODA3ODg5Mw== djhoese 1828519 2019-07-03T12:49:20Z 2019-07-03T12:49:20Z CONTRIBUTOR

@kmuehlbauer Thanks for the ping. I don't have time to read this whole thread, but based on your comment I have a few things I'd like to point out. First, the pykdtree package is a good alternative to the scipy KDTree implementation. It has been shown to be much faster and uses OpenMP for parallel processing. Second, the pyresample library is my main way of resampling geolocated data. We use it in Satpy for resampling, but right now we haven't finalized the interfaces, so things are kind of spread between satpy and pyresample as far as easy xarray handling goes. Pyresample uses SwathDefinition and AreaDefinition objects to define the geolocation of the data. In Satpy the same KDTree is used for every in-memory gridding, but we also allow a cache_dir which will save the indexes for every (source, target) area pair used in the resampling.

I'm hoping to sit down and get some geoxarray stuff implemented during SciPy next week, but usually get distracted by all the talks so no promises. I'd like geoxarray to provide a low level interface for getting and converting CRS and geolocation information on xarray objects and leave resampling and other tasks to libraries like pyresample and rioxarray.
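
For illustration, a minimal sketch (not pyresample/Satpy code) of the pattern described above: a KD-tree over the flattened source grid is built once and reused for nearest-neighbour resampling of every field. scipy's cKDTree is used here; pykdtree's KDTree offers a largely drop-in, faster alternative. Treating lon/lat as planar coordinates is a simplification (real resamplers project first or use 3D Cartesian coordinates):

```python
import numpy as np
from scipy.spatial import cKDTree  # pykdtree.kdtree.KDTree has a similar interface

def build_nn_resampler(src_lon, src_lat, dst_lon, dst_lat):
    """src_*/dst_* are 2D coordinate arrays describing curvilinear source/target grids."""
    tree = cKDTree(np.column_stack([src_lon.ravel(), src_lat.ravel()]))  # built once
    _, idx = tree.query(np.column_stack([dst_lon.ravel(), dst_lat.ravel()]), k=1)

    def resample(field_2d):
        # reuse the cached indices for every variable / time step
        return field_2d.ravel()[idx].reshape(dst_lon.shape)

    return resample
```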

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
507975592 https://github.com/pydata/xarray/issues/2281#issuecomment-507975592 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDUwNzk3NTU5Mg== kmuehlbauer 5821660 2019-07-03T07:28:58Z 2019-07-03T07:28:58Z MEMBER

Thanks for this interesting discussion. I'm currently at the point where I'm moving interpolation functions to an xarray-based workflow. While trying to wrap my head around this I found that this involves not only interpolation but also indexing (see #1603, #2195, #2986). Sorry if this might exceed the original intention of the issue. But it is my real use case (curvilinear grids to cartesian).

citing @shoyer's comments for convenience

In particular, SciPy's griddata either makes use of a scipy.spatial.KDTree (for nearest neighbor lookups) or scipy.spatial.Delaunay (for linear interpolation, on a triangular mesh). We could build these data structures once (and potentially even cache them in indexes on xarray objects), and likewise calculate the sparse interpolation coefficients once for repeated use.

Anyways, as I've said above, I think it would be totally appropriate to build routines resembling scipy's griddata into interp() (but using the lower level KDTree/Delaunay interface). This will not be the most efficient strategy, but should offer reasonable performance in most cases. Let's consider this open for contributions, if anyone is interested in putting together a pull request.

Yes, if we cache the Delaunay triangulation we could probably do the entire thing in about the time it currently takes to do one time step.

Our interpolators are built upon scipy's cKDTree. They are created once for a given source and target grid configuration and then just called with the desired data. The interpolator is cached in the dataset accessor for multiple uses. But this only makes sense if there are multiple variables within the dataset. I'm thinking about how to reuse the cached interpolator for other datasets with the same source and target configuration. The same would be true for tree-based indexers, if they become available in xarray.

My current approach would be to create an xarray dataset dsT with the source dimensions/coordinates (and target dimensions/coordinates) and the created tree. If the source has some projection attached, one could give another target projection and the target dimensions/coordinates would be created accordingly (but this could be wrapped by other packages, like geoxarray, ping @djhoese). One could even precalculate the target dists/idx from the tree for faster access (I do this). Finally there should be something like ds_res = ds_src.interp_like(dsT) where one can reuse this dataset.

I'm sure I can get something working within my workflow using accessors, but it would be a better fit in xarray itself imho.
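
For illustration, a minimal sketch of the accessor-based caching described above; the accessor name, the variable layout (a ('t', 'y', 'x') variable with 2D lon/lat coordinates) and the cache key are all assumptions:

```python
import numpy as np
import xarray as xr
from scipy.spatial import cKDTree

@xr.register_dataset_accessor('regrid')
class RegridAccessor:
    def __init__(self, ds):
        self._ds = ds
        self._cache = {}  # target-grid key -> nearest-neighbour indices

    def nearest(self, var, dst_lon, dst_lat):
        key = (dst_lon.shape, dst_lat.shape)  # simplistic cache key, for illustration only
        if key not in self._cache:
            src = np.column_stack([self._ds['lon'].values.ravel(),
                                   self._ds['lat'].values.ravel()])
            tree = cKDTree(src)  # built once per (source, target) configuration
            _, self._cache[key] = tree.query(
                np.column_stack([dst_lon.ravel(), dst_lat.ravel()]), k=1)
        idx = self._cache[key]
        data = self._ds[var].values            # assumed dims: ('t', 'y', 'x')
        flat = data.reshape(data.shape[0], -1)[:, idx]
        return flat.reshape((data.shape[0],) + dst_lon.shape)

# usage sketch: regridded = ds.regrid.nearest('myvar', lon2d_target, lat2d_target)
```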

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497420732 https://github.com/pydata/xarray/issues/2281#issuecomment-497420732 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQyMDczMg== fspaolo 539688 2019-05-30T17:50:32Z 2019-05-31T23:43:54Z NONE

@shoyer and @crusaderky That's right, that is how I was actually dealing with this problem prior to trying xarray ... by flattening the grid coordinates and performing either gridding (with scipy's griddata) or interpolation (with scipy's map_coordinates) ... instead of performing proper regridding (from cube to cube without having to flatten anything).

As a rule of thumb, any fancy algorithm should first exist for numpy-only data and then potentially it can be wrapped by the xarray library.

This is important information.

For the record, here is what I have found so far to be the most performant:

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

# Here x/y are dummy 1D coords that won't be used.
da1 = xr.DataArray(cube1, [('t', t_cube1),
                           ('y', range(cube1.shape[1])),
                           ('x', range(cube1.shape[2]))])

# Regrid t_cube1 onto t_cube2 first, since time will always map 1-to-1 between cubes.
# This operation is very fast.
print('regridding in time ...')
cube1 = da1.interp(t=t_cube2).values

# Regrid each 2D field (X_cube1/Y_cube1 onto X_cube2/Y_cube2), one at a time.
print('regridding in space ...')
cube3 = np.full_like(cube2, np.nan)
for k in range(t_cube2.shape[0]):
    print('regridding:', k)
    cube3[:, :, k] = griddata((X_cube1.ravel(), Y_cube1.ravel()),
                              cube1[k, :, :].ravel(),
                              (X_cube2, Y_cube2),
                              fill_value=np.nan, method='linear')
```

Performance is not that bad... for ~150 time steps and ~1500 nodes in x and y it takes less than 10 min.

I think this can be sped up by computing the interpolation weights between grids in the first iteration and caching them (I think xESMF does this).
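
For illustration, a minimal sketch of that caching idea: the expensive step in griddata's linear method is the Delaunay triangulation of the source points, and scipy lets you build it once and reuse it for every time step by passing the precomputed triangulation to LinearNDInterpolator. The array names follow the snippet above, and treating lon/lat as planar coordinates is a simplification:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

# Triangulate the flattened source grid once (the expensive part).
src_pts = np.column_stack([X_cube1.ravel(), Y_cube1.ravel()])
tri = Delaunay(src_pts)

cube3 = np.full_like(cube2, np.nan)
for k in range(t_cube2.shape[0]):
    # LinearNDInterpolator accepts a precomputed Delaunay triangulation,
    # so the mesh is not rebuilt on every time step.
    interp = LinearNDInterpolator(tri, cube1[k, :, :].ravel(), fill_value=np.nan)
    cube3[:, :, k] = interp(X_cube2, Y_cube2)
```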

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497629988 https://github.com/pydata/xarray/issues/2281#issuecomment-497629988 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzYyOTk4OA== shoyer 1217238 2019-05-31T08:46:38Z 2019-05-31T08:46:38Z MEMBER

Yes, if we cache the Delaunay triangulation we could probably do the entire thing in about the time it currently takes to do one time step.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497475534 https://github.com/pydata/xarray/issues/2281#issuecomment-497475534 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQ3NTUzNA== fspaolo 539688 2019-05-30T20:32:24Z 2019-05-30T20:58:23Z NONE

@fspaolo where does that huge number come from? I thought you said you have 1500 nodes in total.

Not in total, I meant ~1500 on each x/y dimension (1500 x 1500). To be precise:

```python
print(cube1.shape)   # (t, y, x)
# (306, 240, 262)

print(cube2.shape)   # (y, x, t)
# (1452, 1836, 104)
```

Did you select a single point on the t dimension before you applied bisplrep?

Yes.

Also, (pardon the ignorance, I never dealt with geographical data) what kind of information does having your lat and lon being bidimensional convey? Does it imply lat[i, j] < lat[i +1, j] and lon[i, j] < lon[i, j+1] for any possible (i, j)?

It does in my case, but cube1 is a rotated grid, meaning that lat is not the same along the x-axis nor is lon the same along the y-axis, while cube2 is a standard lon/lat grid, so I can represent it simply by a 1D lon array (x-axis) and a 1D lat array (y-axis). To have them both with 1D coordinates I would have to either rotate cube2 and work with 1D rotated lon/lat, or unrotate cube1 and work with 1D standard lon/lat... but this gets me to the same problem as in the interpolation above, since I have to transform (re-project) every 2D grid in the cube.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497473031 https://github.com/pydata/xarray/issues/2281#issuecomment-497473031 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQ3MzAzMQ== shoyer 1217238 2019-05-30T20:24:56Z 2019-05-30T20:24:56Z MEMBER

@fspaolo where does that huge number come from? I thought you said you have 1500 nodes in total. Did you select a single point on the t dimension before you applied bisplrep?

2665872 is roughly 1600^2.

Also, (pardon the ignorance, I never dealt with geographical data) what kind of information does having your lat and lon being bidimensional convey? Does it imply lat[i, j] < lat[i +1, j] and lon[i, j] < lon[i, j+1] for any possible (i, j)?

I think this is true sometimes but not always. The details depend on the geographic projection, but generally a good mesh has some notion of locality -- nearby locations in real space (i.e., on the globe) should also be nearby in projected space.


Anyways, as I've said above, I think it would be totally appropriate to build routines resembling scipy's griddata into interp() (but using the lower level KDTree/Delaunay interface). This will not be the most efficient strategy, but should offer reasonable performance in most cases. Let's consider this open for contributions, if anyone is interested in putting together a pull request.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497468930 https://github.com/pydata/xarray/issues/2281#issuecomment-497468930 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQ2ODkzMA== crusaderky 6213168 2019-05-30T20:12:29Z 2019-05-30T20:12:29Z MEMBER

@fspaolo where does that huge number come from? I thought you said you have 1500 nodes in total. Did you select a single point on the t dimension before you applied bisplrep?

Also, (pardon the ignorance, I never dealt with geographical data) what kind of information does having your lat and lon being bidimensional convey? Does it imply lat[i, j] < lat[i +1, j] and lon[i, j] < lon[i, j+1] for any possible (i, j)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497458053 https://github.com/pydata/xarray/issues/2281#issuecomment-497458053 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQ1ODA1Mw== shoyer 1217238 2019-05-30T19:38:43Z 2019-05-30T19:38:43Z MEMBER

The naive implementation of splines involves inverting an N x N matrix where N is the total number of grid points. So it definitely is not a very scalable technique.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497448335 https://github.com/pydata/xarray/issues/2281#issuecomment-497448335 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzQ0ODMzNQ== fspaolo 539688 2019-05-30T19:07:54Z 2019-05-30T19:08:47Z NONE

@crusaderky I also did the above using bisplrep and bispev, and it seems it cannot handle a grid of this size:

```python
print(len(x), len(y), len(z))
# 62880 62880 62880

print(len(x_new), len(y_new))
# 2665872 2665872
```

```
Traceback (most recent call last):
  File "cubesmb.py", line 201, in <module>
    z_new = bisplev(x_new, y_new, tck)
  File "/Users/paolofer/anaconda3/lib/python3.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 1047, in bisplev
    z, ier = _fitpack._bispev(tx, ty, c, kx, ky, x, y, dx, dy)
RuntimeError: Cannot produce output of size 2665872x2665872 (size too large)
```

So griddata looks like my best option...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497254984 https://github.com/pydata/xarray/issues/2281#issuecomment-497254984 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzI1NDk4NA== crusaderky 6213168 2019-05-30T08:45:16Z 2019-05-30T08:50:13Z MEMBER

I did not test it but this looks like what you want:

```python
import xarray
from scipy.interpolate import bisplrep, bisplev

x = cube1.x.values.ravel()
y = cube1.y.values.ravel()
z = cube1.values.ravel()
x_new = cube2.x.values.ravel()
y_new = cube2.y.values.ravel()

tck = bisplrep(x, y, z)
z_new = bisplev(x_new, y_new, tck)
z_new = z_new.reshape(cube2.shape)
cube3 = xarray.DataArray(z_new, dims=cube2.dims, coords=cube2.coords)
```

I read above that you have concerns about performance as the above does not understand the geometry of the input data - did you run performance tests on it already?

[EDIT] you will probably need to break down your problem into 1-point slices along dimension t before you apply the above.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497251626 https://github.com/pydata/xarray/issues/2281#issuecomment-497251626 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzI1MTYyNg== crusaderky 6213168 2019-05-30T08:33:16Z 2019-05-30T08:33:51Z MEMBER

@fspaolo sorry, I should have taken more time re-reading the initial post. No, xarray_extras.interpolate does not do the kind of interpolation you want. Have you looked into scipy?

https://docs.scipy.org/doc/scipy/reference/interpolate.html#multivariate-interpolation

xarray is just a wrapper, and if scipy does what you need, it's trivial to unwrap your DataArray into a bunch of numpy arrays, feed them into scipy, and then re-wrap the output numpy arrays into a DataArray. On the other hand, if scipy does not do what you want, then I suspect that opening a feature request on the scipy tracker would be a much better place than the xarray board. As a rule of thumb, any fancy algorithm should first exist for numpy-only data and then potentially it can be wrapped by the xarray library.
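
For illustration, here is the unwrap → scipy → re-wrap pattern in its most minimal form for one 2D field on a curvilinear grid; the cube1/cube2 names and their 2D lon/lat coordinates follow the arrays discussed further down this thread and are assumptions here:

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

# unwrap: flatten the 2D source coordinates and one 2D slice of the data
points = np.column_stack([cube1['lon'].values.ravel(),
                          cube1['lat'].values.ravel()])
values = cube1.isel(t=0).values.ravel()

# feed the plain numpy arrays into scipy
z_new = griddata(points, values,
                 (cube2['lon'].values, cube2['lat'].values),
                 method='linear', fill_value=np.nan)

# re-wrap the numpy result into a DataArray on the target grid
out = xr.DataArray(z_new, dims=('y', 'x'),
                   coords={'lon': (('y', 'x'), cube2['lon'].values),
                           'lat': (('y', 'x'), cube2['lat'].values)})
```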

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497155229 https://github.com/pydata/xarray/issues/2281#issuecomment-497155229 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzE1NTIyOQ== JiaweiZhuang 25473287 2019-05-30T00:24:36Z 2019-05-30T00:33:49Z NONE

An MPI error?!

@fspaolo Could you post a minimal reproducible example on xESMF's issue tracker? Just to keep this issue clean. The error looks like an ESMF installation problem that can happen on legacy OS, and it can be easily fixed by Docker or other containers.

It is surprising that a package targeting n-dimensional gridded datasets (particularly those from the geo/climate sciences) does not handle such a common task with spatial gridded data.

Just a side comment: This is a common but highly non-trivial task... Even small edge cases like periodic longitudes and polar singularities can cause interesting troubles. Otherwise I would just code up an algorithm in Xarray from scratch instead of relying on a heavy Fortran library. But things will get improved over time...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497138085 https://github.com/pydata/xarray/issues/2281#issuecomment-497138085 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzEzODA4NQ== fspaolo 539688 2019-05-29T22:56:07Z 2019-05-30T00:14:09Z NONE

@crusaderky what I'm trying to do is what the title of this opened thread says "Does interp() work on curvilinear grids (2D coordinates)?" Working with (x, y, t) all 1D numpy arrays is an easy problem! But that's not the problem I'm dealing with.

Let me state exactly what my problem is. I have two data cubes, cube1 and cube2:

```python
cube1 = xr.DataArray(cube1, dims=['t', 'y', 'x'],
                     coords={'time': ('t', t_cube1),       # 1D array
                             'lon': (['y', 'x'], X_cube1),  # 2D array!!!
                             'lat': (['y', 'x'], Y_cube1)}) # 2D array!!!

cube2 = xr.DataArray(cube2, dims=['y', 'x', 't'],
                     coords={'time': ('t', t_cube2),       # 1D array
                             'lon': (['y', 'x'], X_cube2),  # 2D array!!!
                             'lat': (['y', 'x'], Y_cube2)}) # 2D array!!!
```

All I want is to regrid cube1 onto cube2, or in other words, interpolate X_cube2/Y_cube2/t_cube2 onto cube1. Note that I cannot pass X_cube1/Y_cube1 as 1D arrays because they vary for every grid cell (this is a rotated grid). I can, however, pass X_cube2/Y_cube2 as 1D arrays (this is a standard lon/lat grid).

Also note the inverted time dimension between cube1 and cube2 (but that shouldn't be an issue).

So how to perform this operation... or am I missing something?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497150401 https://github.com/pydata/xarray/issues/2281#issuecomment-497150401 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzE1MDQwMQ== shoyer 1217238 2019-05-29T23:58:42Z 2019-05-29T23:58:42Z MEMBER

So how to perform this operation... or am I missing something?

Sorry, i don't think there's an easy way to do this directly in xarray right now.

My concern with scipy.interpolate.griddata is that the performance might be miserable... griddata takes an arbitrary stream of data points in a D-dimensional space. It doesn't know if those source data points have a gridded/mesh structure. A curvilinear grid mesh needs to be flattened into a stream of points before being passed to griddata(). Might not be too bad for nearest-neighbour search, but very inefficient for the linear/bilinear method, where knowing the mesh structure beforehand can save a lot of computation.

Thinking a little more about this, I wonder if the performance could actually be OK as long as the spatial grid is not too big, i.e., if we reuse the same grid many times for different variables/times.

In particular, SciPy's griddata either makes use of a scipy.spatial.KDTree (for nearest neighbor lookups) or scipy.spatial.Delaunay (for linear interpolation, on a triangular mesh). We could build these data structures once (and potentially even cache them in indexes on xarray objects), and likewise calculate the sparse interpolation coefficients once for repeated use.
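
For illustration, a minimal sketch of the "compute the sparse interpolation coefficients once" idea: the Delaunay triangulation and the resulting barycentric (linear) weights are computed a single time and stored as a sparse matrix, so regridding any further field on the same source/target grids reduces to one sparse matrix-vector product. All names below are assumptions, and the weight matrix could equally be cached in an index on the xarray object, as suggested above (the thread also notes xESMF caches its regridding weights in a similar spirit):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import Delaunay

def build_weight_matrix(src_xy, dst_xy):
    """src_xy: (N, 2) flattened source coords; dst_xy: (M, 2) flattened target coords."""
    tri = Delaunay(src_xy)                          # expensive, done once
    simplex = tri.find_simplex(dst_xy)              # enclosing triangle of each target point
    verts = tri.simplices[simplex]                  # (M, 3) source-point indices
    T = tri.transform[simplex]                      # affine maps to barycentric coordinates
    bary = np.einsum('mij,mj->mi', T[:, :2], dst_xy - T[:, 2])
    w = np.c_[bary, 1.0 - bary.sum(axis=1)]         # (M, 3) barycentric weights
    w[simplex < 0] = 0.0                            # targets outside the source mesh get no weight
    rows = np.repeat(np.arange(len(dst_xy)), 3)
    return csr_matrix((w.ravel(), (rows, verts.ravel())),
                      shape=(len(dst_xy), len(src_xy)))

# W = build_weight_matrix(src_xy, dst_xy)   # compute the sparse coefficients once...
# out_flat = W @ field.ravel()              # ...then each variable/time step is one sparse matmul
```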

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497130177 https://github.com/pydata/xarray/issues/2281#issuecomment-497130177 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzEzMDE3Nw== crusaderky 6213168 2019-05-29T22:22:01Z 2019-05-29T22:25:45Z MEMBER

@fspaolo 2d mesh interpolation and 1d interpolation with extra "free" dimensions are fundamentally different algorithms. Look up the scipy documentation on the various interpolation functions available.

I don't understand what you are trying to pass for x_new and y_new, and it definitely doesn't sound right. Right now you have a 3d DataArray with dimensions (x, y, t) and 3 coords, each of which is a 1d numpy array (e.g. da.coords.x.values). If you want to rescale, you need to pass a 1d numpy array or array-like for x_new, and another separate 1d array for y_new. You are not doing that, as the error message you're receiving is saying that your x_new is a numpy array with 2 or more dimensions, which the algorithm doesn't know what to do with. It can accept multi-dimensional DataArrays with brand-new dimensions, but that does not sound like it's your case.
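
For illustration, a minimal sketch (with made-up dummy data) of the two forms of targets interp() accepts: plain 1D arrays, which rescale an existing dimension, and DataArrays carrying a brand-new dimension, which do pointwise interpolation. Note that both forms still interpolate along the logical x/y dimensions, not along 2D lat/lon coordinates:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(4, 5, 6), dims=['t', 'y', 'x'],
                  coords={'t': np.arange(4), 'y': np.arange(5), 'x': np.arange(6)})

# 1D targets: each dimension is rescaled independently
da_grid = da.interp(x=np.linspace(0, 5, 11), y=np.linspace(0, 4, 9))

# DataArray targets sharing a brand-new dimension: pointwise interpolation
x_pts = xr.DataArray([0.5, 2.5, 4.5], dims='points')
y_pts = xr.DataArray([1.0, 2.0, 3.0], dims='points')
da_points = da.interp(x=x_pts, y=y_pts)   # result has dims ('t', 'points')
```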

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497129090 https://github.com/pydata/xarray/issues/2281#issuecomment-497129090 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzEyOTA5MA== fspaolo 539688 2019-05-29T22:17:39Z 2019-05-29T22:17:39Z NONE

@JiaweiZhuang your approach doesn't work either. After installing you package and the dependencies... and following the documentation, I got

```
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(474)..............:
MPID_Init(190).....................: channel initialization failed
MPIDI_CH3_Init(89).................:
MPID_nem_init(320).................:
MPID_nem_tcp_init(173).............:
MPID_nem_tcp_get_business_card(420):
MPID_nem_tcp_init(379).............: gethostbyname failed, LMC-061769 (errno 1)
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=3191311
: system msg for write_line failure : Bad file descriptor
```

An MPI error?!

What bothers me are statements like this "For usability and simplicity"...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
497127039 https://github.com/pydata/xarray/issues/2281#issuecomment-497127039 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NzEyNzAzOQ== fspaolo 539688 2019-05-29T22:09:17Z 2019-05-29T22:16:38Z NONE

@crusaderky It doesn't work. First, it tells me that if x_new/y_new are 2D, then I have to pass them as DataArrays... why do I have to do that? After doing that, it tells me there are conflicts between "overlapping" dimensions?! Ok, I then pass x_new/y_new as 1D arrays... and the result comes out as nonsense!

The only interpolation that works (both with DataArray.interp() and your splev()) is the one over the time dimension. This is because time is defined as a one-dimensional variable (t).

Why is it so hard to perform an interpolation with spatial coordinates defined with 2D variables?! I would think this is a pretty common operation on climate datasets...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
495871201 https://github.com/pydata/xarray/issues/2281#issuecomment-495871201 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NTg3MTIwMQ== crusaderky 6213168 2019-05-25T06:55:33Z 2019-05-25T06:59:03Z MEMBER

@fspaolo I never tried using my algorithm to perform 2D interpolation, but this should work:

```python
from xarray_extras.interpolate import splrep, splev

da = splev(x_new, splrep(da, 'x'))
da = splev(y_new, splrep(da, 'y'))
da = splev(t_new, splrep(da, 't'))
```

Add k=1 to downgrade from cubic to linear interpolation and get a speed boost.

You can play around with dask to increase performance by using all your CPUs (or more with dask distributed), although you have to remember that an original dim can't be broken into multiple chunks when you apply splrep to it:

```python
from xarray_extras.interpolate import splrep, splev

da = da.chunk(t=TCHUNK)
da = splev(x_new, splrep(da, 'x'))
da = splev(y_new, splrep(da, 'y'))
da = da.chunk(x=SCHUNK, y=SCHUNK).chunk(t=-1)
da = splev(t_new, splrep(da, 't'))
da = da.compute()
```

where TCHUNK and SCHUNK are integers you'll have to play with. The rule of thumb is that you want to have your chunks 5~100 MBs each.

If you end up finding out that chunking along an interpolation dimension is important for you, it is possible to implement it with dask ghosting techniques, just painfully complicated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
495790793 https://github.com/pydata/xarray/issues/2281#issuecomment-495790793 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NTc5MDc5Mw== fspaolo 539688 2019-05-24T21:18:25Z 2019-05-24T21:20:57Z NONE

@crusaderky I don't think we need a "proper" 3d interpolation in most cases (i.e. predicting each 3d grid node considering all dimensions simultaneously). If you see my example above, DataArray.interp(x=x_new, y=y_new).interp(t=t_new), I am first interpolating over the spatial coordinates and then over time. If instead I do DataArray.interp(x=x_new, y=y_new, t=t_new), the computing time is prohibitive for large ndarrays. In fact, I think DataArray.interp() should figure this out and do this decomposition internally when calling DataArray.interp(x=x_new, y=y_new, t=t_new).

The main limitation here, however, is being able to interpolate over the spatial coordinates when these are defined as 2d arrays. I'll check your package... thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
495515463 https://github.com/pydata/xarray/issues/2281#issuecomment-495515463 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NTUxNTQ2Mw== crusaderky 6213168 2019-05-24T08:10:10Z 2019-05-24T08:10:10Z MEMBER

I am not aware of a ND mesh interpolation algorithm. However, my package xarray_extras [1] offers highly optimized 1D interpolation on a ND hypercube, on any numerical coord (not just time). You may try applying it 3 times on each dimension in sequence and see if you get what you want - although performance won't be optimal.

[1] https://xarray-extras.readthedocs.io/en/latest/

Alternatively, if you do find the exact algorithm you want, but it's for numpy, then applying it to xarray is simple - just get DataArray.values -> apply function -> create new DataArray from the output.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
495421770 https://github.com/pydata/xarray/issues/2281#issuecomment-495421770 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQ5NTQyMTc3MA== fspaolo 539688 2019-05-23T23:37:45Z 2019-05-23T23:53:33Z NONE

Great thread by @JiaweiZhuang! I just posted a question on stackoverflow about this exact problem. After hours navigating through the xarray documentation I finally found this open issue... (since I'm new to xarray I thought the problem was my lack of understanding of how the package works)

It is surprising that a package targeting n-dimensional gridded datasets (particularly those from the geo/climate sciences) does not handle such a common task with spatial gridded data.

The problem at hand is this: I have two 3d arrays with different dimensions defined by 2d coordinates, and all I want is to regrid the first cube onto the second. Is there a way to perform this operation with xarray?

This is what I've tried (which @JiaweiZhuang explained why it doesn't work):

```python
da = xr.DataArray(cube, dims=['t', 'y', 'x'],
                  coords={'t': time,
                          'xc': (['y', 'x'], X),
                          'yc': (['y', 'x'], Y)})

da_interp = da.interp(x=x_new, y=y_new).interp(t=t_new)
```

Here x_new/y_new should be mapped onto xc/yc (physical coordinates), not onto x/y (logical coordinates).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
404685906 https://github.com/pydata/xarray/issues/2281#issuecomment-404685906 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQwNDY4NTkwNg== JiaweiZhuang 25473287 2018-07-12T23:58:48Z 2018-07-13T18:24:02Z NONE

Do you have any proposal?

I guess it is not an API design problem yet... The algorithm simply isn't there, since interpn doesn't deal with curvilinear grids.

I think we could make dr.interp(xc=lon, yc=lat) work for the N-D -> M-D case by wrapping scipy.interpolate.griddata

My concern with scipy.interpolate.griddata is that the performance might be miserable... griddata takes an arbitrary stream of data points in a D-dimensional space. It doesn't know if those source data points have a gridded/mesh structure. A curvilinear grid mesh needs to be flattened into a stream of points before being passed to griddata(). Might not be too bad for nearest-neighbour search, but very inefficient for the linear/bilinear method, where knowing the mesh structure beforehand can save a lot of computation.

Utilizing scipy.interpolate.griddata would be a nice feature, but it should probably be used for data point streams (more like a Pandas dataframe method?), not as a way to handle curvilinear grids.

PS: I have some broader concerns regarding interp vs xESMF: JiaweiZhuang/xESMF#24

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
404744836 https://github.com/pydata/xarray/issues/2281#issuecomment-404744836 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQwNDc0NDgzNg== shoyer 1217238 2018-07-13T07:00:16Z 2018-07-13T07:00:16Z MEMBER

I'd like to figure out interfaces that make it possible for external, grid aware libraries to extend indexing and interpolation features in xarray. In particular, it would be nice to be able to associate a "grid index" used for caching computation that gets passed on in all xarray operations.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
404387651 https://github.com/pydata/xarray/issues/2281#issuecomment-404387651 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQwNDM4NzY1MQ== JiaweiZhuang 25473287 2018-07-12T04:43:09Z 2018-07-13T00:11:44Z NONE

One way I can think of to make interp() work on this example: Define a new coordinate system (i.e. two new coordinate variables) on the source curvilinear grid, and rewrite the destination coordinate using those new coordinate variables (not lat, lon anymore).

But this is absolutely too convoluted...

Updated: see Gridded with Scipy for a similar idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
404611922 https://github.com/pydata/xarray/issues/2281#issuecomment-404611922 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQwNDYxMTkyMg== shoyer 1217238 2018-07-12T18:45:35Z 2018-07-12T18:45:35Z MEMBER

I think we could make dr.interp(xc=lon, yc=lat) work for the N-D -> M-D case by wrapping scipy.interpolate.griddata

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
404407676 https://github.com/pydata/xarray/issues/2281#issuecomment-404407676 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDQwNDQwNzY3Ng== fujiisoup 6815844 2018-07-12T06:48:18Z 2018-07-12T06:48:18Z MEMBER

Thanks, @JiaweiZhuang

Not yet. interp() only works on N-dimensional regular grids. Under the hood, we are just using scipy.interpolate.interp1d and interpn.

I am happy to see curvilinear interpolation in xarray if we could find a good general API for N-dimensional array. Do you have any proposal?

For curvilinear interpolation, we may have some arbitrariness, e.g.

```python
dr_out = dr.interp(xc=lon)
```

the resultant dimension is not well determined. Maybe we need some limitation for the arguments.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);