Issue #5281 (pydata/xarray): 'Parallelized' apply_ufunc for scipy.interpolate.griddata
id: 882105903 · user: 74414841 · state: open · comments: 4 · created_at: 2021-05-09T10:08:46Z · updated_at: 2022-04-09T01:39:13Z · author_association: NONE

Hi,

I'm working with large files from an ocean model with an unstructured grid. For instance, the flow velocity variable ux has dimensions (194988, 1009, 20) for, respectively, 'nFlowElem' (the unstructured grid elements), 'time' and 'laydim' (the depth dimension). I'd like to interpolate these results to a structured grid with dimensions (600, 560, 1009, 20) for, respectively, latitude, longitude, time and laydim. For this I am using scipy.interpolate.griddata. As these DataArrays are too large to load into working memory at once, I am trying to work with chunks (dask). Unfortunately, I run into problems when using apply_ufunc with the setting dask = 'parallelized'.

For smaller computational domains (a smaller nFlowElem dimension) I am still able to load the DataArray into working memory. In that case, the following code gives me the wanted result:

```
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

def interp_to_grid(u, xc, yc, xint, yint):
    # u: (nFlowElem, time, laydim); xc, yc: (nFlowElem,); xint, yint: (dim_0, dim_1)
    print(u.shape, xc.shape, xint.shape)
    ug = griddata((xc, yc), u, (xint, yint), method='nearest', fill_value=np.nan)
    return ug

uxg = xr.apply_ufunc(interp_to_grid,
                     ux, xc, yc, xint, yint,
                     dask = 'allowed',
                     input_core_dims=[['nFlowElem','time','laydim'],['nFlowElem'],['nFlowElem'],['dim_0','dim_1'],['dim_0','dim_1']],
                     output_core_dims=[['dim_0','dim_1','time','laydim']],
                     output_dtypes = [xr.DataArray]
                     )
```

Notice that in the function `interp_to_grid` the input variables have the following dimensions:

- `u` (i.e. ux, the original flow velocity output): (194988, 1009, 20) for (nFlowElem, time, laydim)
- `xc`, `yc` (the latitude and longitude coordinates associated with these 194988 elements): both (194988,)
- `xint`, `yint` (the structured grid coordinates to which I would like to interpolate the data): both (600, 560) for (dim_0, dim_1)

Notice that scipy.interpolate.griddata does not require me to loop over the time and laydim dimensions (as formulated in the code above). For this it is critical to feed `griddata` the dimensions in the right order ('time' and 'laydim' last). The interpolated result, uxg, has dimensions (600, 560, 1009, 20), as wanted and expected.
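The shape behaviour described above can be checked on a small synthetic case. The following is a minimal sketch, not the model data: the array sizes and the random coordinates are made up for illustration, and only the axis order (elements first, 'time' and 'laydim' last) mirrors the description.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-ins (assumed sizes, far smaller than the real model output).
rng = np.random.default_rng(0)
n_elem, n_time, n_lay = 50, 4, 3
xc = rng.random(n_elem)                  # element x-coordinates, shape (n_elem,)
yc = rng.random(n_elem)                  # element y-coordinates, shape (n_elem,)
u = rng.random((n_elem, n_time, n_lay))  # values, element axis first

# Structured target grid; both coordinate arrays have shape (6, 5).
xint, yint = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 6))

# The trailing axes of `u` are carried through without an explicit loop.
ug = griddata((xc, yc), u, (xint, yint), method="nearest")
print(ug.shape)  # (6, 5, 4, 3)
```

The first axis of the values array has to match the number of points; the remaining axes are appended after the grid axes in the result.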

However, for much larger spatial domains I need to work with dask = 'parallelized', because the input DataArrays can no longer be loaded into working memory. I have tried chunking along the time dimension, and also along the nFlowElem dimension. I am aware that it is not possible to chunk along core dimensions.

This is one of my "parallel" attempts (with chunks along the time dim):

Input ux:

```
<xarray.DataArray 'ucx' (nFlowElem: 194988, time: 1009, laydim: 20)>
dask.array<transpose, shape=(194988, 1009, 20), dtype=float64, chunksize=(194988, 10, 20), chunktype=numpy.ndarray>
Coordinates:
    FlowElem_xcc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
    FlowElem_ycc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
  * time          (time) datetime64[ns] 2014-09-17 ... 2014-10-01
Dimensions without coordinates: nFlowElem, laydim
Attributes:
    standard_name:  eastward_sea_water_velocity
    long_name:      velocity on flow element center, x-component
    units:          m s-1
    grid_mapping:   wgs84
```

apply_ufunc call:

```
uxg = xr.apply_ufunc(interp_to_grid,
                     ux, xc, yc, xint, yint,
                     dask = 'parallelized',
                     input_core_dims=[['nFlowElem'],['nFlowElem'],['nFlowElem'],['dim_0','dim_1'],['dim_0','dim_1']],
                     output_core_dims=[['dim_0','dim_1']],
                     output_dtypes = [xr.DataArray],
                     )
```

Gives error:

```
  File "interpnd.pyx", line 78, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__

  File "interpnd.pyx", line 192, in scipy.interpolate.interpnd._check_init_shape

ValueError: different number of values and points
```

I have played around a lot with changing the core dimensions in apply_ufunc and the dimension along which to chunk. I have also tried to manually change the order of the dimensions of the DataArray `u` which is fed to `griddata` (in `interp_to_grid`).

Any advice is very welcome! Best Wishes, Luka
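One possible reading of the error: `xr.apply_ufunc` moves each input's core dimensions to the last axes before calling the function, so with `input_core_dims=[['nFlowElem'], ...]` the block passed as `u` has shape (time chunk, laydim, nFlowElem), and `griddata` then sees a first axis that does not match the number of points. The sketch below reorders the axes inside the function under that assumption. The names `interp_block` and `interp_parallel` are hypothetical, the small arrays are synthetic, and the wrapper has not been run against the model files from this issue.

```python
import numpy as np
from scipy.interpolate import griddata


def interp_block(u, xc, yc, xint, yint):
    """Interpolate one block whose element axis is LAST: u has shape (..., nFlowElem)."""
    values = np.moveaxis(u, -1, 0)             # (nFlowElem, ...), as griddata expects
    ug = griddata((xc, yc), values, (xint, yint), method="nearest")  # (ny, nx, ...)
    return np.moveaxis(ug, (0, 1), (-2, -1))   # (..., ny, nx): output core dims last


def interp_parallel(ux, xc, yc, xint, yint):
    """Hypothetical wrapper: all arguments are assumed to be xarray DataArrays and
    ux is assumed to be chunked along 'time' only (one chunk along 'nFlowElem')."""
    import xarray as xr  # imported here so interp_block stays usable with NumPy and SciPy alone

    return xr.apply_ufunc(
        interp_block,
        ux, xc, yc, xint, yint,
        dask="parallelized",
        input_core_dims=[["nFlowElem"], ["nFlowElem"], ["nFlowElem"],
                         ["dim_0", "dim_1"], ["dim_0", "dim_1"]],
        output_core_dims=[["dim_0", "dim_1"]],
        output_dtypes=[ux.dtype],  # a NumPy dtype rather than xr.DataArray
    )


# Small NumPy check of the axis handling, emulating one block as apply_ufunc would pass it.
rng = np.random.default_rng(0)
n_elem, n_time, n_lay = 50, 4, 3
xc = rng.random(n_elem)
yc = rng.random(n_elem)
u_block = rng.random((n_time, n_lay, n_elem))  # element axis last
xint, yint = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 6))  # both (6, 5)

out = interp_block(u_block, xc, yc, xint, yint)
print(out.shape)  # (4, 3, 6, 5)
```

With this arrangement the result of `interp_parallel` would have dimensions (time, laydim, dim_0, dim_1); a call such as `.transpose('dim_0', 'dim_1', 'time', 'laydim')` would restore the order used in the in-memory version.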
