issues


1 row where type = "issue" and user = 74414841 sorted by updated_at descending

id: 882105903
node_id: MDU6SXNzdWU4ODIxMDU5MDM=
number: 5281
title: 'Parallelized' apply_ufunc for scipy.interpolate.griddata
user: LJaksic (74414841)
state: open
locked: 0
comments: 4
created_at: 2021-05-09T10:08:46Z
updated_at: 2022-04-09T01:39:13Z
author_association: NONE
reactions: 0
repo: xarray (13221727)
type: issue

Hi,

I'm working with large files from an ocean model with an unstructured grid. For instance, the flow velocity variable ux has dimensions (194988, 1009, 20), for respectively 'nFlowElem' (the unstructured grid element), 'time' and 'laydim' (the depth dimension). I'd like to interpolate these results to a structured grid with dimensions (600, 560, 1009, 20), for respectively latitude, longitude, time and laydim. For this I am using scipy.interpolate.griddata. As these DataArrays are too large to load into working memory at once, I am trying to work with chunks (dask). Unfortunately, I run into problems when trying to use apply_ufunc with dask='parallelized'.
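For context, a minimal self-contained sketch of the griddata call I'm relying on (all sizes made up: 50 scattered points interpolated onto a 6 x 5 target grid):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
xc = rng.uniform(0, 1, 50)   # x coordinates of the scattered points, shape (50,)
yc = rng.uniform(0, 1, 50)   # y coordinates of the scattered points, shape (50,)
u = rng.normal(size=50)      # one value per scattered point, shape (50,)

# Target grid coordinates, both shaped (6, 5).
xint, yint = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 6))

# The first axis of `u` must match the number of scattered points.
ug = griddata((xc, yc), u, (xint, yint), method='nearest', fill_value=np.nan)
print(ug.shape)  # (6, 5)
```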

For smaller computational domains (a smaller nFlowElem dimension) I am still able to load the DataArray into working memory. Then, the following code gives me the wanted result:

```python
def interp_to_grid(u, xc, yc, xint, yint):
    print(u.shape, xc.shape, xint.shape)
    ug = griddata((xc, yc), u, (xint, yint), method='nearest', fill_value=np.nan)
    return ug

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='allowed',
    input_core_dims=[['nFlowElem', 'time', 'laydim'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1', 'time', 'laydim']],
    output_dtypes=[xr.DataArray],
)
```

Notice that in the function `interp_to_grid` the input variables have the following dimensions:

- `u` (i.e. `ux`, the original flow velocity output): (194988, 1009, 20) for (nFlowElem, time, laydim)
- `xc`, `yc` (the latitude and longitude coordinates associated with these 194988 elements): both (194988,)
- `xint`, `yint` (the structured grid coordinates to which I would like to interpolate the data): both (600, 560) for (dim_0, dim_1)

Notice that `scipy.interpolate.griddata` does not require me to loop over the time and laydim dimensions (as formulated in the code above). For this it is critical to feed `griddata` the dimensions in the right order ('time' and 'laydim' last). The interpolated result, `uxg`, has dimensions (600, 560, 1009, 20), as wanted and expected.
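That behaviour can be checked in isolation. Continuing the made-up sizes from the sketch above, `griddata` with method='nearest' accepts a values array with extra trailing dimensions, as long as the point dimension comes first:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
xc, yc = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)   # 50 scattered points
xint, yint = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 6))

# Made-up stand-ins for (nFlowElem, time, laydim): point dimension first.
u = rng.normal(size=(50, 3, 2))

# One call interpolates every time/laydim slice: the grid shape (6, 5) is
# prepended and the trailing dimensions (3, 2) are preserved.
ug = griddata((xc, yc), u, (xint, yint), method='nearest')
print(ug.shape)  # (6, 5, 3, 2)
```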

However, for much larger spatial domains it is required to work with dask='parallelized', because these input DataArrays can no longer be loaded into working memory. I have tried to apply chunks along the time dimension, but also along the nFlowElem dimension. I am aware that it is not possible to chunk along core dimensions.
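To keep the core dimension in one piece, chunking along time only looks like this (a sketch; the chunk size of 10 just matches the repr in the attempt below):

```python
# Chunk along 'time' only; -1 keeps 'nFlowElem' in a single chunk, since
# core dimensions must not be split across chunks.
ux = ux.chunk({'time': 10, 'nFlowElem': -1})
```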

This is one of my "parallel" attempts (with chunks along the time dim):

Input `ux`:

```
<xarray.DataArray 'ucx' (nFlowElem: 194988, time: 1009, laydim: 20)>
dask.array<transpose, shape=(194988, 1009, 20), dtype=float64, chunksize=(194988, 10, 20), chunktype=numpy.ndarray>
Coordinates:
    FlowElem_xcc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
    FlowElem_ycc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
  * time          (time) datetime64[ns] 2014-09-17 ... 2014-10-01
Dimensions without coordinates: nFlowElem, laydim
Attributes:
    standard_name:  eastward_sea_water_velocity
    long_name:      velocity on flow element center, x-component
    units:          m s-1
    grid_mapping:   wgs84
```

The apply_ufunc call:

```python
uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='parallelized',
    input_core_dims=[['nFlowElem'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1']],
    output_dtypes=[xr.DataArray],
)
```

This gives the error:

```
File "interpnd.pyx", line 78, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__
File "interpnd.pyx", line 192, in scipy.interpolate.interpnd._check_init_shape
ValueError: different number of values and points
```

I have played around a lot with changing the core dimensions in apply_ufunc and the dimension along which to chunk. I have also tried to manually change the order of the dimensions of the DataArray `u` which is fed to `griddata` (in `interp_to_grid`).
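For reference, the same ValueError can be reproduced outside of xarray (sizes made up again) whenever the first axis of the values array does not match the number of scattered points. This may be what happens here, since apply_ufunc moves core dimensions (here nFlowElem) to the last axis before calling the function:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
xc, yc = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)   # 50 scattered points
xint, yint = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 6))

# Point axis last instead of first, as when core dims are moved to the end:
# griddata sees 3 "values" for 50 points.
u = rng.normal(size=(3, 2, 50))

# Raises: ValueError: different number of values and points
ug = griddata((xc, yc), u, (xint, yint), method='nearest')
```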

Any advice is very welcome!

Best wishes,
Luka

