home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1552701403

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1552701403 I_kwDOAMm_X85cjFfb 7468 Provide default APIs and functions for getting variable at a given location, based on some criteria / extrema conditions on other variables 8382834 open 0     0 2023-01-23T08:35:43Z 2023-01-23T08:35:43Z   CONTRIBUTOR      

Is your feature request related to a problem?

No, this is related to a need that comes regularly when working with netCDF files in geosciences.

Describe the solution you'd like

what is needed

There are many cases with netcdf files when one wants to find some location, or get variable(s) at some location, where the location is determined by a condition on some variables. A classical example, around which there are many stack overflow questions, online discussions, suggested "hacky" solution, snippets etc, available, is something like the following. Given a file that looks like this:

dimensions: nj = 949 ; ni = 739 ; nc = 5 ; time = UNLIMITED ; // (24 currently) variables: float TLAT(nj, ni) ; TLAT:long_name = "T grid center latitude" ; TLAT:units = "degrees_north" ; TLAT:missing_value = 1.e+30f ; TLAT:_FillValue = 1.e+30f ; float TLON(nj, ni) ; TLON:long_name = "T grid center longitude" ; TLON:units = "degrees_east" ; TLON:missing_value = 1.e+30f ; TLON:_FillValue = 1.e+30f ; float Tair_h(time, nj, ni) ; Tair_h:units = "C" ; Tair_h:long_name = "air temperature" ; Tair_h:coordinates = "TLON TLAT time" ; Tair_h:cell_measures = "area: tarea" ; Tair_h:missing_value = 1.e+30f ; Tair_h:_FillValue = 1.e+30f ; Tair_h:time_rep = "instantaneous" ;

answer a question like:

  • find the mesh point (ni, nj) closest to the location (TLAT=latval, TLON=lonval)?
  • give the nearest / interpolated value of Tair_h at latitude and longitude (latval, lonval)
  • do the same as above for lists / arrays of coordinates.

I do not think there is a recommended, standard, simple / one liner to do this with xarray in general (in particular if the (latval, lonval) falls out of the discrete set of mesh nodes). This means that a there are plenty of ad hoc hacked solutions getting shared around to solve this. Having a default recommended way would likely help users quite a bit and save quite some work.

the existing ways to solve the need

As soon as the TLAT and TLON are not "aligned" with the ni and nj coordinates (if they exactly match a mesh point, then likely some .where(TLAT=latval, TLON=lonval) can do), this is a bit of work. One has typically to:

  • build the 2D (dependent on (ni, nj) ) field representing the function (ni, nj) -> distance(node(ni, nj), point(latval, lonval) )
  • find the smallest value on this field to get the nearest coordinate and the value there, or the few smallest values and use some interpolation to interpolate

There are many more examples of questions that revolve around this kind of "query", and the answers are usually ad-hoc, though a lot of the logics repeat themselves, which make me believe a general high quality / standard solution would be useful:

  • https://stackoverflow.com/questions/58758480/xarray-select-nearest-lat-lon-with-multi-dimension-coordinates
  • https://gis.stackexchange.com/questions/357026/indexing-coordinates-in-order-to-get-a-value-by-coordinates-from-xarray (but what in the case where the point looked for "falls between" mesh nodes?)

Also note that most of these answers use simple / relatively naive / inefficient algorithms, but I wonder if there are some examples of code that could be used to build this in an efficient way, see the discussions in:

  • https://github.com/xarray-contrib/xoak
  • https://stackoverflow.com/questions/10818546/finding-index-of-nearest-point-in-numpy-arrays-of-x-and-y-coordinates
  • https://stackoverflow.com/questions/2566412/find-nearest-value-in-numpy-array

It looks like there are some snippets available that can be use to do this more or less exactly, when the netcdf file follows some conventions:

  • https://gist.github.com/blaylockbk/0ac5427b09fbae8d367a691ff90cdb4e

It looks like there is no dedicated / recommended / default xarray solution to do this though. It would be great if xarray could offer a (set of) well tested, well implemented, efficient way(s) to solve this kind of needs. I guess this is such a common need that providing a default solution with a default API, even if it is not optimal for all use cases, would be better than providing nothing at all and have users hack their own helper functions.

what xarray could implement

It would be great if xarray could offer support for this built in. A few thoughts of how this could be done:

  • calculate function on all array based on specification
  • find closest / interpolation way
  • provide a few default "assemblies" of these functions to support common file kinds
  • provide some ways to check that the request is reasonable / well formulated (for example, some functions in the kind of check_that_convex, that would check that taking a minimum is more or less reasonable).

I wonder if thinking about a few APIs and agreeing on these would be helpful before implementing anything. Just for the sake of brainstorming, maybe some functions with this kind of "API pseudocode" on datasets could make sense / would be a nice standardization to offer to users? Any thoughts / ideas of better solution?

python def compute_function_on_mesh(self, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> numpy_2d_array: """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments to the function the value from var1, ..., varn at each corresponding node."""

python def find_node_with_lowest_value(self, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> Tuple(dim1, ..., dimn): """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments to the function the value from var1, ..., varn at each corresponding node, and return the node coordinates that minimize the function."""

python def get_variable_at_node_with_lowest_value(self, variable_to_use, function_to_compute_on_nodes(arg1, ..., argn), list_args_to_use_in_funcion[var1, ..., varn]) -> float: """compute function_to_compute_on_nodes at each "node point" of the dataset, using as arguments to the function the value from var1, ..., varn at each corresponding node, and return the variable_to_use value at the node coordinates that minimize the function."""

(note: for this last function, consider also providing a variant that performs interpolation outside of mesh points?)

Maybe providing a few specializations for working with finding specific points in space would be useful? Like:

python def get_variable_at_closest_location(self, variable_to_use, variable_lat, variable_lon, latvalue, lonvalue) -> float: """get variable_to_use at the mesh point closest to (latvalue, lonvalue), using the variables variable_lat, variable_lon as the lat and lon axis."""

Describe alternatives you've considered

Writing my own small function, or re-using some snippet circulating on internet.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7468/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 0.702ms · About: xarray-datasette