issues: 1908161401

  • id: 1908161401
  • node_id: I_kwDOAMm_X85xvDt5
  • number: 8225
  • title: Scalar coordinates should not be footloose
  • user: 3383837
  • state: closed
  • locked: 0
  • comments: 5
  • created_at: 2023-09-22T04:28:11Z
  • updated_at: 2023-09-25T16:10:40Z
  • closed_at: 2023-09-25T16:10:40Z
  • author_association: CONTRIBUTOR

Is your feature request related to a problem?

A scalar coordinate has the counter-intuitive property of being able to hop from one data variable to another.

```python
import xarray as xr

a = xr.Dataset(
    data_vars={"a": (("x",), [0, 0])},
    coords={
        "x": [0.1, 2.3],
        "y": 42,
    },
)
b = xr.Dataset(
    data_vars={"b": ("x", [1, 1])},
    coords={
        "x": [0.1, 2.3],
    },
)
c = xr.merge((a, b))
```

Only `a` had the scalar coordinate `y` before merging, but now `c["b"]` has caught it:

```
<xarray.DataArray 'b' (x: 2)>
array([1, 1])
Coordinates:
  * x        (x) float64 0.1 2.3
    y        int64 42
```

I think this is a bug in a way, because it does not reflect the NetCDF4 data model's ability to keep `y` as a coordinate on `a` alone. Note the "coordinates" attributes in the result of `c.to_netcdf`:

```
netcdf c {
dimensions:
    x = 2 ;
variables:
    int64 a(x) ;
        a:coordinates = "y" ;
    double x(x) ;
        x:_FillValue = NaN ;
    int64 y ;
    int64 b(x) ;
        b:coordinates = "y" ;  <---- Says who!?
```
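As far as I understand the data model, the hop happens because coordinates are stored on the Dataset rather than on each variable: pulling a variable out attaches every coordinate whose dimensions are a subset of that variable's dimensions, and a scalar coordinate has no dimensions at all. A minimal sketch of that reading, where the variable `d` and the dimension `z` are made up for illustration:

```python
import xarray as xr

# One dataset, two unrelated data variables, one scalar coordinate "y".
ds = xr.Dataset(
    data_vars={
        "a": ("x", [0, 0]),
        "d": ("z", [5, 6, 7]),  # hypothetical variable on a different dimension
    },
    coords={"x": [0.1, 2.3], "y": 42},
)

# "y" has no dimensions, so it is attached to every variable on extraction.
y_on_a = "y" in ds["a"].coords
y_on_d = "y" in ds["d"].coords

# "x" spans dimension "x", so it is only attached to variables that use "x".
x_on_d = "x" in ds["d"].coords

print(y_on_a, y_on_d, x_on_d)
```

So the scalar coordinate is not really tied to `a` in the first place; it only looks that way until another variable joins the same dataset.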

Describe the solution you'd like

I would like each data variable in a dataset to keep track of its own scalar coordinates (as each can, and of course must, for dimension coordinates). To continue the example above, I think `c` should have a representation that would lead to the following serialization:

    netcdf c {
    dimensions:
        x = 2 ;
    variables:
        int64 a(x) ;
            a:coordinates = "y" ;
        double x(x) ;
            x:_FillValue = NaN ;
        int64 y ;
        int64 b(x) ;

Describe alternatives you've considered

No response

Additional context

I think this feature could also help with #4501, wherein squeezing demotes a length-one non-dimensional coordinate to a scalar coordinate without tracking its own scalar coordinate. Egads, that's an ugly sentence. I'll elaborate over there.

Most importantly, this feature would solve a real problem I've encountered: model outputs, one for each combination of model parameters, that record parameters as a scalar coordinate only on the data variables the parameter affects. If you want to concatenate these together with xarray, you invariably get a lot of unnecessary data duplication. A contrived example with two outputs, in which the "temp" variable depends on parameter "time" but the "pressure" variable does not:

    output_42 = xr.Dataset({
        "temp": xr.DataArray(
            data=[10, 9, 8],
            dims=("depth",),
            coords={
                "depth": [0, 5, 10],
                "time": 42,
            },
        ),
        "pressure": xr.DataArray(
            data=[0, 7, 14],
            dims=("depth",),
            coords={"depth": [0, 5, 10]},
        )
    })
    output_88 = xr.Dataset({
        "temp": xr.DataArray(
            data=[11, 10, 10],
            dims=("depth",),
            coords={
                "depth": [0, 5, 10],
                "time": 88,
            },
        ),
        "pressure": xr.DataArray(
            data=[0, 7, 14],
            dims=("depth",),
            coords={"depth": [0, 5, 10]},
        )
    })

I think it should be possible to concatenate these datasets without duplicating "pressure", like so:

    <xarray.Dataset>
    Dimensions:   (depth: 3, time: 2)
    Coordinates:
      * depth     (depth) int64 0 5 10
      * time      (time) int64 42 88
    Data variables:
        temp      (time, depth) int64 10 9 8 11 10 10
        pressure  (depth) int64 0 7 14

I can't get there with any variation on `xr.concat((output_42, output_88), dim="time", data_vars="minimal")`, which I guess can be explained by the fact that "time" is associated with both "temp" and "pressure" in xarray's internal representation.
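For what it's worth, a manual workaround seems possible: concatenate only the variable that depends on "time", then attach "pressure" once. This is a rough sketch, not a proposal for how xarray should do it. It assumes "pressure" really is identical in every output, and `make_output` is a hypothetical helper that rebuilds the two example datasets:

```python
import xarray as xr


def make_output(time, temp):
    """Build one example model output (hypothetical helper for this sketch)."""
    depth = [0, 5, 10]
    return xr.Dataset(
        {
            "temp": xr.DataArray(
                data=temp, dims=("depth",), coords={"depth": depth, "time": time}
            ),
            "pressure": xr.DataArray(
                data=[0, 7, 14], dims=("depth",), coords={"depth": depth}
            ),
        }
    )


outputs = [make_output(42, [10, 9, 8]), make_output(88, [11, 10, 10])]

# Concatenate only the variable that depends on "time" ...
combined = xr.concat([ds[["temp"]] for ds in outputs], dim="time")

# ... then attach "pressure" once, after dropping the scalar "time"
# coordinate it picked up from its parent dataset.
combined["pressure"] = outputs[0]["pressure"].drop_vars("time")

print(combined)
```

This only works because I know by hand which variables each parameter affects, which is exactly the information the dataset loses.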

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8225/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed
  • repo: 13221727
  • type: issue
