home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

9 rows where issue = 1154014066 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 3

  • tovogt 5
  • DWesl 3
  • dcherian 1

author_association 2

  • CONTRIBUTOR 8
  • MEMBER 1

issue 1

  • Only auxiliary coordinates are listed in nc variable attribute · 9 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1069096876 https://github.com/pydata/xarray/issues/6310#issuecomment-1069096876 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uR-s tovogt 57705593 2022-03-16T12:55:28Z 2022-03-16T12:55:28Z CONTRIBUTOR

Yes, I tested this locally, and here is the PR for that suggestion: #6366

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069092987 https://github.com/pydata/xarray/issues/6310#issuecomment-1069092987 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uRB7 DWesl 22566757 2022-03-16T12:50:50Z 2022-03-16T12:50:50Z CONTRIBUTOR

That could work. Are you set up to check that? That can be either a full repository checkout or an XArray installation you can edit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069089232 https://github.com/pydata/xarray/issues/6310#issuecomment-1069089232 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uQHQ tovogt 57705593 2022-03-16T12:46:33Z 2022-03-16T12:46:33Z CONTRIBUTOR

No, in this case "coordinates" in encoding will resolve to False, so that's not the problematic step.

I think the problem is that the author of line 773 assumed that pop_to would return the value of attrs['coordinates'] if encoding['coordinates'] is not set. That is wrong and the line should be fixed to look like this instead: python coords_str = pop_to(encoding, attrs, "coordinates") or attrs.get("coordinates")

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069084130 https://github.com/pydata/xarray/issues/6310#issuecomment-1069084130 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uO3i DWesl 22566757 2022-03-16T12:40:20Z 2022-03-16T12:40:20Z CONTRIBUTOR

Given this: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L782-L783 I think that should be working. This: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L770-L779 explicitly says it should, and is probably the part where things go wrong, but it should be going wrong the same way for encoding and attrs.

I think https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L758-L768 may need to be split into two conditionals, one for attrs and one for encoding. I'm not sure how to get the continue behavior while allowing the code to work for both attrs and encoding without code duplication.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069073130 https://github.com/pydata/xarray/issues/6310#issuecomment-1069073130 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uMLq tovogt 57705593 2022-03-16T12:27:14Z 2022-03-16T12:27:14Z CONTRIBUTOR

@DWesl Thanks for digging into the details of the CF! I read your post in the sense that solution (1) is the one to choose.

I have one more question though before we close this:

When setting "coordinates" as a variable attribute instead of as an encoding, the value of that variable attribute is overwritten when writing to a NetCDF file, see this example: python import xarray as xr ds = xr.Dataset( {"values": ('time', [0.0, 0.1])}, coords={ 'time': ('time', [0, 1]), 'lat': ('time', [5, 4]), 'lon': ('time', [10, 12]) }) ds['values'].attrs['coordinates'] = "time lon lat" ds.to_netcdf("test.nc") shell $ ncdump -h test.nc netcdf test { dimensions: time = 2 ; variables: double values(time) ; values:_FillValue = NaN ; values:coordinates = "lon lat" ; int64 time(time) ; int64 lat(time) ; int64 lon(time) ; } Is this intended, and if so, what is the reason?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069064616 https://github.com/pydata/xarray/issues/6310#issuecomment-1069064616 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uKGo DWesl 22566757 2022-03-16T12:17:37Z 2022-03-16T12:17:37Z CONTRIBUTOR

I tried to find what the CF conventions say about including dimension coordinates (I'm using the name from scitools-iris rather than "coordinate variable" as used in the CF conventions to keep myself from getting confused) in the coordinates attribute. From what I can tell, the whole document is consistent with usually excluding dimension coordinates from the coordinates attribute. Most of the Discrete Sampling Geometry examples in appendix H seem to include the dimension coordinates in the coordinates attributes, though at least one example leaves the dimension coordinates implied rather than explicit.

From what I remember, XArray is based on the netCDF data model, rather than the CF data model, so initializing variable_coordinates[var_name] = set(variable.dims) will do the wrong thing if the dataset doesn't set one or more of its dimension coordinates (example H.2 has variables with dimensions ("station", "time"), but no variable named station. Section 4.5 makes this practice explicit). You could work around this by leaving the initialization as it stands but dropping the if coordinate_name not in variable.dims condition on including coordinate_name as part of the coordinates attribute.

  1. Stick to the current logic which might be non-conformal with the CF conventions in case of "Discrete Sampling Geometries". However, users can manually fix this by setting the coordinates in encoding.

Based on this, I think doing solution one from the previous post on writing a dataset will always be consistent with CF, but assuming that netCDF files XArray reads into datasets will always follow this pattern would be a problem. I suspect there are tests for reading netCDF files with dimension coordinates included in coordinates attributes already, but haven't checked.

  1. Implement a logic to recognize cases where a dataset is a "Discrete Sampling Geometry" and only then list the non-auxiliary coordinates in the variable attribute. This is a bit tricky, and I don't have the time to implement this, I'm afraid.

If you want to try solution three, almost all Discrete Sampling Geometry files must have a global attribute called featureType. Since that attribute is recommended for all Discrete Sampling Geometry files, you could declare that the presence of that attribute defines a Discrete Sampling Geometry file for XArray. However, I don't see any place that says including dimension coordinates in the coordinates attribute is required, even for Discrete Sampling Geometry files, and a few places that explicitly say dimension coordinates can be omitted from the coordinates attribute, even for Discrete Sampling Geometry files.

The references from CF on whether dimension coordinates can be included in the coordinates attribute:

The fifth paragraph of CF section five says:

If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable.

I think this is saying that if you can represent a coordinate using just one dimension, you shouldn't use two (that is, avoid using np.tile(np.arange(10), (3, 1)) as a longitude coordinate). The other interpretation is that dimension coordinates must not be included in the coordinates attribute, which seems unlikely given that three lines later it says:

Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates attribute.

The first paragraph of the section on Discrete sampling geometries:

Every element of every feature must be unambiguously associated with its space and time coordinates and with the feature that contains it. The coordinates attribute must be attached to every data variable to indicate the spatiotemporal coordinate variables that are needed to geo-locate the data.

I think dimension coordinates are explicit enough to count as "unambiguously associated", even without inclusion in the coordinates attribute, since they share a name with one of the dimensions of the Discrete Sampling Geometry data variables. This seems to be made explicit in the fourth paragraph:

Auxiliary coordinate variables containing the nominal and the precise positions should be listed in the relevant coordinates attributes of data variables. In orthogonal representations the nominal positions could be coordinate variables, which do not need to be listed in the coordinates attribute, rather than auxiliary coordinate variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1068869457 https://github.com/pydata/xarray/issues/6310#issuecomment-1068869457 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_tadR tovogt 57705593 2022-03-16T08:39:02Z 2022-03-16T08:40:08Z CONTRIBUTOR

Hey @dcherian,

Thanks for your consideration! I just started to create a new branch with my suggested changes, but then I noticed that setting the coordinates attribute to include non-auxiliary dimensions by default will let a lot of unit tests fail. That made me think about this again: In fact, always setting the coordinates attribute is permissible according to the conventions. However, most users will probably be confused about the clutter and we might run into some discussions about this in the future.

In my original post, the example can easily be fixed to reflect the coordinate attributes in the conventions (https://cfconventions.org/cf-conventions/cf-conventions.html#_single_trajectory) by enforcing the attribute value in the encoding: python import xarray as xr ds = xr.Dataset( {"values": ('time', [0.0, 0.1])}, coords={ 'time': ('time', [0, 1]), 'lat': ('time', [5, 4]), 'lon': ('time', [10, 12]) }) ds['values'].encoding['coordinates'] = "time lon lat" ds.to_netcdf("test.nc") shell $ ncdump -h test.nc netcdf test { dimensions: time = 2 ; variables: double values(time) ; values:_FillValue = NaN ; values:coordinates = "time lon lat" ; int64 time(time) ; int64 lat(time) ; int64 lon(time) ; }

From here, there are several ways to proceed: 1. Stick to the current logic which might be non-conformal with the CF conventions in case of "Discrete Sampling Geometries". However, users can manually fix this by setting the coordinates in encoding. 2. Enforce to always list all non-auxiliary coordinates in the variable attribute, which is conformal with the conventions, but generates a lot of clutter. 3. Implement a logic to recognize cases where a dataset is a "Discrete Sampling Geometry" and only then list the non-auxiliary coordinates in the variable attribute. This is a bit tricky, and I don't have the time to implement this, I'm afraid.

I actually suggest to choose solution (1) and close this issue as "won't fix". What do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1068721026 https://github.com/pydata/xarray/issues/6310#issuecomment-1068721026 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_s2OC dcherian 2448579 2022-03-16T04:31:13Z 2022-03-16T04:31:13Z MEMBER

This seems OK to me :+1: a PR would be welcome.

cc @DWesl

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1054195542 https://github.com/pydata/xarray/issues/6310#issuecomment-1054195542 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84-1b9W tovogt 57705593 2022-02-28T12:13:46Z 2022-02-28T12:33:07Z CONTRIBUTOR

If you are interested in a fix, I would modify the following section: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L726-L744 The variable_coordinates should be initialized for each variable v with the names in v.dims. Currently, the dimension names are explicitly excluded. So it seems to me like this behavior has originally been implemented on purpose because the main use of the coordinates attribute according to the CF convention is to list the auxiliary coordinates. However, the conventions also say: "Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates attribute." (https://cfconventions.org/cf-conventions/cf-conventions.html#coordinate-system)

So, this issue is about including the non-auxiliary coordinates by default because it's permissible in general, and it is even required for so-called "Discrete Sampling Geometries".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 11.479ms · About: xarray-datasette