home / github / issue_comments

Menu
  • GraphQL API
  • Search all tables

issue_comments: 497948836

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/2844#issuecomment-497948836 https://api.github.com/repos/pydata/xarray/issues/2844 497948836 MDEyOklzc3VlQ29tbWVudDQ5Nzk0ODgzNg== 22566757 2019-06-01T14:20:08Z 2020-07-14T15:55:17Z CONTRIBUTOR

On 5/31/2019 11:50 AM, dcherian wrote:

It isn't just MetPy though. I'm sure there's existing code relying on adding |grid_mapping| and |bounds| to |attrs| in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to |encoding| :/.

At present, the proper, CF-compliant way to do this is to have both |grid_mapping| and |bounds| variables in |data_vars|, and maintain the attributes yourself, including making sure the variables get copied into the result after relevant |ds[var_name]| and |ds.sel(axis=bounds)| operations.

If you decide to move these variables to |coords|, the |bounds| variables will still get dropped on any subsetting operation, including those where the relevant axis was retained, the |grid_mapping| variables will be included in the result of all subsetting operations (including pulling out, for example, a time coordinate), and both will be included in some |coordinates| attribute when written to disk, breaking CF compliance.

This PR only really addresses getting these variables in |coords| initially and keeping them out of the global |coordinates| attribute when writing to disk.

Since I'm doing this primarily to get grid_mapping and bounds
variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in |attrs| and using that information when encoding coordinates would be the more pragmatic choice.

You have a point about |grid_mapping|, but applying the MetPy approach of saving the information in another, more directly useful format (|cartopy.Projection| instances) immediately after loading the file would be a way around that.

For |bounds|, I think |pd.PeriodIndex| would be the most natural representation for time, and |pd.IntervalIndex| for most other 1-D cases, but that still leaves |bounds| for two-or-more-dimensional coordinates.

That's a design choice I'll leave to the maintainers.

We could define |encoding| as containing a specified set of CF attributes that control on-disk representation such as |units|, |scale_factor|, |contiguous| etc. and leaving everything else in |attrs|. A full list of attributes that belong in |encoding| could be in the docs so that downstream packages can fully depend on this behaviour.

Currently I see |coordinates| is interpreted and moved to |encoding|. In the above proposal, this would be left in |attrs| but its value would still be interpreted if |decode_coords=True|.

What do you think?

At present, |set(ds[var_name].attrs["coordinates"].split())| and |set(ds[var_name].coords) - set(ds[var_name].indexes[dim_name])| would be identical, since the |coordinates| attribute is essentially computed from the second expression on write.

Do you have a use case in mind where you need specifically the list of CF auxiliary coordinates, or is that just an example of something that would change under the new proposal? I assume |units| would be moved to |encoding| only for |datetime64[ns]| and |timedelta64[ns]| variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  424265093
Powered by Datasette · Queries took 0.636ms · About: xarray-datasette