issue_comments: 497948836

This data as json

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/2844#issuecomment-497948836	https://api.github.com/repos/pydata/xarray/issues/2844	497948836	MDEyOklzc3VlQ29tbWVudDQ5Nzk0ODgzNg==	22566757	2019-06-01T14:20:08Z	2020-07-14T15:55:17Z	CONTRIBUTOR	On 5/31/2019 11:50 AM, dcherian wrote: It isn't just MetPy though. I'm sure there's existing code relying on adding \|grid_mapping\| and \|bounds\| to \|attrs\| in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to \|encoding\| :/. At present, the proper, CF-compliant way to do this is to have both \|grid_mapping\| and \|bounds\| variables in \|data_vars\|, and maintain the attributes yourself, including making sure the variables get copied into the result after relevant \|ds[var_name]\| and \|ds.sel(axis=bounds)\| operations. If you decide to move these variables to \|coords\|, the \|bounds\| variables will still get dropped on any subsetting operation, including those where the relevant axis was retained, the \|grid_mapping\| variables will be included in the result of all subsetting operations (including pulling out, for example, a time coordinate), and both will be included in some \|coordinates\| attribute when written to disk, breaking CF compliance. This PR only really addresses getting these variables in \|coords\| initially and keeping them out of the global \|coordinates\| attribute when writing to disk. `Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars.` I'm +1 on this but I wonder whether saving them in \|attrs\| and using that information when encoding coordinates would be the more pragmatic choice. You have a point about \|grid_mapping\|, but applying the MetPy approach of saving the information in another, more directly useful format (\|cartopy.Projection\| instances) immediately after loading the file would be a way around that. For \|bounds\|, I think \|pd.PeriodIndex\| would be the most natural representation for time, and \|pd.IntervalIndex\| for most other 1-D cases, but that still leaves \|bounds\| for two-or-more-dimensional coordinates. That's a design choice I'll leave to the maintainers. We could define \|encoding\| as containing a specified set of CF attributes that control on-disk representation such as \|units\|, \|scale_factor\|, \|contiguous\| etc. and leaving everything else in \|attrs\|. A full list of attributes that belong in \|encoding\| could be in the docs so that downstream packages can fully depend on this behaviour. Currently I see \|coordinates\| is interpreted and moved to \|encoding\|. In the above proposal, this would be left in \|attrs\| but its value would still be interpreted if \|decode_coords=True\|. What do you think? At present, \|set(ds[var_name].attrs["coordinates"].split())\| and \|set(ds[var_name].coords) - set(ds[var_name].indexes[dim_name])\| would be identical, since the \|coordinates\| attribute is essentially computed from the second expression on write. Do you have a use case in mind where you need specifically the list of CF auxiliary coordinates, or is that just an example of something that would change under the new proposal? I assume \|units\| would be moved to \|encoding\| only for \|datetime64[ns]\| and \|timedelta64[ns]\| variables.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		424265093