home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

38 rows where issue = 424265093 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 8

  • DWesl 19
  • dcherian 9
  • dopplershift 2
  • DocOtak 2
  • shoyer 2
  • jthielen 2
  • andreas-h 1
  • pep8speaks 1

author_association 3

  • CONTRIBUTOR 26
  • MEMBER 11
  • NONE 1

issue 1

  • Read grid mapping and bounds as coords · 38 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
778717611 https://github.com/pydata/xarray/pull/2844#issuecomment-778717611 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODcxNzYxMQ== DWesl 22566757 2021-02-14T03:35:55Z 2021-02-14T03:35:55Z CONTRIBUTOR

~Does anyone know why the xr.open_dataset(....) call is echoed in the warning message. Is this intentional? Cc @dcherian @DWesl~

It seems you've already figured this out, but for anyone else with this question, the repeat of the call on that file is part of the warning that the file does not have all the variables the attributes refer to. You can fix this by recreating the file with the listed variables added (areacella, or by deleting the attribute from the variables (cell_measures). You can also ignore the warning using the machinery in the warnings module.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
778632780 https://github.com/pydata/xarray/pull/2844#issuecomment-778632780 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODYzMjc4MA== dcherian 2448579 2021-02-13T15:17:21Z 2021-02-13T15:17:21Z MEMBER

Great I'll merge before the next release if no one else has comments. Thanks @DWesl

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
778629061 https://github.com/pydata/xarray/pull/2844#issuecomment-778629061 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODYyOTA2MQ== DWesl 22566757 2021-02-13T14:46:25Z 2021-02-13T14:46:25Z CONTRIBUTOR

I think this looks good.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586277583 https://github.com/pydata/xarray/pull/2844#issuecomment-586277583 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NjI3NzU4Mw== pep8speaks 24736507 2020-02-14T12:59:36Z 2021-02-11T17:47:29Z NONE

Hello @DWesl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-02-11 17:47:29 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
777487024 https://github.com/pydata/xarray/pull/2844#issuecomment-777487024 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3NzQ4NzAyNA== dcherian 2448579 2021-02-11T14:12:04Z 2021-02-11T14:12:04Z MEMBER

One option is to change decode_coords from bool to Union[bool, str] and allow decode_coords to be either "all" (this PR) or "coordinates", or True ( == "coordinates", backwards compatible). I can bring this up at the next dev meeting.

Updated to implement this proposal after getting OK at the previous meeting.

@DWesl can you take a look at the most recent changes please?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
762899193 https://github.com/pydata/xarray/pull/2844#issuecomment-762899193 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2Mjg5OTE5Mw== dcherian 2448579 2021-01-19T15:03:02Z 2021-01-19T15:03:02Z MEMBER

there were some substantial concerns about backwards compatibility

:( yes this is currently backward incompatible (so the next release will bump major version). There's reluctance to add yet another decode_* kwarg to open_dataset, and also reluctance to issue a warning saying that the behaviour of decode_coords will change in a future version (since this would be very common).

One option is to change decode_coords from bool to Union[bool, str] and allow decode_coords to be either "all" (this PR) or "coordinates", or True ( == "coordinates", backwards compatible). I can bring this up at the next dev meeting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
761843786 https://github.com/pydata/xarray/pull/2844#issuecomment-761843786 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2MTg0Mzc4Ng== jthielen 3460034 2021-01-17T16:58:34Z 2021-01-17T16:58:34Z CONTRIBUTOR

I unfortunately have not been following along with the recent developments in this PR, so these may have already been previously covered. Sorry if that is the case!

However, earlier on in the development of this PR, there were some substantial concerns about backwards compatibility (e.g., for libraries like MetPy that currently rely on grid_mapping and the like being in attrs) and the improved propagation of encoding through operations (so that moving these to encoding doesn't mean they are unnecessarily lost). What is the current status with regards to these?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
761842344 https://github.com/pydata/xarray/pull/2844#issuecomment-761842344 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2MTg0MjM0NA== DWesl 22566757 2021-01-17T16:48:39Z 2021-01-17T16:48:39Z CONTRIBUTOR

Looks good to me. I was wondering where those docstrings were.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
674158261 https://github.com/pydata/xarray/pull/2844#issuecomment-674158261 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3NDE1ODI2MQ== dcherian 2448579 2020-08-14T16:33:31Z 2020-08-14T16:33:31Z MEMBER

I think it's OK if we skip ancillary_variables, the CF data model does not map perfectly to xarray's data model so this kind of annoyance will always be present. Also thanks @DocOtak for that pathological example :D (now I know what those mathematicians talk about)

A method or addition to getitem on the accessor returning a Dataset with the linked variables include

This has already been implemented though it could use a lot more testing: https://cf-xarray.readthedocs.io/en/latest/examples/introduction.html#Feature:-Accessing-variables-by-standard-names

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
671094846 https://github.com/pydata/xarray/pull/2844#issuecomment-671094846 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MTA5NDg0Ng== jthielen 3460034 2020-08-09T20:03:09Z 2020-08-09T20:03:09Z CONTRIBUTOR

For what it's worth, here's a bit of an elaboration on my view from https://github.com/pydata/xarray/issues/4215#issuecomment-656789017 and https://github.com/pydata/xarray/issues/4121#issuecomment-639758583:

ancillary_variables really fit a role of "non-coordinate linked variables," which doesn't have a good place in xarray's current data model. Both the current behavior (any links are completely lost if not manually handled) and linking as coordinates (conceptually confusing and causing spurious proliferation) are problematic. Short of a "linked" or "subsidiary" variable concept being added to xarray's data model (could be worth discussing on a new issue?), cf-xarray seems like the best place for managing ancillary_variables, either through - A method that extracts ancilliary_variables from a Dataset given a variable name (https://github.com/xarray-contrib/cf-xarray/issues/5#issue-628569748) - Parsing the ancilliary_variables attribute and replacing the string with a Dataset or list of DataArrays or Variables containing the actual ancillary variables (https://github.com/pydata/xarray/issues/4121#issue-630573329 and https://github.com/pydata/xarray/pull/2844#issuecomment-670991386) - A method or addition to __getitem__ on the accessor returning a Dataset with the linked variables included (https://github.com/pydata/xarray/pull/2844#issuecomment-670996109)

For those interested, the issue for ancillary_variables in cf-xarray is https://github.com/xarray-contrib/cf-xarray/issues/5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670996109 https://github.com/pydata/xarray/pull/2844#issuecomment-670996109 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDk5NjEwOQ== DWesl 22566757 2020-08-09T02:17:07Z 2020-08-09T16:36:12Z CONTRIBUTOR

That's two people with that view so I made the change.

Again, I feel that the quality flags are essentially meaningless on their own, useful primarily in the context of their associated variables, like the items currently put in the XArray coords attribute, which, admittedly, is only those variables identified by CF as dimension or auxiliary coordinates at the moment, and should remain associated with the relevant variable even if it is extracted into a DataArray. Since all of the other people who have opinions on the matter seem to disagree with me, I changed the code to preserve the present behavior with regards to ancillary_variables. I can always monkey-patch it back in if it really bothers me, or add a Dataset.__getitem__ wrapper to xarray-contrib/cf-xarray so that the ancillary_variables stay associated when I pull variables out, or move back to SciTools/iris.

On a related note, I should probably check whether this breaks conversion to an iris.Cube.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670991386 https://github.com/pydata/xarray/pull/2844#issuecomment-670991386 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDk5MTM4Ng== DocOtak 868027 2020-08-09T01:11:09Z 2020-08-09T01:11:09Z CONTRIBUTOR

Yes, my view is that things in ancillary_variables should stay in the attrs of their variable (DataArray) and not moved to coords. Currently this PR will remove the ancillary_variables from the attrs of the variables in a file which have it. This appears to break CF defined connection between associated variables (like uncertainty and QC). While the information isn't lost, I would need to look in encoding to get it. It looks like the first reply in #4215 also didn't like putting ancillary_variables in the coords.

What would be really awesome is some sort of variable proxy I could replace the string names with actual references/pointers to the correct DataArray in the Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670816691 https://github.com/pydata/xarray/pull/2844#issuecomment-670816691 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDgxNjY5MQ== DWesl 22566757 2020-08-08T03:25:17Z 2020-08-08T03:53:39Z CONTRIBUTOR

You are correct; ancillary_variables is neither grid_mapping or bounds.

My personal view is that the quality information should stay with the variable it describes unless explicitly dropped; I think your view is that quality information can always be extracted from the original dataset, and that no variable should carry quality information for a different variable. At this point it would be simple to remove ancillary_variables from the attributes processed by this PR. There was a suggestion earlier of adding a decode_aux_vars argument to control the new behavior as a means of avoiding back-compatibility breaks like this one. I will leave that as a question for the maintainers; there is also some related discussion at #4215.

I should point out that a similar situation arises for grid_mapping; ds.coords["time"] will include the grid_mapping variable in its coordinates. In contrast, ds.coords["x"] will not include the bounds for the x variable, since it has more dimensions than ds.coords["x"]

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670808763 https://github.com/pydata/xarray/pull/2844#issuecomment-670808763 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDgwODc2Mw== DocOtak 868027 2020-08-08T02:12:08Z 2020-08-08T02:12:08Z CONTRIBUTOR

I decided to try out this PR on some of the data files we are working with at my data office. In our datasets we have per variable quality flag information per variable uncertainty information. The CF way of tying all these together is via the ancillary_variables attribute. This PR pulls all these out into the Dataset coordinates. Since in the xarray data model (right now) the coordinates apply to an entire dataset, this feels inappropriate and maybe even breaking. The ancillary_variables attribute is not used in CF grid mapping or bounds as far as I can tell.

Here is an example using this PR (note that all the varN type names will be replaced with better variable names before we publish these):

```python In [1]: import xarray as xr

In [2]: ds = xr.open_dataset("examples/converted/06AQ19840719.nc")

In [3]: ds Out[3]: <xarray.Dataset> Dimensions: (N_LEVELS: 24, N_PROF: 38) Coordinates: var1_qc (N_PROF, N_LEVELS) float32 ... var4_qc (N_PROF, N_LEVELS) float32 ... var5_qc (N_PROF, N_LEVELS) float32 ... var6_qc (N_PROF, N_LEVELS) float32 ... var7_qc (N_PROF, N_LEVELS) float32 ... var8_qc (N_PROF, N_LEVELS) float32 ... var9_qc (N_PROF, N_LEVELS) float32 ... var10_qc (N_PROF, N_LEVELS) float32 ... var11_qc (N_PROF, N_LEVELS) float32 ... var12_qc (N_PROF, N_LEVELS) float32 ... var13_qc (N_PROF, N_LEVELS) float32 ... var14_qc (N_PROF, N_LEVELS) float32 ... var15_qc (N_PROF, N_LEVELS) float32 ... pressure (N_PROF, N_LEVELS) float64 ... latitude (N_PROF) float64 ... longitude (N_PROF) float64 ... time (N_PROF) datetime64[ns] ... expocode (N_PROF) object ... station (N_PROF) object ... cast (N_PROF) int8 ... sample (N_PROF, N_LEVELS) object ... Dimensions without coordinates: N_LEVELS, N_PROF Data variables: var0 (N_PROF) object ... var1 (N_PROF, N_LEVELS) object ... var2 (N_PROF) float32 ... var3 (N_PROF, N_LEVELS) float32 ... var4 (N_PROF, N_LEVELS) float32 ... var5 (N_PROF, N_LEVELS) float32 ... var6 (N_PROF, N_LEVELS) float32 ... var7 (N_PROF, N_LEVELS) float32 ... var8 (N_PROF, N_LEVELS) float32 ... var9 (N_PROF, N_LEVELS) float32 ... var10 (N_PROF, N_LEVELS) float32 ... var11 (N_PROF, N_LEVELS) float32 ... var12 (N_PROF, N_LEVELS) float32 ... var13 (N_PROF, N_LEVELS) float32 ... var14 (N_PROF, N_LEVELS) float32 ... var15 (N_PROF, N_LEVELS) float32 ... Attributes: Conventions: CF-1.8 CCHDO-0.1.dev157+g52933e0.d20200707 ```

This looks especially confusing when you ask for one specific variable:

python In [15]: ds.var6 Out[15]: <xarray.DataArray 'var6' (N_PROF: 38, N_LEVELS: 24)> array([[33.3965, 33.5742, 34.8769, ..., 34.9858, 34.9852, nan], [33.1129, 34.0742, 34.6595, ..., nan, nan, nan], [32.5328, 33.2687, 34.2262, ..., nan, nan, nan], ..., [35.0686, 35.09 , 35.1415, ..., nan, nan, nan], [35.0303, 35.0295, 35.0715, ..., nan, nan, nan], [35.0682, 35.0756, 35.0622, ..., nan, nan, nan]], dtype=float32) Coordinates: var1_qc (N_PROF, N_LEVELS) float32 ... var4_qc (N_PROF, N_LEVELS) float32 ... var5_qc (N_PROF, N_LEVELS) float32 ... var6_qc (N_PROF, N_LEVELS) float32 ... var7_qc (N_PROF, N_LEVELS) float32 ... var8_qc (N_PROF, N_LEVELS) float32 ... var9_qc (N_PROF, N_LEVELS) float32 ... var10_qc (N_PROF, N_LEVELS) float32 ... var11_qc (N_PROF, N_LEVELS) float32 ... var12_qc (N_PROF, N_LEVELS) float32 ... var13_qc (N_PROF, N_LEVELS) float32 ... var14_qc (N_PROF, N_LEVELS) float32 ... var15_qc (N_PROF, N_LEVELS) float32 ... pressure (N_PROF, N_LEVELS) float64 ... latitude (N_PROF) float64 ... longitude (N_PROF) float64 ... time (N_PROF) datetime64[ns] ... expocode (N_PROF) object ... station (N_PROF) object ... cast (N_PROF) int8 ... sample (N_PROF, N_LEVELS) object ... Dimensions without coordinates: N_PROF, N_LEVELS Attributes: whp_name: CTDSAL whp_unit: PSS-78 standard_name: sea_water_practical_salinity units: 1 reference_scale: PSS-78

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670765806 https://github.com/pydata/xarray/pull/2844#issuecomment-670765806 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDc2NTgwNg== DWesl 22566757 2020-08-07T22:29:20Z 2020-08-07T22:29:20Z CONTRIBUTOR

The MinimumVersionsPolicy error appears to be a series of internal conda errors, and is probably unrelated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670730008 https://github.com/pydata/xarray/pull/2844#issuecomment-670730008 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDczMDAwOA== DWesl 22566757 2020-08-07T22:02:47Z 2020-08-07T22:02:47Z CONTRIBUTOR

pydata/xarray-data#19

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670728389 https://github.com/pydata/xarray/pull/2844#issuecomment-670728389 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDcyODM4OQ== dcherian 2448579 2020-08-07T21:56:29Z 2020-08-07T21:56:29Z MEMBER

@DWesl can you put delete the "bounds" attribute and send in a PR to xarray-data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667744209 https://github.com/pydata/xarray/pull/2844#issuecomment-667744209 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2Nzc0NDIwOQ== DWesl 22566757 2020-08-03T00:14:24Z 2020-08-03T19:42:02Z CONTRIBUTOR

The rasm dataset has coordinates xc and yc, which reference bounds xv and yv respectively, which I do not see in the variable list with decode_coords=False. It would appear that pydata/xarray-data#4 did not include the bounds in the updated dataset when adding coordinates to rasm.nc, so this warning is correct. I do not know that file, so I'm probably not the best person to add bounds. Should I wait for an update to pydata/xarray-data, or should I ask sphinx to ignore the warning?

Another option is to just delete the bounds attributes of xc and yc in rasm.nc

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667737160 https://github.com/pydata/xarray/pull/2844#issuecomment-667737160 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2NzczNzE2MA== DWesl 22566757 2020-08-02T23:13:26Z 2020-08-02T23:13:26Z CONTRIBUTOR

The example the doc build doesn't like: ```python ds = xr.tutorial.load_dataset("rasm") ds.to_zarr("rasm.zarr", mode="w") import zarr

zgroup = zarr.open("rasm.zarr") print(zgroup.tree()) dict(zgroup["Tair"].attrs) `` I'll need to look into therasm` dataset to figure out why there is a warning now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497948836 https://github.com/pydata/xarray/pull/2844#issuecomment-497948836 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5Nzk0ODgzNg== DWesl 22566757 2019-06-01T14:20:08Z 2020-07-14T15:55:17Z CONTRIBUTOR

On 5/31/2019 11:50 AM, dcherian wrote:

It isn't just MetPy though. I'm sure there's existing code relying on adding |grid_mapping| and |bounds| to |attrs| in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to |encoding| :/.

At present, the proper, CF-compliant way to do this is to have both |grid_mapping| and |bounds| variables in |data_vars|, and maintain the attributes yourself, including making sure the variables get copied into the result after relevant |ds[var_name]| and |ds.sel(axis=bounds)| operations.

If you decide to move these variables to |coords|, the |bounds| variables will still get dropped on any subsetting operation, including those where the relevant axis was retained, the |grid_mapping| variables will be included in the result of all subsetting operations (including pulling out, for example, a time coordinate), and both will be included in some |coordinates| attribute when written to disk, breaking CF compliance.

This PR only really addresses getting these variables in |coords| initially and keeping them out of the global |coordinates| attribute when writing to disk.

Since I'm doing this primarily to get grid_mapping and bounds
variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in |attrs| and using that information when encoding coordinates would be the more pragmatic choice.

You have a point about |grid_mapping|, but applying the MetPy approach of saving the information in another, more directly useful format (|cartopy.Projection| instances) immediately after loading the file would be a way around that.

For |bounds|, I think |pd.PeriodIndex| would be the most natural representation for time, and |pd.IntervalIndex| for most other 1-D cases, but that still leaves |bounds| for two-or-more-dimensional coordinates.

That's a design choice I'll leave to the maintainers.

We could define |encoding| as containing a specified set of CF attributes that control on-disk representation such as |units|, |scale_factor|, |contiguous| etc. and leaving everything else in |attrs|. A full list of attributes that belong in |encoding| could be in the docs so that downstream packages can fully depend on this behaviour.

Currently I see |coordinates| is interpreted and moved to |encoding|. In the above proposal, this would be left in |attrs| but its value would still be interpreted if |decode_coords=True|.

What do you think?

At present, |set(ds[var_name].attrs["coordinates"].split())| and |set(ds[var_name].coords) - set(ds[var_name].indexes[dim_name])| would be identical, since the |coordinates| attribute is essentially computed from the second expression on write.

Do you have a use case in mind where you need specifically the list of CF auxiliary coordinates, or is that just an example of something that would change under the new proposal? I assume |units| would be moved to |encoding| only for |datetime64[ns]| and |timedelta64[ns]| variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
644407583 https://github.com/pydata/xarray/pull/2844#issuecomment-644407583 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY0NDQwNzU4Mw== dcherian 2448579 2020-06-15T21:47:08Z 2020-06-15T21:47:08Z MEMBER

Hi @DWesl . I sincerely apologize for not responding earlier.

Reading over this thread again, I think I agree with @shoyer these should live in encoding, because we've technically "decoded" the attribute and converted these variables to coords.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
644405067 https://github.com/pydata/xarray/pull/2844#issuecomment-644405067 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY0NDQwNTA2Nw== DWesl 22566757 2020-06-15T21:40:49Z 2020-06-15T21:40:49Z CONTRIBUTOR

This PR currently puts grid_mapping and bounds in encoding once it is done with them. Is that where XArray wants to put them, or should they be somewhere else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
633253434 https://github.com/pydata/xarray/pull/2844#issuecomment-633253434 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDYzMzI1MzQzNA== DWesl 22566757 2020-05-24T16:09:04Z 2020-05-24T16:09:04Z CONTRIBUTOR

Should I change this to put grid_mapping and bounds back in attrs, or should I leave them in encoding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
598016992 https://github.com/pydata/xarray/pull/2844#issuecomment-598016992 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5ODAxNjk5Mg== shoyer 1217238 2020-03-12T05:43:30Z 2020-03-12T05:43:30Z MEMBER

We could probably make it a rule that encoding gets preserved in exactly the same way as attrs. I think there are some old issues about formalizing metadata conventions...

On Wed, Mar 11, 2020 at 2:19 PM Ryan May notifications@github.com wrote:

Thanks for the info. Based on that, I lean towards attrs.

I think a better rationale, though, would be to formalize the role of encoding in xarray.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/2844#issuecomment-597883721, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJFVVUMBSZNECAM74HF4TRG757BANCNFSM4HAPJB7A .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
597883721 https://github.com/pydata/xarray/pull/2844#issuecomment-597883721 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5Nzg4MzcyMQ== dopplershift 221526 2020-03-11T21:19:31Z 2020-03-11T21:19:31Z CONTRIBUTOR

Thanks for the info. Based on that, I lean towards attrs.

I think a better rationale, though, would be to formalize the role of encoding in xarray.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
597375929 https://github.com/pydata/xarray/pull/2844#issuecomment-597375929 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5NzM3NTkyOQ== DWesl 22566757 2020-03-10T23:54:41Z 2020-03-10T23:54:41Z CONTRIBUTOR

I think the choice is between attrs and encoding, not both.

If it helps lean your decision one way or the other, attrs tends to stay associated with Datasets through more operations than encoding, so parse_cf() would have to be called fairly soon after opening if the information ends up in encoding, while putting it in attrs gives users a bit more time for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
597374274 https://github.com/pydata/xarray/pull/2844#issuecomment-597374274 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5NzM3NDI3NA== dopplershift 221526 2020-03-10T23:47:43Z 2020-03-10T23:47:43Z CONTRIBUTOR

As a downstream user, I just want to be told what to do (assuming encoding is part of the public API for xarray). I'd love not to have to modify our code, but that's not essential necessarily.

So to clarify: is this about whether they should be in one spot or the other? Or is it about having grid_mapping and bounds in both?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
596954223 https://github.com/pydata/xarray/pull/2844#issuecomment-596954223 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5Njk1NDIyMw== dcherian 2448579 2020-03-10T08:03:37Z 2020-03-10T08:03:37Z MEMBER

My preference is for attrs but someone else should weigh in. cc @pydata/xarray

Maybe @dopplershift , as MetPy maintainer has thoughts on this matter?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
587466093 https://github.com/pydata/xarray/pull/2844#issuecomment-587466093 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NzQ2NjA5Mw== DWesl 22566757 2020-02-18T13:43:12Z 2020-02-18T13:43:12Z CONTRIBUTOR

The test failures seem to all be due to recent changes in cftime/CFTimeIndex, which I haven't touched.

Is sticking the grid_mapping and bounds attributes in encoding good, or should I put them back in attrs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586273656 https://github.com/pydata/xarray/pull/2844#issuecomment-586273656 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NjI3MzY1Ng== DWesl 22566757 2020-02-14T12:47:06Z 2020-02-14T12:47:06Z CONTRIBUTOR

I just noticed pandas.PeriodIndex would be an alternative to pandas.IntervalIndex for time data if which side the interval is closed on is largely irrelevant for such data.

Is there an interest in using these for 1D coordinates with bounds? I think ds.groupby_bins() already returns an IntervalIndex.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497760666 https://github.com/pydata/xarray/pull/2844#issuecomment-497760666 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5Nzc2MDY2Ng== dcherian 2448579 2019-05-31T15:50:28Z 2019-05-31T15:50:28Z MEMBER

It isn't just MetPy though. I'm sure there's existing code relying on adding grid_mapping and bounds to attrs in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to encoding :/.

Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in attrs and using that information when encoding coordinates would be the more pragmatic choice.

We could define encoding as containing a specified set of CF attributes that control on-disk representation such as units, scale_factor, contiguous etc. and leaving everything else in attrs. A full list of attributes that belong in encoding could be in the docs so that downstream packages can fully depend on this behaviour.

Currently I see coordinates is interpreted and moved to encoding. In the above proposal, this would be left in attrs but its value would still be interpreted if decode_coords=True.

What do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497566742 https://github.com/pydata/xarray/pull/2844#issuecomment-497566742 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU2Njc0Mg== DWesl 22566757 2019-05-31T04:00:17Z 2019-05-31T04:00:17Z CONTRIBUTOR

Switched to use in rather than is not None.

Re: grid_mapping in .encoding not .attrs MetPy assumes grid_mapping will be in .attrs. Since the xarray documentation mentions this capability, should I be making concurrent changes to MetPy to allow this to continue?

If so, would it be sufficient to change their .attrs references to .encoding and mentioning in both sets of documentation that the user should call ds.metpy.parse_cf() immediately after loading to ensure the information is available for MetPy to use? I don't entirely understand the accessor API.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497558317 https://github.com/pydata/xarray/pull/2844#issuecomment-497558317 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU1ODMxNw== DWesl 22566757 2019-05-31T03:04:06Z 2019-05-31T03:13:53Z CONTRIBUTOR

This is briefly mentioned above, in https://github.com/pydata/xarray/pull/2844#discussion_r270595609 The rationale was that everywhere else xarray uses CF attributes for something, the original values of those attributes are recorded in var.encoding, not var.attrs, and consistency across a code base is a good thing. Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars, I don't have strong opinions on the subject.

If you feel strongly to the contrary, there's an idea at the top of this thread for getting bounds information encoded in terms xarray already uses in some cases (Dataset.groupby_bins()), and the diffs for this PR should help you figure out what needs changing to support this.

For grid_mapping there's http://xarray.pydata.org/en/latest/weather-climate.html#cf-compliant-coordinate-variables which is enough for my uses.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497553898 https://github.com/pydata/xarray/pull/2844#issuecomment-497553898 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU1Mzg5OA== dcherian 2448579 2019-05-31T02:37:31Z 2019-05-31T02:37:31Z MEMBER

I'm a little confused on why these are in encoding and not in attrs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497308484 https://github.com/pydata/xarray/pull/2844#issuecomment-497308484 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzMwODQ4NA== andreas-h 358378 2019-05-30T12:11:02Z 2019-05-30T12:11:02Z CONTRIBUTOR

I'd like this one to be merged very much. Is there anything holding this back?

Also, it might be nice to update the documentation with info on this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478298195 https://github.com/pydata/xarray/pull/2844#issuecomment-478298195 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3ODI5ODE5NQ== shoyer 1217238 2019-03-30T23:27:19Z 2019-03-30T23:27:19Z MEMBER

The current VariableCoder interface only works for coders that work at the level of individual variables. But coordinates only make sense on a dataset, so we'll need a different abstraction for that, e.g., a DatasetCoder?

For now, it's probably fine to keep this logic where you have it currently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478290053 https://github.com/pydata/xarray/pull/2844#issuecomment-478290053 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3ODI5MDA1Mw== DWesl 22566757 2019-03-30T21:17:17Z 2019-03-30T21:17:17Z CONTRIBUTOR

I can shift this to use encoding only, but I'm having trouble figuring out where that code would go.

Would the preferred path be to create VariableCoder classes for each and add them to encode_cf_variable, then add tests to xarray.tests.test_coding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
476586154 https://github.com/pydata/xarray/pull/2844#issuecomment-476586154 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3NjU4NjE1NA== DWesl 22566757 2019-03-26T11:31:05Z 2019-03-26T11:31:05Z CONTRIBUTOR

Related to #1475 and #2288 , but this is just keeping the metadata consistent where already present, not extending the data model to include bounds, cells, or projections. I should add a test to ensure saving still works if the bounds are lost when pulling out variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 16.916ms · About: xarray-datasette