home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

12 rows where issue = 1194993450 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 6

  • snowman2 4
  • rabernat 3
  • christophenoel 2
  • rouault 1
  • scottyhq 1
  • wankoelias 1

author_association 3

  • CONTRIBUTOR 4
  • MEMBER 4
  • NONE 4

issue 1

  • Writing GDAL ZARR _CRS attribute not possible · 12 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1236352216 https://github.com/pydata/xarray/issues/6448#issuecomment-1236352216 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85JsTzY rouault 1192433 2022-09-04T14:21:13Z 2022-09-04T14:21:13Z NONE

I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented)

just seeing this conversation. I wasn't aware of GeoZarr when I implemented the _CRS attribute in the GDAL Zarr driver. There is a practical difficulty in reusing the CF reading & writing code from the GDAL netCDF driver as it is tied to that driver, but I'd guess with sufficient effort it could be made agnostic of the carrier. No opposition from me if someone wants to tackle this. That said, the way CRS is encoding in CF conventions is not super elegant, because you need to create a 0-dim variable, define a set of attributes for each projection method, and have an attribute in the georeferenced variable to point to the variable that holds the CRS attribute. A _CRS attribute on the georeferenced variable is much more simple, and by reusing the WKT or PROJJSON encodings, this save the implementor the business of knowing how to define exactly the CRS (obviously when they have a library that does that job for them)

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1099797820 https://github.com/pydata/xarray/issues/6448#issuecomment-1099797820 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BjZU8 rabernat 1197350 2022-04-15T02:38:48Z 2022-04-15T02:38:48Z MEMBER

I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias's original issue reveals is narrower technical issue with Xarray's Zarr writer: Xarray won't let you serialize a dictionary attribute to zarr, even though zarr has no problem with this. That is a problem we can fix pretty easily.

The _validate_attrs helper function was just borrowed from to_netcdf:

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/backends/api.py#L133-L135

We could refactor this function to be more flexible to account for zarr's broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the to_zarr method. That is the only real decision we need to make here right now.

@wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 2,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098400102 https://github.com/pydata/xarray/issues/6448#issuecomment-1098400102 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BeEFm snowman2 8699967 2022-04-13T19:21:59Z 2022-04-13T19:21:59Z CONTRIBUTOR

GDAL doesn't actually need the CRS stored with the _CRS convention as documented in their ZARR driver specification?

GDAL may need to update the ZARR driver. The NetCDF driver does support CF. It would be good for them to be consistent.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098371121 https://github.com/pydata/xarray/issues/6448#issuecomment-1098371121 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85Bd9Ax wankoelias 15717873 2022-04-13T18:45:23Z 2022-04-13T18:45:23Z NONE

so just to make it clear... GDAL doesn't actually need the CRS stored with the _CRS convention as documented in their ZARR driver specification?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098342334 https://github.com/pydata/xarray/issues/6448#issuecomment-1098342334 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85Bd1-- snowman2 8699967 2022-04-13T18:11:21Z 2022-04-13T18:11:21Z CONTRIBUTOR

Top reasons for using CF over a CRS attribute:

  • Supported by geospatial & netCDF software (GDAL/Esri/Unidata)
  • Attributes are easily lost ref
  • Encourage more usage of standard mechanisms of CRS storage
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098333054 https://github.com/pydata/xarray/issues/6448#issuecomment-1098333054 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85Bdzt- snowman2 8699967 2022-04-13T18:00:12Z 2022-04-13T18:00:12Z CONTRIBUTOR

This document is also a useful reference for storing CRS in xarray: https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098331427 https://github.com/pydata/xarray/issues/6448#issuecomment-1098331427 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BdzUj snowman2 8699967 2022-04-13T17:58:19Z 2022-04-13T17:58:19Z CONTRIBUTOR

Instead, it recommends to use CF conventions for encoding CRS. This is more compatible with NetCDF, but won't be parsed correctly by GDAL.

GDAL does support the CF conventions for storing the CRS.

https://gdal.org/drivers/raster/netcdf.html#georeference

"The driver first tries to follow the CF-1 Convention from UNIDATA looking for the Metadata named “grid_mapping”."

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1098327424 https://github.com/pydata/xarray/issues/6448#issuecomment-1098327424 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BdyWA scottyhq 3924836 2022-04-13T17:53:21Z 2022-04-13T17:53:21Z MEMBER

One of the main motivations behind the the rioxarray extension is GDAL compatibility. It looks like @snowman2 and @TomAugspurger have discussed saving many geotiffs loaded into xarray as GDAL-compatible Zarr for example https://github.com/corteva/rioxarray/issues/433#issuecomment-967685356.

While it seems that the ultimate solution is agreeing on a format standard, here is another small example using the rioxarray extension where format conversion doesn't currently work as you might expect:

```python

https://github.com/pydata/xarray-data

ds = xr.open_dataset('xarray-data/air_temperature.nc', engine='rasterio')

TooManyDimensions: Only 2D and 3D data arrays supported.

ds.rio.to_raster('test.zarr', driver='ZARR')

Does not error, but output not equivalent to gdal_translate -of ZARR xarray-data/air_temperature.nc gdal_air_temp.zarr

for example, gdalinfo xarray-tutorial-airtemp.zarr gives

Warning 1: Too many samples along the > 2D dimensions of /air.

ds.to_zarr('xarray-tutorial-airtemp.zarr')

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1091708256 https://github.com/pydata/xarray/issues/6448#issuecomment-1091708256 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BEiVg christophenoel 9072111 2022-04-07T13:01:55Z 2022-04-07T13:02:34Z NONE

Some people will prefer Cloud-Optimised GeoTiff, others (geo)Zarr.

By the way, I use rasterio (based on GDAL), xarray, GDAL and all tools without problems with CF conventions.

The impact of GDAL using CRS attribute are only: - when converting a non-netcdf file to Zarr with GDAL, the CRS is encoded in the CRS attribute - when converting a Zarr file to a non-netcdf file which encode CRS in grid-mapping, this is not converted to the CRS attribute.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1091703481 https://github.com/pydata/xarray/issues/6448#issuecomment-1091703481 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BEhK5 rabernat 1197350 2022-04-07T12:57:17Z 2022-04-07T12:57:17Z MEMBER

@christophenoel - I share your perspective. But there is a huge swath of the geospatial world who basically hate NetCDF and avoid it like the plague. These communities prefer to use geotiff and GDAL. We need to reach for interoperability.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1091422044 https://github.com/pydata/xarray/issues/6448#issuecomment-1091422044 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BDcdc christophenoel 9072111 2022-04-07T09:28:27Z 2022-04-07T09:28:27Z NONE

Very interesting topic. I assume that GDAL proposed something as Zarr specification has no provision for spatial reference system encoding.

From my point of view, by making it close to NetCDF (which is so widely used in Earth Observation missions) and providing supporting librarires, xArray made the success of Zarr in the EO world.

Indeed, my dream would be GDAL align to the GeoZarr spec which is mostly aligned with NetCDF/xArray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1090742693 https://github.com/pydata/xarray/issues/6448#issuecomment-1090742693 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BA2ml rabernat 1197350 2022-04-06T20:21:20Z 2022-04-06T20:22:40Z MEMBER

I think the core problem here is that Zarr itself supports arbitrary json data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example, with a flag to bypass attribute encoding / decoding and just pass the python data directly through to Zarr.

However, my concern would be that netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?

FWIW, the new GeoZarr Spec by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends to use CF conventions for encoding CRS. This is more compatible with NetCDF, but won't be parsed correctly by GDAL.

I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞 For example, either of these two GDAL PRs: - https://github.com/OSGeo/gdal/pull/3896 - https://github.com/OSGeo/gdal/pull/4521

However, it is not too late! Let's try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr.

My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention.

{
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 2633.313ms · About: xarray-datasette