issue_comments


18 rows where issue = 1172229856 sorted by updated_at descending

user 8

  • DennisHeimbigner 5
  • joshmoore 3
  • shoyer 3
  • rabernat 2
  • malmans2 2
  • dopplershift 1
  • max-sixty 1
  • wankoelias 1

author_association 3

  • NONE 9
  • MEMBER 6
  • CONTRIBUTOR 3

issue 1

  • Should the zarr backend support NCZarr conventions? · 18
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1090499559 https://github.com/pydata/xarray/issues/6374#issuecomment-1090499559 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A_7Pn shoyer 1217238 2022-04-06T17:04:26Z 2022-04-06T17:04:26Z MEMBER

> As it is currently it is also not possible to write a zarr which follows the GDAL ZARR driver conventions. Writing the _CRS attribute also results in a TypeError:

> Can you elaborate? What API are you using to do the write: python, netcdf-c, or what?

This error message comes from Xarray and can be triggered by calling to_zarr(): https://github.com/pydata/xarray/blob/facafac359c39c3e940391a3829869b4a3df5d70/xarray/backends/api.py#L162

I don't think netCDF-C needs to be involved at all, which is why I suggested opening a separate issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1090483461 https://github.com/pydata/xarray/issues/6374#issuecomment-1090483461 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A_3UF DennisHeimbigner 905179 2022-04-06T16:46:32Z 2022-04-06T16:46:32Z NONE

> As it is currently it is also not possible to write a zarr which follows the GDAL ZARR driver conventions. Writing the _CRS attribute also results in a TypeError:

Can you elaborate? What API are you using to do the write: python, netcdf-c, or what?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1090464275 https://github.com/pydata/xarray/issues/6374#issuecomment-1090464275 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A_yoT shoyer 1217238 2022-04-06T16:25:40Z 2022-04-06T16:25:40Z MEMBER

@wankoelias could you kindly open a new issue for writing GDAL ZARR?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1090132193 https://github.com/pydata/xarray/issues/6374#issuecomment-1090132193 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A-hjh wankoelias 15717873 2022-04-06T10:54:19Z 2022-04-06T10:54:51Z NONE

As it is currently it is also not possible to write a zarr which follows the GDAL ZARR driver conventions. Writing the _CRS attribute also results in a TypeError:

For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple
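The message above comes from the attribute check xarray runs before writing (see the api.py link in the thread). The sketch below is a simplified, stdlib-only stand-in for that kind of check, not xarray's actual code: it leaves out the NumPy types from the allowed list, and the GDAL-style _CRS value (a nested dict with a "wkt" key) is an assumed example of a dict-valued attribute.

```python
from numbers import Number

# Simplified stand-in for a pre-write attribute check (assumption: NumPy
# ndarray/number types are omitted so the sketch needs only the stdlib).
ALLOWED_TYPES = (str, Number, list, tuple)


def check_attr(name, value):
    """Raise TypeError if `value` is not one of the allowed attribute types."""
    if not isinstance(value, ALLOWED_TYPES):
        raise TypeError(
            f"Invalid value for attr {name!r}: {value!r}. For serialization to "
            "netCDF files, its value must be of one of the following types: "
            "str, Number, ndarray, number, list, tuple"
        )


# Hypothetical GDAL-style CRS attribute: a nested dict, which is rejected.
attrs = {"title": "example", "_CRS": {"wkt": "GEOGCRS[...]"}}

rejected = []
for key, val in attrs.items():
    try:
        check_attr(key, val)
    except TypeError:
        rejected.append(key)

print(rejected)  # ['_CRS']
```

Any dict-valued attribute fails the same way, which is why the _NCZARR_ATTR error quoted elsewhere in this thread has the same wording.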

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1081553127 https://github.com/pydata/xarray/issues/6374#issuecomment-1081553127 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AdzDn malmans2 22245117 2022-03-29T08:01:36Z 2022-03-29T08:01:36Z CONTRIBUTOR

Thanks! #6420 looks at .zarray["_NCZARR_ARRAY"]["dimrefs"] only if .zattrs["_ARRAY_DIMENSIONS"] is missing.
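The fallback order can be sketched as follows: read xarray's own `_ARRAY_DIMENSIONS` from `.zattrs` first, and use NCZarr's `dimrefs` only when that is absent. This is an illustration, not the code from #6420. It assumes `.zarray` and `.zattrs` have already been parsed into dicts, and stripping the group path from each dimref is an assumption about how a root-group-only reader would map fully qualified names to plain names.

```python
def dimension_names(zarray, zattrs):
    """Return dimension names for one array, preferring the xarray convention.

    zarray, zattrs: parsed JSON contents of the array's .zarray and .zattrs.
    """
    # xarray convention: a plain list of names in .zattrs["_ARRAY_DIMENSIONS"].
    if "_ARRAY_DIMENSIONS" in zattrs:
        return list(zattrs["_ARRAY_DIMENSIONS"])
    # NCZarr convention: fully qualified names such as "/x" or "/g1/y".
    dimrefs = zarray.get("_NCZARR_ARRAY", {}).get("dimrefs")
    if dimrefs is None:
        raise KeyError("no dimension names found in .zattrs or .zarray")
    # Keep only the base name (the text after the last "/").
    return [ref.rsplit("/", 1)[-1] for ref in dimrefs]


zarray = {"shape": [10, 20], "_NCZARR_ARRAY": {"dimrefs": ["/y", "/z"]}}
print(dimension_names(zarray, {}))  # ['y', 'z']
```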

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1081236360 https://github.com/pydata/xarray/issues/6374#issuecomment-1081236360 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AcluI DennisHeimbigner 905179 2022-03-28T23:05:51Z 2022-03-28T23:05:51Z NONE

> dimension names stored by xarray in .zattrs["_ARRAY_DIMENSIONS"] are stored by NCZarr in .zarray["_NCZARR_ARRAY"]["dimrefs"]

I made a recent change to this so that, where possible, all NCZarr files contain the xarray _ARRAY_DIMENSIONS attribute. By "where possible" I mean that the array is in the root group and the dimensions it references are "defined" in the root group (i.e. they have the simple FQN "/XXX" where XXX is the dim name). This means that there is sometimes a duplication of information between _ARRAY_DIMENSIONS and .zarray["_NCZARR_ARRAY"]["dimrefs"].
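A minimal sketch of the "where possible" rule as stated in the comment: the array sits in the root group, and every referenced dimension is a root-level FQN of the form "/XXX". The function name and the `array_path` argument are hypothetical; this is not netcdf-c code, only the stated condition written out.

```python
def xarray_dims_if_possible(array_path, dimrefs):
    """Return an _ARRAY_DIMENSIONS list, or None if the rule does not apply.

    array_path: hypothetical path of the array inside the store, e.g. "/v".
    dimrefs: NCZarr fully qualified dimension names, e.g. ["/x", "/y"].
    """
    # The array itself must live in the root group: "/v", not "/g1/v".
    if not array_path.startswith("/") or array_path.count("/") != 1:
        return None
    names = []
    for ref in dimrefs:
        # A root-level FQN has exactly one "/", and it is the first character.
        if not ref.startswith("/") or ref.count("/") != 1:
            return None
        names.append(ref[1:])
    return names


print(xarray_dims_if_possible("/v", ["/x", "/y"]))     # ['x', 'y']
print(xarray_dims_if_possible("/v", ["/x", "/g1/y"]))  # None
```

When the function returns None, only the `dimrefs` copy exists, which is the case where the two representations are not duplicated.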

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1081139207 https://github.com/pydata/xarray/issues/6374#issuecomment-1081139207 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AcOAH malmans2 22245117 2022-03-28T21:01:19Z 2022-03-28T21:01:19Z CONTRIBUTOR

Adding support for reading NCZarr in the "zarr" backend should be quite easy if xarray doesn't need to integrate the additional features in NCZarr (e.g., groups, fully qualified names, dtypes for attributes). It looks like the main difference is that the dimension names stored by xarray in .zattrs["_ARRAY_DIMENSIONS"] are stored by NCZarr in .zarray["_NCZARR_ARRAY"]["dimrefs"]. I drafted PR #6420 to explore what it would take to support reading NCZarr in xarray's "zarr" backend, and I don't think there are major changes/additions needed. (I'm experiencing issues with Windows in PR #6420. I think they need to be explored in netcdf4-python or netcdf-c, though; I've added a comment in the PR.)

I'm not sure whether it is better to (i) add direct support for NCZarr in xarray or (ii) just rely on the netcdf4 backend. After playing a bit with both backends, I have a few comments if option (ii) is chosen:
* I would change the error raised when "_ARRAY_DIMENSIONS" is not present so that it suggests trying the netcdf4 backend as well. Also, I think it's worth pointing out in the documentation or in the error message where to find information on how to open/write zarr data with the netcdf4 backend. I suspect right now it's not easy for python/xarray users to find that information.
* I would consider starting a deprecation cycle for open_zarr, so it will be clearer that zarr data can be opened using various backends.
* If "_ARRAY_DIMENSIONS" and "_NC*" attributes will coexist in the next version of NCZarr, the zarr backend will be able to open NCZarr but will treat "_NC*" attributes as regular attributes. I think the "zarr" backend would have to handle "_NC*" attributes (e.g., drop or hide them), otherwise there can be issues when writing:
  TypeError: Invalid value for attr '_NCZARR_ATTR': {'types': {'Conventions': '<U1', 'title': '<U1', 'description': '<U1', 'platform': '<U1', 'references': '<U1', '_NCProperties': '<U1'}}. For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple
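One possible way to "drop" the "_NC*" attributes mentioned in the last point, sketched as a plain dict filter. The helper name is hypothetical, the attribute values are placeholders, and hiding the attributes (rather than dropping them) would need a different design; in an xarray workflow the filter would be applied to the dataset's attrs and to each variable's attrs before writing.

```python
def strip_netcdf_internal_attrs(attrs, prefix="_NC"):
    """Return a copy of `attrs` without keys that start with `prefix`."""
    return {k: v for k, v in attrs.items() if not k.startswith(prefix)}


# Placeholder attributes shaped like the ones in the error message above.
attrs = {
    "title": "t",
    "_NCZARR_ATTR": {"types": {"title": "<U1"}},
    "_NCProperties": "version=2",
}
print(strip_netcdf_internal_attrs(attrs))  # {'title': 't'}
```

A prefix match keeps the rule simple, but it would also drop any user attribute that happens to start with "_NC".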

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076942019 https://github.com/pydata/xarray/issues/6374#issuecomment-1076942019 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AMNTD joshmoore 88113 2022-03-24T00:18:14Z 2022-03-24T00:18:14Z NONE

> rabernat: My opinion is that we should not try to support the nczarr conventions directly. Xarray already supports nczarr via netCDF4. If netCDF4 can open the Zarr store, then Xarray can read it. ... I would turn this question around and ask: if netCDF4 supports access to these datasets directly, what's the advantage of xarray bypassing netCDF4 and opening them directly?

@malmans2 can chime in with his experience, but it seems that from the user's point of view, not needing to know whether something is an xarray-zarr or an nczarr would be kinder of us. Plus, as said below, I do think it puts us on the path to defining a common spec.

> Supporting nczarr directly would require lots of custom logic within xarray.

Mea culpa. I wasn't clear enough about the intent, from my side at least, namely to support loading _ARRAY_DIMENSIONS (or some other necessary subset) from nczarr rather than its entirety.

> DennisHeimbigner: this is because xarray cannot handle subgroups.

I'll add as an aside that work on subgroups (i.e. datatree) is progressing, in case anything needs to be considered now rather than later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076821132 https://github.com/pydata/xarray/issues/6374#issuecomment-1076821132 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALvyM DennisHeimbigner 905179 2022-03-23T21:07:01Z 2022-03-23T21:07:01Z NONE

I guess I was not clear. If you are willing to lose netCDF-specific metadata, then I believe any xarray or Zarr implementation should be able to read NCZarr-written data with no changes needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076810559 https://github.com/pydata/xarray/issues/6374#issuecomment-1076810559 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALtM_ rabernat 1197350 2022-03-23T20:54:39Z 2022-03-23T20:54:39Z MEMBER

Sure, to be clear, my hesitancy is mostly just around being reluctant to maintain more complexity in our zarr interface. If there is momentum to implement and maintain this compatibility, I am definitely not opposed. 🚀

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076796582 https://github.com/pydata/xarray/issues/6374#issuecomment-1076796582 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALpym shoyer 1217238 2022-03-23T20:38:12Z 2022-03-23T20:38:12Z MEMBER

@DennisHeimbigner I think it would be great to standardize NCZarr as a super-set of the "Xarray-Zarr" standard! I think Xarray should indeed be able to read such files. If you want to read a sub-group, you can read the sub-group in a separate call to xarray.open_zarr().
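Reading sub-groups one call at a time requires knowing the group paths. A stdlib sketch, assuming a Zarr v2 store whose keys are available as a list (for example from a directory walk) and in which each group is marked by a ".zgroup" key. Each non-root path returned could then be passed as the `group` argument of `xarray.open_zarr()`; the root group is returned as an empty string. The helper name is hypothetical.

```python
def group_paths(store_keys):
    """List group paths in a Zarr v2 store, given its keys.

    A group is marked by a ".zgroup" key; the root group has path "".
    """
    paths = []
    for key in store_keys:
        if key == ".zgroup":
            paths.append("")
        elif key.endswith("/.zgroup"):
            paths.append(key[: -len("/.zgroup")])
    return sorted(paths)


keys = [".zgroup", "x/.zarray", "g1/.zgroup", "g1/v/.zarray", "g1/v/.zattrs"]
print(group_paths(keys))  # ['', 'g1']
```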

@rabernat I would not be opposed to adding support inside Xarray for reading NCZarr data, specifically to understand NCZarr's encoding of dimension names when using Zarr-Python. This wouldn't give 100% compatibility with NCZarr, but it would be very close (maybe just with incorrect dtypes for attributes) with a minimal amount of work. I don't think it would be a big deal to look for .nczvar files.

{
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076777717 https://github.com/pydata/xarray/issues/6374#issuecomment-1076777717 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALlL1 DennisHeimbigner 905179 2022-03-23T20:15:18Z 2022-03-23T20:15:18Z NONE

At the moment, NCZarr-format files (as opposed to pure Zarr-format files produced by NCZarr) do not include the Xarray _ARRAY_DIMENSIONS attribute. Now that I think about it, there is no reason not to include that attribute where it is meaningful, so I will make that change. After that change, the situation should be as follows:

Xarray can read any nczarr format file subject to the following conditions:
1. xarray attempts to read only the root group and ignores subgroups
    * this is because xarray cannot handle subgroups.
2. the xarray implementation ignores extra dictionary keys in e.g. .zarray and .zattrs
   that it does not recognize
    * this should already be the case under the principle of "read broadly, write narrowly".
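Condition 2 ("read broadly, write narrowly") amounts to a reader that keeps the metadata keys it understands and ignores the rest. A minimal sketch of that idea for a Zarr v2 `.zarray` document; the helper name is hypothetical, and this is not the zarr-python implementation.

```python
import json

# Keys defined for .zarray by the Zarr v2 storage specification.
ZARRAY_V2_KEYS = {
    "zarr_format", "shape", "chunks", "dtype", "compressor",
    "fill_value", "order", "filters", "dimension_separator",
}


def read_zarray(text):
    """Parse .zarray JSON, keeping spec keys and ignoring unrecognized ones."""
    meta = json.loads(text)
    known = {k: v for k, v in meta.items() if k in ZARRAY_V2_KEYS}
    ignored = sorted(set(meta) - ZARRAY_V2_KEYS)
    return known, ignored


# An NCZarr-style .zarray carrying an extra "_NCZARR_ARRAY" key.
text = json.dumps({
    "zarr_format": 2, "shape": [10, 20], "chunks": [10, 20], "dtype": "<f4",
    "compressor": None, "fill_value": None, "order": "C", "filters": None,
    "_NCZARR_ARRAY": {"dimrefs": ["/y", "/z"]},
})
known, ignored = read_zarray(text)
print(ignored)  # ['_NCZARR_ARRAY']
```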
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076622767 https://github.com/pydata/xarray/issues/6374#issuecomment-1076622767 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AK_Wv rabernat 1197350 2022-03-23T17:39:57Z 2022-03-23T17:39:57Z MEMBER

My opinion is that we should not try to support the nczarr conventions directly. Xarray already supports nczarr via netCDF4. If netCDF4 can open the Zarr store, then Xarray can read it.
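Going through the netCDF4 backend means handing netcdf-c a URL whose fragment selects the Zarr handling. A hedged sketch, assuming netcdf-c's `file://...#mode=nczarr,file` URL form for local directory stores (with `mode=zarr,file` for pure Zarr). The helper is hypothetical, and the final `open_dataset` line is left commented out because it needs xarray, netCDF4 and a netcdf-c build with NCZarr support.

```python
from pathlib import Path


def nczarr_file_url(path, mode="nczarr"):
    """Build a netCDF-C style URL for a Zarr store on the local filesystem.

    mode: "nczarr" for NCZarr-format stores, "zarr" for pure Zarr stores.
    """
    # as_uri() requires an absolute path, hence resolve().
    return Path(path).resolve().as_uri() + f"#mode={mode},file"


url = nczarr_file_url("example.zarr")
# With an NCZarr-enabled netcdf-c installed, one could then try:
# ds = xarray.open_dataset(url, engine="netcdf4")
```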

Supporting nczarr directly would require lots of custom logic within xarray. That's because nczarr introduces several additional metadata files that are not part of the zarr spec. These additional metadata files break the abstractions through which xarray interacts with zarr; working around this requires going under the hood and accessing the store object directly (rather than the zarr groups and arrays).

I would turn this question around and ask: if netCDF4 supports access to these datasets directly, what's the advantage of xarray bypassing netCDF4 and opening them directly? If there are significant performance benefits, I would be more likely to consider it worthwhile.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076601482 https://github.com/pydata/xarray/issues/6374#issuecomment-1076601482 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AK6KK joshmoore 88113 2022-03-23T17:22:23Z 2022-03-23T17:22:23Z NONE

Thanks, @max-sixty! Guess it just doesn't complete for those outside the org.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076592711 https://github.com/pydata/xarray/issues/6374#issuecomment-1076592711 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AK4BH max-sixty 5635139 2022-03-23T17:14:30Z 2022-03-23T17:14:30Z MEMBER

CC @pydata/xarray

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076544302 https://github.com/pydata/xarray/issues/6374#issuecomment-1076544302 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AKsMu joshmoore 88113 2022-03-23T16:31:53Z 2022-03-23T16:31:53Z NONE

Thanks for the details, @DennisHeimbigner. But my reading of what you outline is that xarray will be able to open some nczarr datasets. Correct? If so, there were always likely to be follow-ons to this issue when/if we identify critical edge cases. Perhaps for the moment, though, we can focus here on what we want to enable and what can be done straightforwardly.

That likely makes this more a question for @shoyer, @jhamman, @rabernat et al. (sorry, no way to @-mention all the current devs). @malmans2 is probably in a good place to start updating the existing Zarr backend to also check for the nczarr files, but if there are strong opinions against this, or alternatives that would be preferred, it would be good to hear about them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1071720621 https://github.com/pydata/xarray/issues/6374#issuecomment-1071720621 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X84_4Sit DennisHeimbigner 905179 2022-03-17T22:47:59Z 2022-03-17T22:47:59Z NONE

For Unidata and netcdf, I think the situation is briefly this.

In netcdf-4, dimensions are named objects that can "reside" inside groups. So, for example, we might have this:

    netcdf example {
      dimensions:
        x = 1;
        y = 10;
        z = 20;
      group g1 {
        dimensions:
          a = 1;
          y = 10;
          z = 5;
        variables:
          float v(/x, /g1/y, /z);
      }
    }

So base dimension names (e.g. "z") can occur in different groups and can represent different dimension objects (with different sizes).

It is possible to reference any dimension using fully qualified names (FQNs) such as "/g1/y". This capability is important so that, for example, related dimensions can be isolated within a group.
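The netcdf example shows why FQNs matter: a variable's shape depends on which group each dimension reference points into. A small sketch using the sizes from that example; the dict-based dimension table and the helper names are illustrative assumptions, not NCZarr's storage layout. The last part shows that collapsing everything to base names makes "/z" and "/g1/z" collide.

```python
# Dimensions from the CDL example, keyed by fully qualified name (FQN).
DIMS = {
    "/x": 1, "/y": 10, "/z": 20,
    "/g1/a": 1, "/g1/y": 10, "/g1/z": 5,
}


def split_fqn(fqn):
    """Split "/g1/y" into its group path "/g1" and base name "y"."""
    group, _, name = fqn.rpartition("/")
    return (group or "/"), name


def shape_of(dimrefs, dims=DIMS):
    """Look up each referenced dimension's size by its FQN."""
    return tuple(dims[ref] for ref in dimrefs)


# float v(/x, /g1/y, /z) from the example
print(shape_of(["/x", "/g1/y", "/z"]))  # (1, 10, 20)

# Collapsing to base names (as if every dimension lived in the root group)
# merges "/z" and "/g1/z" even though their sizes differ.
by_name = {}
for fqn, size in DIMS.items():
    by_name.setdefault(split_fqn(fqn)[1], set()).add(size)
conflicts = sorted(n for n, sizes in by_name.items() if len(sizes) > 1)
print(conflicts)  # ['z']
```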

NCZarr captures this information by recording fully qualified names as special keys. This differs from Xarray, where fully qualified names are not supported. From the netCDF point of view, it is as if Xarray declared all dimension objects in the root group.

If Xarray is to be extended to support the equivalent of groups, and distinct sets of dimensions are to be supported in different groups, then some equivalent of the netCDF FQN is going to be needed.

One final note. In netCDF, the dimension size is declared once and associated with a name. In Zarr/Xarray, the size occurs in multiple places (via the "shape" key), and the name-size association is also declared multiple times via the _ARRAY_DIMENSIONS attribute.
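Because Zarr repeats the name-size association per array, a reader has to reconcile the copies. A minimal sketch of that bookkeeping, assuming each array's `_ARRAY_DIMENSIONS` list and `shape` have already been read; the function name and the input layout are hypothetical, though xarray likewise rejects a dataset whose variables give conflicting sizes for the same dimension.

```python
def infer_dim_sizes(arrays):
    """Derive one size per dimension name from repeated (names, shape) pairs.

    arrays: mapping of array name -> (_ARRAY_DIMENSIONS list, shape tuple).
    Raises ValueError if two arrays disagree on a dimension's size.
    """
    sizes = {}
    for array_name, (names, shape) in arrays.items():
        if len(names) != len(shape):
            raise ValueError(f"{array_name}: {len(names)} names for {len(shape)} axes")
        for dim, n in zip(names, shape):
            # setdefault records the first size seen and returns the stored one.
            if sizes.setdefault(dim, n) != n:
                raise ValueError(f"{array_name}: {dim!r} has size {n}, expected {sizes[dim]}")
    return sizes


arrays = {
    "temp": (["y", "z"], (10, 20)),
    "z": (["z"], (20,)),
}
print(infer_dim_sizes(arrays))  # {'y': 10, 'z': 20}
```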

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1071517967 https://github.com/pydata/xarray/issues/6374#issuecomment-1071517967 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X84_3hEP dopplershift 221526 2022-03-17T21:30:22Z 2022-03-17T21:30:22Z CONTRIBUTOR

Cc @WardF @DennisHeimbigner @haileyajohnson

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette