issue_comments


13 rows where author_association = "MEMBER" and issue = 350899839 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1382671908 https://github.com/pydata/xarray/issues/2368#issuecomment-1382671908 https://api.github.com/repos/pydata/xarray/issues/2368 IC_kwDOAMm_X85SaeYk TomNicholas 35968931 2023-01-14T06:10:39Z 2023-01-14T06:10:39Z MEMBER

@ronygolderku thanks for your example. Looks like it fails for the same reason as was mentioned for some of the other examples above.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
1320994455 https://github.com/pydata/xarray/issues/2368#issuecomment-1320994455 https://api.github.com/repos/pydata/xarray/issues/2368 IC_kwDOAMm_X85OvMaX andersy005 13301940 2022-11-19T23:53:57Z 2022-11-19T23:54:43Z MEMBER

@maxaragon, i'm curious. what version of xarray/netcdf4 are you using? i'm asking because this appears to be working fine on my end

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset("20200825_hyytiala_icon-iglo-12-23.nc")

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:                       (time: 25, level: 90, flux_level: 91, frequency: 2, soil_level: 9)
Coordinates:
  * time                          (time) datetime64[ns] 2020-08-25 ... 2020-0...
  * level                         (level) float32 90.0 89.0 88.0 ... 3.0 2.0 1.0
  * flux_level                    (flux_level) float32 91.0 90.0 ... 2.0 1.0
  * frequency                     (frequency) float32 34.96 94.0
Dimensions without coordinates: soil_level
Data variables: (12/62)
    latitude                      float32 ...
    longitude                     float32 ...
    altitude                      float32 ...
    horizontal_resolution         float32 ...
    forecast_time                 (time) timedelta64[ns] ...
    height                        (time, level) float32 ...
    ...                            ...
    gas_atten                     (frequency, time, level) float32 ...
    specific_gas_atten            (frequency, time, level) float32 ...
    specific_saturated_gas_atten  (frequency, time, level) float32 ...
    specific_dry_gas_atten        (frequency, time, level) float32 ...
    K2                            (frequency, time, level) float32 ...
    specific_liquid_atten         (frequency, time, level) float32 ...
Attributes: (12/13)
    institution:  Max Planck Institute for Meteorology/Deutscher Wette...
    references:   see MPIM/DWD publications
    source:       svn://xceh.dwd.de/for0adm/SVN_icon/tags/icon-2.6.0-n...
    Conventions:  CF-1.7
    location:     hyytiala
    file_uuid:    ace15f8ba477497c8d1dd0833b5ac674
    ...           ...
    year:         2020
    month:        08
    day:          25
    history:      2021-01-25 08:24:29 - File content harmonized by the...
    title:        Model file from Hyytiala
    pid:          https://hdl.handle.net/21.12132/1.ace15f8ba477497c
```

here are the versions i'm using

```python
In [4]: xr.show_versions()
/Users/andersy005/mambaforge/envs/playground/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.10.0
pandas: 1.5.1
numpy: 1.23.4
scipy: 1.9.3
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.10.2
distributed: 2022.10.2
matplotlib: 3.6.1
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3
conda: None
pytest: None
IPython: 8.6.0
sphinx: None
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
1006639506 https://github.com/pydata/xarray/issues/2368#issuecomment-1006639506 https://api.github.com/repos/pydata/xarray/issues/2368 IC_kwDOAMm_X848ABmS benbovy 4160723 2022-01-06T14:36:12Z 2022-01-06T14:36:12Z MEMBER

@TomNicholas yes with the explicit index refactor we should be able to relax the 1D coordinate / dimension matching name constraint in the Xarray data model.

I'm sure there are some cases internally where we currently rely on this assumption, but it should be relatively easy to relax.

I also initially thought it would be easy to relax, but I'm not so sure anymore. I don't think it is a hard task, but it might still require some fair amount of work. I've already refactored a bunch of such internal cases in #5692, but there's a good chance that some (not sure how many) cases will still need a fix.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
1005916151 https://github.com/pydata/xarray/issues/2368#issuecomment-1005916151 https://api.github.com/repos/pydata/xarray/issues/2368 IC_kwDOAMm_X8479Q_3 TomNicholas 35968931 2022-01-05T17:14:35Z 2022-01-05T17:14:35Z MEMBER

Currently, xarray requires that variables with a name matching a dimension are 1D variables along that dimension, e.g.,

    for dim in dataset.dims:
        if dim in dataset.variables:
            assert dataset.variables[dim].dims == (dim,)

I agree that this unnecessarily complicates our data model. There's no particular advantage to this invariant, besides removing the need to check the dimensions of variables used for indexing lookups. I'm sure there are some cases internally where we currently rely on this assumption, but it should be relatively easy to relax.

It seems like this relaxation is compatible with the refactoring of indexes.

@benbovy will the explicit indexes refactor fix this case?

This is mentioned elsewhere (can't find the issue right now) and may be out of scope for this issue but I'm going to say it anyway: opening a NetCDF file with groups was not as easy as I wanted it to be when first starting out with xarray.

@djhoese For anything to do with opening netCDF files with groups see #4118 and the linked issues from there.

If people have examples of other weird cases involving groups (like groups within themselves or anything like that) then I would be interested to have those files to test with!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
785334802 https://github.com/pydata/xarray/issues/2368#issuecomment-785334802 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDc4NTMzNDgwMg== dcherian 2448579 2021-02-24T19:58:16Z 2021-02-24T19:58:16Z MEMBER

Clearly we can detect this failure, so shall we rename the date dimension to date_ in this example? We can raise a warning saying round-tripping will not work for such datasets

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
443305634 https://github.com/pydata/xarray/issues/2368#issuecomment-443305634 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQ0MzMwNTYzNA== rabernat 1197350 2018-11-30T19:03:07Z 2018-11-30T19:03:07Z MEMBER

We are working on fixing this in #2405. That PR (mine) has most of the basic functionality there, but it still needs more testing. Unfortunately, I don't have bandwidth right now to complete the required work.

If anyone here needs this fixed urgently and actually has time to work on it, I encourage you to pick up that PR and try to finish it off. We will be happy to provide help and support along the way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419212304 https://github.com/pydata/xarray/issues/2368#issuecomment-419212304 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTIxMjMwNA== shoyer 1217238 2018-09-06T19:24:05Z 2018-09-06T19:24:05Z MEMBER

Or no index at all?

This would be my inclination (for the default behavior). It would mean that you could no longer count on always being able to do labeled indexing along each dimension, but in the broader scheme of things I don't think that's a big deal.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419207959 https://github.com/pydata/xarray/issues/2368#issuecomment-419207959 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTIwNzk1OQ== rabernat 1197350 2018-09-06T19:08:06Z 2018-09-06T19:08:06Z MEMBER

It seems like this relaxation is compatible with the refactoring of indexes. Right now, we automatically create 1D indexes for all coordinate variables. The problem with 2D dimensions is that such indexes don't make sense: data.sel(y=3.14)

But maybe we could turn multi-dimensional coordinate variables into multi-indexes? Or no index at all? In any case, we could still do data.isel(y=3) i.e. logical indexing on the dimension axis.
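
To make the distinction concrete, here is a small pure-Python sketch (deliberately not xarray code). It assumes a 2D coordinate `y` stored as nested lists with shape (y=3, x=4); the helper names `isel_y` and `label_matches` are hypothetical and exist only for this illustration. Positional selection along the dimension is well defined, while a label lookup on the 2D coordinate can match cells scattered across several rows.

```python
# Hypothetical 2D coordinate "y" with shape (y=3, x=4), stored as nested lists.
y = [
    [0.0, 1.0, 2.0, 3.14],
    [1.0, 2.0, 3.14, 4.0],
    [2.0, 3.14, 4.0, 5.0],
]
# Data array with the same shape.
data = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]


def isel_y(array, index):
    """Positional selection along the first (y) axis: always one row."""
    return array[index]


def label_matches(coord, value):
    """Return every (row, column) position where the coordinate equals value."""
    return [
        (j, i)
        for j, row in enumerate(coord)
        for i, v in enumerate(row)
        if v == value
    ]


row = isel_y(data, 2)             # one unambiguous row
matches = label_matches(y, 3.14)  # positions spread over three different rows
print(row)
print(matches)
```

Because the label 3.14 appears in a different column of each row, there is no single slice along the y axis that a label-based lookup could return, which is why a 1D index does not make sense here.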

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419202871 https://github.com/pydata/xarray/issues/2368#issuecomment-419202871 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTIwMjg3MQ== shoyer 1217238 2018-09-06T18:51:11Z 2018-09-06T18:51:11Z MEMBER

Currently, xarray requires that variables with a name matching a dimension are 1D variables along that dimension, e.g.,

    for dim in dataset.dims:
        if dim in dataset.variables:
            assert dataset.variables[dim].dims == (dim,)

I agree that this unnecessarily complicates our data model. There's no particular advantage to this invariant, besides removing the need to check the dimensions of variables used for indexing lookups. I'm sure there are some cases internally where we currently rely on this assumption, but it should be relatively easy to relax.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419188538 https://github.com/pydata/xarray/issues/2368#issuecomment-419188538 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTE4ODUzOA== rabernat 1197350 2018-09-06T18:05:00Z 2018-09-06T18:05:00Z MEMBER

Perhaps part of the confusion is simply that y has different meanings in different contexts. When used as a dimension (e.g. to "define the array shape of a Variable" in CDM terms), it is indeed 1D. When used as a variable (or "CoordinateAxis"), it is 2D. XArray doesn't have a separate namespace for dimensions and variables.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419187692 https://github.com/pydata/xarray/issues/2368#issuecomment-419187692 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTE4NzY5Mg== rabernat 1197350 2018-09-06T18:02:19Z 2018-09-06T18:02:19Z MEMBER

@dopplershift - thanks for the clarifications! I agree that it's good for netCDF to be as open-ended as possible.

So I guess my quarrel is with the CDM. This is what it says about variables and dimensions:

A Variable is a container for data. It has a DataType, a set of Dimensions that define its array shape, and optionally a set of Attributes. Any shared Dimension it uses must be in the same Group or a parent Group.

A Dimension is used to define the array shape of a Variable. It may be shared among Variables, which provides a simple yet powerful way of associating Variables. When a Dimension is shared, it has a unique name within the Group. If unlimited, a Dimension's length may increase. If variableLength, then the actual length is data dependent, and can only be found by reading the data. A variableLength Dimension cannot be shared or unlimited.

then later

A Variable can have zero or more Coordinate Systems containing one or more CoordinateAxis. A CoordinateAxis can only be part of a Variable's CoordinateSystem if the CoordinateAxis' set of Dimensions is a subset of the Variable's set of Dimensions. This ensures that every data point in the Variable has a corresponding coordinate value for each of the CoordinateAxis in the CoordinateSystem.

A Coordinate System has one or more CoordinateAxis, and zero or more CoordinateTransforms.

A CoordinateAxis is a subtype of Variable, and is optionally classified according to the types in AxisType.

These are the rules which restrict which Variables can be used as Coordinate Axes:

Shared Dimensions: All dimensions used by a Coordinate Axis must be shared with the data variable. When a variable is part of a Structure, the dimensions used by the parent Structure(s) are considered to be part of the nested Variable.

I have a very hard time understanding what all of this means. Can the same variable be a "Dimension" and a "CoordinateAxis" in CDM?

It seems much simpler to me to use the CF approach to describe the physical coordinates of the data using "auxiliary coordinate variables" and to keep the dimensions as purely 1D "coordinate variables".

IMO, xarray is being overly pedantic here.

What would you like xarray to do with these datasets, given the fact that orthogonality of dimensions is central to its data model?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
419160841 https://github.com/pydata/xarray/issues/2368#issuecomment-419160841 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTE2MDg0MQ== rabernat 1197350 2018-09-06T16:37:51Z 2018-09-06T16:37:51Z MEMBER

@djhoese - it would be great if you could track down a more specific example of the issue you are referring to.

Excluding this possible problem with groups, my assessment of the feedback above is that, actually, the only problem is #2233: we can't have multidimensional variables that are also their own dimensions. This is a good thing. It means we have a specific problem to fix.

Right now this is ok:

    dimensions:
        x = 4
        y = 3
    variables:
        int x(x);
        int y(y);
        float data(y, x)

But this is not:

    dimensions:
        x = 4
        y = 3
    variables:
        int x(x);
        float y(y, x);
        float data(y, x)
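
As a rough illustration of the two schemas above, the check can be sketched in plain Python without opening any file. This assumes a simplified, hypothetical representation in which each variable name maps to a tuple of its dimension names; `conflicting_variables` is an illustrative helper, not an xarray function.

```python
def conflicting_variables(dims, variables):
    """Return variables that share a name with a dimension
    but are not 1D along exactly that dimension."""
    return [
        name
        for name, var_dims in variables.items()
        if name in dims and var_dims != (name,)
    ]


# Dimension sizes shared by both schemas.
dims = {"x": 4, "y": 3}

# First schema: x(x), y(y), data(y, x)
ok_schema = {"x": ("x",), "y": ("y",), "data": ("y", "x")}

# Second schema: x(x), y(y, x), data(y, x)
bad_schema = {"x": ("x",), "y": ("y", "x"), "data": ("y", "x")}

print(conflicting_variables(dims, ok_schema))
print(conflicting_variables(dims, bad_schema))
```

Only the second schema reports a conflict, for the variable `y`.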

Personally I find this to be an incredibly confusing, recursive use of the concept of "dimensions". For me, dimensions should be orthogonal. In the second example, y is a [non-dimension] coordinate, not a dimension! The actual dimension is implicit, some sort of logical y_index. I wish that CF / netCDF had never chosen to accept this as a valid schema. But I admit that perhaps my internal mental model is too wrapped up with xarray!

So the question is: what can we do about it?

I propose the following general outline:
- Create a new decoding function to effectively "fix" the recursively defined dimension by renaming y(y, x) into something like y_coordinate(y, x)
- Add a new option to open_dataset called decode_recursive_dimension which defaults to False
- Raise a more informative error when these types of datasets are encountered which suggests calling open_dataset with decode_recursive_dimension=True
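
The renaming step in this outline could look roughly like the sketch below. It is only an illustration under stated assumptions: the schema is a plain dict mapping variable names to dimension tuples, `rename_recursive_dimensions` and its `suffix` parameter are hypothetical names, and `decode_recursive_dimension` is the option proposed in this comment, not an existing `open_dataset` argument.

```python
def rename_recursive_dimensions(dims, variables, suffix="_coordinate"):
    """Rename variables that share a name with a dimension but are not
    1D along it, e.g. y(y, x) becomes y_coordinate(y, x)."""
    renamed = {}
    for name, var_dims in variables.items():
        if name in dims and var_dims != (name,):
            new_name = name + suffix
            if new_name in variables:
                # Refuse to silently overwrite an existing variable.
                raise ValueError(f"cannot rename {name!r}: {new_name!r} already exists")
            renamed[new_name] = var_dims
        else:
            renamed[name] = var_dims
    return renamed


dims = {"x": 4, "y": 3}
schema = {"x": ("x",), "y": ("y", "x"), "data": ("y", "x")}

fixed = rename_recursive_dimensions(dims, schema)
print(fixed)
```

After the rename, `y` remains a dimension name but no longer collides with a multidimensional variable, so the invariant on same-named variables holds again.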

Finally, we might want to raise this upstream with netCDF or CF conventions to try to understand better why this sort of schema is being encouraged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
413545600 https://github.com/pydata/xarray/issues/2368#issuecomment-413545600 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxMzU0NTYwMA== fmaussion 10050469 2018-08-16T13:29:33Z 2018-08-16T13:29:33Z MEMBER

The two examples by @dopplershift are the same problem as in https://github.com/pydata/xarray/issues/2233

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
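
The filter described at the top of this page (author_association = "MEMBER", one issue, sorted by updated_at descending) can be reproduced against a table of this shape. The sketch below uses Python's standard `sqlite3` module with an in-memory database and a reduced subset of the columns; the two MEMBER rows reuse ids and timestamps from the listing above, while the third row (id 1) is made up purely to show the filter excluding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the issue_comments schema (subset of columns only).
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        user INTEGER,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
conn.execute("CREATE INDEX idx_issue_comments_issue ON issue_comments (issue)")

conn.executemany(
    "INSERT INTO issue_comments VALUES (?, ?, ?, ?, ?)",
    [
        (413545600, 10050469, "2018-08-16T13:29:33Z", "MEMBER", 350899839),
        (1382671908, 35968931, "2023-01-14T06:10:39Z", "MEMBER", 350899839),
        # Made-up row that the filter should exclude.
        (1, 42, "2020-01-01T00:00:00Z", "NONE", 350899839),
    ],
)

# ISO 8601 timestamps in the same format sort correctly as plain text.
rows = conn.execute(
    """
    SELECT id, updated_at
    FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
    """,
    ("MEMBER", 350899839),
).fetchall()

print(rows)
```

Passing the filter values as parameters rather than interpolating them into the SQL string keeps the query safe to reuse with other issue ids.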