home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

14 rows where issue = 1720045908 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 4

  • kmuehlbauer 7
  • tomwhite 4
  • headtr1ck 2
  • dcherian 1

author_association 3

  • MEMBER 8
  • CONTRIBUTOR 4
  • COLLABORATOR 2

issue 1

  • CF encoding should preserve vlen dtype for empty arrays · 14 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1578777785 https://github.com/pydata/xarray/pull/7862#issuecomment-1578777785 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85eGjy5 headtr1ck 43316012 2023-06-06T13:31:34Z 2023-06-06T13:31:34Z COLLABORATOR

If you want you can leave a comment But the mypy CI should fail on unused ignores, so we will notice :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1578775636 https://github.com/pydata/xarray/pull/7862#issuecomment-1578775636 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85eGjRU kmuehlbauer 5821660 2023-06-06T13:30:15Z 2023-06-06T13:30:15Z MEMBER

Might be worth an issue over at numpy with the example from the test.

numpy/numpy#23886

The issue is already resolved over at numpy which is really great! It was also marked as backport. @headtr1ck How are these issues resolved currently or how do we track removing the ignore?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1578248748 https://github.com/pydata/xarray/pull/7862#issuecomment-1578248748 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85eEios kmuehlbauer 5821660 2023-06-06T09:04:39Z 2023-06-06T09:04:39Z MEMBER

Might be worth an issue over at numpy with the example from the test.

https://github.com/numpy/numpy/issues/23886

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1574278204 https://github.com/pydata/xarray/pull/7862#issuecomment-1574278204 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85d1ZQ8 headtr1ck 43316012 2023-06-02T20:27:46Z 2023-06-02T20:28:29Z COLLABORATOR

This seems to be a numpy issue, mypy thinks that you cannot call np.dtype like you do.

Might be worth an issue over at numpy with the example from the test.

For now we can simply ignore this error.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1574264842 https://github.com/pydata/xarray/pull/7862#issuecomment-1574264842 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85d1WAK dcherian 2448579 2023-06-02T20:14:33Z 2023-06-02T20:14:48Z MEMBER

xarray/tests/test_coding_strings.py:36: error: No overload variant of "dtype" matches argument types "str", "Dict[str, Type[str]]" [call-overload]

cc @Illviljan @headtr1ck

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1573764660 https://github.com/pydata/xarray/pull/7862#issuecomment-1573764660 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dzb40 tomwhite 85085 2023-06-02T13:44:43Z 2023-06-02T13:44:43Z CONTRIBUTOR

@kmuehlbauer thanks for adding tests! I'm not sure what the mypy error is either, I'm afraid...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1572021301 https://github.com/pydata/xarray/pull/7862#issuecomment-1572021301 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dsyQ1 kmuehlbauer 5821660 2023-06-01T13:06:32Z 2023-06-01T13:06:32Z MEMBER

@tomwhite I've added tests to check the backend code for vlen string dtype metadadata. Also had to add specific check for the h5py vlen string metadata. I think we've covered everything for the proposed change to allow empty vlen strings dtype metadata.

I'm looking at the mypy error and do not have the slightest clue what and where to change. Any help appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561308333 https://github.com/pydata/xarray/pull/7862#issuecomment-1561308333 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dD6yt tomwhite 85085 2023-05-24T14:51:23Z 2023-05-24T14:51:23Z CONTRIBUTOR

So it looks like the changes here with the fix in my branch will get your issue resolved @tomwhite, right?

Yes - thanks!

I'm a bit worried, that this might break other users workflows, if they depend on the current conversion to floating point for some reason.

The floating point default is preserved if you do e.g. xr.Dataset({"a": np.array([], dtype=object)}). The change here will only convert to string if there is extra metadata present that says it is a string.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561285499 https://github.com/pydata/xarray/pull/7862#issuecomment-1561285499 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dD1N7 kmuehlbauer 5821660 2023-05-24T14:37:58Z 2023-05-24T14:37:58Z MEMBER

Thanks for trying. I can't think of any downsides for the netcdf4-fix, as it just adds the needed metadata to the object-dtype. But you never know, so it would be good to get another set of eyes on it.

So it looks like the changes here with the fix in my branch will get your issue resolved @tomwhite, right?

I'm a bit worried, that this might break other users workflows, if they depend on the current conversion to floating point for some reason. Also other backends might rely on this feature. Especially because this has been there since the early days when xarray was known as xray.

@dcherian What would be the way to go here?

There is also a somehow contradicting issue in #7868.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561240314 https://github.com/pydata/xarray/pull/7862#issuecomment-1561240314 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDqL6 tomwhite 85085 2023-05-24T14:12:49Z 2023-05-24T14:12:49Z CONTRIBUTOR

Could you verify the above example, please?

The code looks fine, and I get the same result when I run it with this PR.

Your fix in https://github.com/kmuehlbauer/xarray/tree/preserve-vlen-string-dtype changes the metadata so it is correctly preserved as metadata: {'element_type': <class 'str'>}.

I feel less qualified to evaluate the impact of the netcdf4 fix.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561195832 https://github.com/pydata/xarray/pull/7862#issuecomment-1561195832 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDfU4 kmuehlbauer 5821660 2023-05-24T13:52:04Z 2023-05-24T13:52:04Z MEMBER

@tomwhite I've put a commit with changes to zarr/netcdf4-backends which should preserve the dtype metadata here: https://github.com/kmuehlbauer/xarray/tree/preserve-vlen-string-dtype.

I'm not really sure if that is the right location, but as it was already present that location at netcdf4-backend I think it will do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561162311 https://github.com/pydata/xarray/pull/7862#issuecomment-1561162311 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDXJH kmuehlbauer 5821660 2023-05-24T13:32:26Z 2023-05-24T13:32:57Z MEMBER

@tomwhite Special casing on netcdf4 backend should be possible, too.

But it might need fixing at zarr backend, too:

python ds = xr.Dataset({"a": np.array([], dtype=xr.coding.strings.create_vlen_dtype(str))}) print(f"dtype: {ds['a'].dtype}") print(f"metadata: {ds['a'].dtype.metadata}") ds.to_zarr("a.zarr") print("\n### Loading ###") with xr.open_dataset("a.zarr", engine="zarr") as ds: print(f"dtype: {ds['a'].dtype}") print(f"metadata: {ds['a'].dtype.metadata}") ```python dtype: object metadata: {'element_type': <class 'str'>}

Loading

dtype: object metadata: None ```

Could you verify the above example, please? I'm relatively new to zarr :grimacing:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561143111 https://github.com/pydata/xarray/pull/7862#issuecomment-1561143111 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDSdH tomwhite 85085 2023-05-24T13:23:18Z 2023-05-24T13:23:18Z CONTRIBUTOR

Thanks for taking a look @kmuehlbauer and for the useful example code. I hadn't considered the netcdf cases, so thanks for pointing those out.

Engine netcdf4 does not roundtrip here, losing the dtype metadata information. There is special casing for h5netcdf backend, though.

Could netcdf4 do the same special-casing as h5netcdf?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1560559426 https://github.com/pydata/xarray/pull/7862#issuecomment-1560559426 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dBD9C kmuehlbauer 5821660 2023-05-24T07:01:44Z 2023-05-24T07:01:44Z MEMBER

Thanks @tomwhite for the PR. I've only quickly checked the approach, which looks reasonable. But those changes have implications on several locations of the backend code, which we would have to sort out.

Considering this example:

```python import numpy as np import xarray as xr print(f"creating dataset with empty string array") print("-----------------------------------------") dtype = xr.coding.strings.create_vlen_dtype(str) ds = xr.Dataset({"a": np.array([], dtype=dtype)}) print(f"dtype: {ds['a'].dtype}") print(f"metadata: {ds['a'].dtype.metadata}") ds.to_netcdf("a.nc", engine="netcdf4")

print("\nncdump") print("-------") !ncdump a.nc

engines = ["netcdf4", "h5netcdf"] for engine in engines: with xr.open_dataset("a.nc", engine=engine) as ds: print(f"\nloading with {engine}") print("-------------------") print(f"dtype: {ds['a'].dtype}") print(f"metadata: {ds['a'].dtype.metadata}") ```

```python creating dataset with empty string array


dtype: object metadata: {'element_type': <class 'str'>}

ncdump

netcdf a { dimensions: a = UNLIMITED ; // (0 currently) variables: string a(a) ; data: }

loading with netcdf4

dtype: object metadata: None

loading with h5netcdf

dtype: object metadata: {'vlen': <class 'str'>} ```

Engine netcdf4 does not roundtrip here, losing the dtype metadata information. There is special casing for h5netcdf backend, though.

The source is actually located in open_store_variable of netcdf4 backend, when the underlying data is converted to Variable (which does some object dtype twiddling).

Unfortunately I do not have an immediate solution here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 1031.835ms · About: xarray-datasette