html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/7862#issuecomment-1578775636,https://api.github.com/repos/pydata/xarray/issues/7862,1578775636,IC_kwDOAMm_X85eGjRU,5821660,2023-06-06T13:30:15Z,2023-06-06T13:30:15Z,MEMBER,"> > Might be worth an issue over at numpy with the example from the test.
> 
> [numpy/numpy#23886](https://github.com/numpy/numpy/issues/23886)

The issue is already resolved over at numpy, which is really great! It was also marked for backport. @headtr1ck How are such issues currently handled, and how do we track removing the ignore?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1578248748,https://api.github.com/repos/pydata/xarray/issues/7862,1578248748,IC_kwDOAMm_X85eEios,5821660,2023-06-06T09:04:39Z,2023-06-06T09:04:39Z,MEMBER,"> Might be worth an issue over at numpy with the example from the test.

https://github.com/numpy/numpy/issues/23886","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1574264842,https://api.github.com/repos/pydata/xarray/issues/7862,1574264842,IC_kwDOAMm_X85d1WAK,2448579,2023-06-02T20:14:33Z,2023-06-02T20:14:48Z,MEMBER,"> xarray/tests/test_coding_strings.py:36: error: No overload variant of ""dtype"" matches argument types ""str"", ""Dict[str, Type[str]]""  [call-overload]


cc @Illviljan @headtr1ck ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1572021301,https://api.github.com/repos/pydata/xarray/issues/7862,1572021301,IC_kwDOAMm_X85dsyQ1,5821660,2023-06-01T13:06:32Z,2023-06-01T13:06:32Z,MEMBER,"@tomwhite I've added tests to check the backend code for vlen string dtype metadata. I also had to add a specific check for the h5py vlen string metadata. I think we've covered everything for the proposed change to allow empty vlen string dtype metadata.
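
For illustration only, here is a rough sketch of what a check covering both metadata flavours could look like. The helper name `is_vlen_str_dtype` is hypothetical and this is not the code in the PR; it just assumes the two keys seen in this thread (`element_type` and the h5py-style `vlen`).

```python
import numpy as np


def is_vlen_str_dtype(dtype):
    # Hypothetical helper, not the actual xarray check: treat an object
    # dtype as a vlen string dtype if either metadata key points to str.
    meta = dtype.metadata or {}
    return dtype.kind == 'O' and str in (meta.get('element_type'), meta.get('vlen'))


xr_style = np.dtype('O', metadata={'element_type': str})  # key seen in the xarray output
h5_style = np.dtype('O', metadata={'vlen': str})          # key seen in the h5netcdf output
plain = np.dtype('O')                                     # bare object dtype, no metadata

for name, dt in [('xr_style', xr_style), ('h5_style', h5_style), ('plain', plain)]:
    print(name, is_vlen_str_dtype(dt))
```

The point is only that an empty array can still be recognised as a string array from its dtype metadata, whichever key the backend used.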

I'm looking at the mypy error and do not have the slightest clue what and where to change. Any help appreciated.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1561285499,https://api.github.com/repos/pydata/xarray/issues/7862,1561285499,IC_kwDOAMm_X85dD1N7,5821660,2023-05-24T14:37:58Z,2023-05-24T14:37:58Z,MEMBER,"Thanks for trying. I can't think of any downsides for the netcdf4-fix, as it just adds the needed metadata to the object-dtype. But you never know, so it would be good to get another set of eyes on it.

So it looks like the changes here with the fix in my branch will get your issue resolved @tomwhite, right? 

I'm a bit worried that this might break other users' workflows if they depend on the current conversion to floating point for some reason. Other backends might also rely on this behaviour, especially because it has been there since the early days when xarray was known as xray.

@dcherian What would be the way to go here? 

There is also a somewhat contradictory issue in #7868.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1561195832,https://api.github.com/repos/pydata/xarray/issues/7862,1561195832,IC_kwDOAMm_X85dDfU4,5821660,2023-05-24T13:52:04Z,2023-05-24T13:52:04Z,MEMBER,"@tomwhite I've put a commit with changes to zarr/netcdf4-backends which should preserve the dtype metadata here: https://github.com/kmuehlbauer/xarray/tree/preserve-vlen-string-dtype.

I'm not really sure that is the right location, but since the netcdf4 backend already handled this at that location, I think it will do.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1561162311,https://api.github.com/repos/pydata/xarray/issues/7862,1561162311,IC_kwDOAMm_X85dDXJH,5821660,2023-05-24T13:32:26Z,2023-05-24T13:32:57Z,MEMBER,"@tomwhite Special casing on netcdf4 backend should be possible, too.

But it might need fixing at zarr backend, too:

```python
ds = xr.Dataset({""a"": np.array([], dtype=xr.coding.strings.create_vlen_dtype(str))})
print(f""dtype: {ds['a'].dtype}"")
print(f""metadata: {ds['a'].dtype.metadata}"")
ds.to_zarr(""a.zarr"")
print(""\n### Loading ###"")
with xr.open_dataset(""a.zarr"", engine=""zarr"") as ds:
    print(f""dtype: {ds['a'].dtype}"")
    print(f""metadata: {ds['a'].dtype.metadata}"")
```
```
dtype: object
metadata: {'element_type': <class 'str'>}

### Loading ###
dtype: object
metadata: None
```

Could you verify the above example, please? I'm relatively new to `zarr` :grimacing: ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908
https://github.com/pydata/xarray/pull/7862#issuecomment-1560559426,https://api.github.com/repos/pydata/xarray/issues/7862,1560559426,IC_kwDOAMm_X85dBD9C,5821660,2023-05-24T07:01:44Z,2023-05-24T07:01:44Z,MEMBER,"Thanks @tomwhite for the PR. I've only quickly checked the approach, which looks reasonable. But those changes have implications for several places in the backend code, which we would have to sort out.

Considering this example:

```python
import numpy as np
import xarray as xr
print(f""creating dataset with empty string array"")
print(""-----------------------------------------"")
dtype = xr.coding.strings.create_vlen_dtype(str)
ds = xr.Dataset({""a"": np.array([], dtype=dtype)})
print(f""dtype: {ds['a'].dtype}"")
print(f""metadata: {ds['a'].dtype.metadata}"")
ds.to_netcdf(""a.nc"", engine=""netcdf4"")

print(""\nncdump"")
print(""-------"")
# IPython/Jupyter shell escape; run `ncdump a.nc` in a terminal otherwise
!ncdump a.nc

engines = [""netcdf4"", ""h5netcdf""]
for engine in engines:
    with xr.open_dataset(""a.nc"", engine=engine) as ds:
        print(f""\nloading with {engine}"")
        print(""-------------------"")
        print(f""dtype: {ds['a'].dtype}"")
        print(f""metadata: {ds['a'].dtype.metadata}"")
```

```
creating dataset with empty string array
-----------------------------------------
dtype: object
metadata: {'element_type': <class 'str'>}

ncdump
-------
netcdf a {
dimensions:
	a = UNLIMITED ; // (0 currently)
variables:
	string a(a) ;
data:
}

loading with netcdf4
-------------------
dtype: object
metadata: None

loading with h5netcdf
-------------------
dtype: object
metadata: {'vlen': <class 'str'>}
```

The `netcdf4` engine does not round-trip here: it loses the dtype metadata. The h5netcdf backend has special casing for this, though.

The cause is in `open_store_variable` of the `netcdf4` backend, where the underlying data is converted to a `Variable` (which does some object dtype twiddling).
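
To make the metadata loss concrete without any backend involved, here is a minimal numpy-only sketch. It assumes the tagged dtype is a plain object dtype with an `element_type` entry, as the output above suggests; it does not reproduce what `open_store_variable` does internally.

```python
import numpy as np

# Assumption: an object dtype tagged like the one printed above
# ('element_type' -> str); plain numpy, no xarray involved.
vlen_str = np.dtype('O', metadata={'element_type': str})
tagged = np.array([], dtype=vlen_str)
print(tagged.dtype, tagged.dtype.metadata)

# An empty array created with a bare object dtype carries no tag, so
# nothing is left to tell that it was meant to hold strings.
untagged = np.empty(0, dtype=object)
print(untagged.dtype, untagged.dtype.metadata)
```

With zero elements there are no values to inspect, so the metadata is the only place the string information can live.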

Unfortunately I do not have an immediate solution here. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1720045908