html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3476#issuecomment-1485362670,https://api.github.com/repos/pydata/xarray/issues/3476,1485362670,IC_kwDOAMm_X85YiNXu,24508496,2023-03-27T15:40:06Z,2023-03-27T15:40:06Z,CONTRIBUTOR,"My impression is that keeping the zarr encoding leads to a bunch of issues (see my issue above, as well as the current one). There also seems to be an issue with chunking being preserved because the array encodings aren't overwritten, but I can't find that issue right now. Since all of these issues are resolved by popping the zarr encoding, I am wondering what the downsides of this are and whether it'd be easier not to keep that encoding at all?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-1115045538,https://api.github.com/repos/pydata/xarray/issues/3476,1115045538,IC_kwDOAMm_X85Cdj6i,3698640,2022-05-02T15:38:11Z,2022-08-08T15:32:52Z,CONTRIBUTOR,"This has been happening a lot to me lately when writing to zarr. Thanks to @bolliger32 for the tip - this usually works like a charm:
```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds.coords[v] = ds.coords[v].astype(""unicode"")
for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v] = ds[v].astype(""unicode"")
```
For whatever reason, clearing the encoding and/or using `.astype(str)` doesn't seem to work as reliably. I don't have a good MRE for this but hope it helps others with the same issue.
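A narrower variant is possible. The following is a minimal sketch, not a tested fix: it converts only those object-dtype variables whose elements are all Python `str` and leaves other object types (such as cftime datetimes) untouched. `object_strings_to_unicode` is a hypothetical helper name, not an xarray API. It assumes that every string variable holds nothing but `str` values. Note that `.values` loads each object-dtype candidate into memory.

```python
import numpy as np
import xarray as xr


def object_strings_to_unicode(ds: xr.Dataset) -> xr.Dataset:
    # Hypothetical helper (not part of xarray): rewrite only the object-dtype
    # variables that contain nothing but Python str values.
    out = ds.copy()
    for name in list(out.variables):
        var = out.variables[name]
        if var.dtype != object:
            continue
        values = var.values  # loads the data, including dask-backed arrays
        if not all(isinstance(item, str) for item in values.ravel()):
            continue  # e.g. cftime objects: leave them as they are
        # Rebuild from a plain (dims, data, attrs) tuple: fixed-width unicode
        # data, same dims and attrs, and no encoding from the source store.
        new_var = (var.dims, values.astype(str), var.attrs)
        if name in out.coords:
            out.coords[name] = new_var
        else:
            out[name] = new_var
    return out
```

Because each converted variable is rebuilt from a tuple, it starts with an empty `encoding`, so stale zarr `filters` from the source store are not carried over.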
Note the flag raised by @FlorisCalkoen [below](https://github.com/pydata/xarray/issues/3476#issuecomment-1205346130) - don't just throw this at all your writes! There are other object types (e.g. cftime datetimes) which you probably don't want to convert to strings. This is just a patch to get around this issue for DataArrays with string coords/variables.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-1208280450,https://api.github.com/repos/pydata/xarray/issues/3476,1208280450,IC_kwDOAMm_X85IBOWC,3698640,2022-08-08T15:30:52Z,2022-08-08T15:31:13Z,CONTRIBUTOR,Ha - yeah that's a good flag. I definitely didn't intend for that to be a universally applied patch! So I probably should have included a buyer-beware note. But we did find that clearing the encoding doesn't always do the trick for string arrays. So a comprehensive patch will probably need to be more nuanced.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-1205346130,https://api.github.com/repos/pydata/xarray/issues/3476,1205346130,IC_kwDOAMm_X85H2B9S,44444001,2022-08-04T14:36:26Z,2022-08-04T14:36:26Z,NONE,"@delgadom I just noticed that your proposed solution has the side effect of also converting `cftime` objects (e.g., below) to unicode strings.
```
<xarray.DataArray 'time' (time: 1)>
array([cftime.DatetimeNoLeap(2007, 7, 2, 12, 0, 0, 0, has_year_zero=True)], dtype=object)
```
I updated your lines using @Hoeze's `encoding.clear()` workaround, and that seems to work for now.
```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds[v].encoding.clear()
for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v].encoding.clear()
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-1012775723,https://api.github.com/repos/pydata/xarray/issues/3476,1012775723,IC_kwDOAMm_X848Xbsr,72196131,2022-01-14T04:55:03Z,2022-01-14T04:55:03Z,CONTRIBUTOR,I think it worked with one variable; @Hoeze's workaround was necessary for more than one.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-1012775218,https://api.github.com/repos/pydata/xarray/issues/3476,1012775218,IC_kwDOAMm_X848Xbky,17399794,2022-01-14T04:53:36Z,2022-01-14T04:53:36Z,NONE,"Thanks for that, looks like I just came across this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-841692487,https://api.github.com/repos/pydata/xarray/issues/3476,841692487,MDEyOklzc3VlQ29tbWVudDg0MTY5MjQ4Nw==,1200058,2021-05-15T16:56:00Z,2021-05-15T17:03:00Z,NONE,"Hi, I also keep running into this issue all the time.
Right now, there is no way of round-tripping `xr.open_zarr().to_zarr()`, also because of https://github.com/pydata/xarray/issues/5219.
The only workaround that seems to help is the following:
```python
to_store = xrds.copy()
for var in to_store.variables:
    to_store[var].encoding.clear()
```","{""total_count"": 3, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-746550766,https://api.github.com/repos/pydata/xarray/issues/3476,746550766,MDEyOklzc3VlQ29tbWVudDc0NjU1MDc2Ng==,2443309,2020-12-16T16:09:52Z,2020-12-16T16:09:52Z,MEMBER,Thanks @andersy005 for the write up and for digging into this. Doesn't this seem like it could be a bug in Zarr's `create` method? ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-745393893,https://api.github.com/repos/pydata/xarray/issues/3476,745393893,MDEyOklzc3VlQ29tbWVudDc0NTM5Mzg5Mw==,13301940,2020-12-15T16:11:39Z,2020-12-15T16:11:39Z,MEMBER,"> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter so it's trying to double encode
Indeed. And it appears that these lines are the culprits:
https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471
```python
ipdb> v
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```
https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482
Zarr appears to be ignoring the filter information from xarray and proceeds to extract its own filter. As a result, we end up with two filters:
```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2, 'shape': (3,), 'chunks': (3,), 'dtype': dtype('O'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': None, 'order': 'C', 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
```
```python
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```
As @borispf and @jsadler2 suggested, clearing the filters from encoding before initiating the zarr store creation works:
```python
ipdb> enc # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```
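The same idea can be applied at the user level before calling `to_zarr`. The following is a minimal sketch, assuming the duplicated `VLenUTF8` filter is the only problem: it drops the `filters` entry from the encoding of every object-dtype variable and keeps the other encoding keys (such as `chunks` and `compressor`). `drop_object_filters` is a hypothetical helper name, not an xarray or zarr API.

```python
import xarray as xr


def drop_object_filters(ds: xr.Dataset) -> xr.Dataset:
    # Hypothetical helper (not part of xarray or zarr). Dataset.copy() gives
    # each variable its own copy of the encoding dict, so the input dataset
    # is left untouched.
    out = ds.copy()
    for var in out.variables.values():
        if var.dtype == object:
            # Let zarr add its own object codec instead of repeating the
            # filter that was read back from the source store.
            var.encoding.pop('filters', None)
    return out
```

Usage would look like `drop_object_filters(xr.open_zarr(src)).to_zarr(dst, mode='w')`, with `src` and `dst` standing in for your own store paths.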
@jhamman since you are more familiar with the internals of zarr + xarray, should we default to ignoring filter information from xarray and let zarr take care of the extraction of filter information? ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-630863198,https://api.github.com/repos/pydata/xarray/issues/3476,630863198,MDEyOklzc3VlQ29tbWVudDYzMDg2MzE5OA==,624352,2020-05-19T14:38:26Z,2020-05-19T14:38:26Z,NONE,"I ran into the same issue. It seems like zarr is inserting `VLenUTF8` as a filter, but the loaded data array already has that as a filter, so it's trying to double-encode. So another workaround is to delete the filters on `site_code`:
```python
import xarray as xr
sm_from_zarr = xr.open_zarr('tmp/test_sm_zarr')
del sm_from_zarr.site_code.encoding[""filters""]
sm_from_zarr.to_zarr('tmp/test_sm_zarr_from', mode='w')
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-620090096,https://api.github.com/repos/pydata/xarray/issues/3476,620090096,MDEyOklzc3VlQ29tbWVudDYyMDA5MDA5Ng==,11967971,2020-04-27T16:22:35Z,2020-04-27T16:22:35Z,NONE,"I'm experiencing the same issue, which also seems to be related to one of my coordinates having an object dtype. Luckily, the workaround proposed by @jsadler2 works in my case, too.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-550058641,https://api.github.com/repos/pydata/xarray/issues/3476,550058641,MDEyOklzc3VlQ29tbWVudDU1MDA1ODY0MQ==,2443309,2019-11-05T22:48:59Z,2019-11-05T22:48:59Z,MEMBER,Thanks @jsadler2 - I think this is a bug in xarray. We should be able to round trip the `site_coordinate` variable.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-550044335,https://api.github.com/repos/pydata/xarray/issues/3476,550044335,MDEyOklzc3VlQ29tbWVudDU1MDA0NDMzNQ==,6943441,2019-11-05T22:07:20Z,2019-11-05T22:07:20Z,NONE,"Sure
```
In [5]: print(ds)
Dimensions: (datetime: 20, site_code: 20)
Coordinates:
* datetime (datetime) datetime64[ns] 1970-01-01 ... 1970-01-01T19:00:00
* site_code (site_code) object '01302020' '01303000' ... '01315000'
Data variables:
streamflow (datetime, site_code) float64 dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758
https://github.com/pydata/xarray/issues/3476#issuecomment-548929993,https://api.github.com/repos/pydata/xarray/issues/3476,548929993,MDEyOklzc3VlQ29tbWVudDU0ODkyOTk5Mw==,2443309,2019-11-01T19:58:25Z,2019-11-01T19:58:34Z,MEMBER,Hi @jsadler2 - Can you show us what `sm_from_zarr` looks like? (`print(sm_from_zarr)` will do.),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758