html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3476#issuecomment-745393893,https://api.github.com/repos/pydata/xarray/issues/3476,745393893,MDEyOklzc3VlQ29tbWVudDc0NTM5Mzg5Mw==,13301940,2020-12-15T16:11:39Z,2020-12-15T16:11:39Z,MEMBER,"> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter so it's trying to double encode
Indeed, and these lines appear to be the culprits:
https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471
```python
ipdb> v
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```
https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482
Zarr does not appear to check the filters that xarray passes in via `encoding`. It adds its own VLenUTF8 filter anyway, so we end up with two filters:
```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2, 'shape': (3,), 'chunks': (3,), 'dtype': dtype('O'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': None, 'order': 'C', 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
```
```python
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```
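To make the failure mode concrete, here is a minimal pure-Python sketch of the merge that seems to be happening. It assumes, without verifying it against zarr's source here, that zarr prepends its own VLenUTF8 object codec for a string dtype and does not check whether `filters` already contains one. `create_filters_config` is a hypothetical stand-in, not a zarr function. Filter configs are plain dicts like the ones in `_meta` above.

```python
# Hypothetical stand-in for zarr's filter handling; NOT a zarr API.
# Assumption: for string/object dtypes zarr prepends its own object codec
# without checking whether an equivalent filter is already present.

VLEN_UTF8 = {'id': 'vlen-utf8'}


def create_filters_config(filters, object_codec=None):
    # Start from the filters passed in via **encoding.
    config = [dict(f) for f in filters]
    if object_codec is not None:
        # Prepend the object codec unconditionally (the suspected behaviour).
        config.insert(0, dict(object_codec))
    return config


# The encoding carried over from the source store already holds VLenUTF8,
# and zarr adds another copy when it infers the codec from the str dtype.
print(create_filters_config([VLEN_UTF8], object_codec=VLEN_UTF8))
# -> [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```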
As @borispf and @jsadler2 suggested, clearing the filters from the encoding before creating the zarr array works around the problem:
```python
ipdb> enc # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```
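A minimal sketch of that workaround as a standalone helper follows. `strip_object_codec_filters` is a hypothetical name and is not part of xarray. It drops any filter whose id is `vlen-utf8` so that zarr can supply its own. The sketch assumes the UTF-8 codec is the only one at issue. It accepts both numcodecs codec instances, which carry a `codec_id` attribute, and plain config dicts like the ones in `_meta`.

```python
# Hypothetical helper illustrating the workaround; not part of xarray's API.
# It removes object codecs that zarr adds on its own, so they are not applied twice.

OBJECT_CODEC_IDS = {'vlen-utf8'}  # assumption: only the UTF-8 codec is at issue here


def _codec_id(f):
    # Plain config dicts (as in zarr's _meta) carry an 'id' key;
    # numcodecs codec instances carry a codec_id attribute.
    if isinstance(f, dict):
        return f.get('id')
    return getattr(f, 'codec_id', None)


def strip_object_codec_filters(encoding):
    # Work on a shallow copy so the variable's own .encoding is left untouched.
    enc = dict(encoding)
    filters = enc.get('filters')
    if filters:
        enc['filters'] = [f for f in filters if _codec_id(f) not in OBJECT_CODEC_IDS]
    return enc


encoding = {'chunks': (3,), 'filters': [{'id': 'vlen-utf8'}]}
print(strip_object_codec_filters(encoding))
# -> {'chunks': (3,), 'filters': []}
print(encoding['filters'])  # the original encoding is unchanged
# -> [{'id': 'vlen-utf8'}]
```

Filtering by codec id, rather than clearing `filters` entirely, keeps any unrelated filters a user set explicitly.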
@jhamman, since you are more familiar with the internals of zarr + xarray: should we ignore the filters in xarray's encoding by default and let zarr determine the filters itself? ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758