html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3476#issuecomment-745393893,https://api.github.com/repos/pydata/xarray/issues/3476,745393893,MDEyOklzc3VlQ29tbWVudDc0NTM5Mzg5Mw==,13301940,2020-12-15T16:11:39Z,2020-12-15T16:11:39Z,MEMBER,"> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter so it's trying to double encode

Indeed. And it appears that these lines are the culprits:

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471

```python
ipdb> v
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482

Zarr appears to be ignoring the filter information from xarray and proceeds to extract its own filter.
As a result, we end up with two filters:

```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2, 'shape': (3,), 'chunks': (3,), 'dtype': dtype('O'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': None, 'order': 'C', 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
```

```python
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```

As @borispf and @jsadler2 suggested, clearing the filters from encoding before initiating the zarr store creation works:

```python
ipdb> enc # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```

@jhamman since you are more familiar with the internals of zarr + xarray, should we default to ignoring filter information from xarray and let zarr take care of the extraction of filter information?
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,516306758