issue_comments: comment 745393893

andersy005 (MEMBER) commented on 2020-12-15T16:11:39Z on "Error when writing string coordinate variables to zarr" (pydata/xarray#3476):
https://github.com/pydata/xarray/issues/3476#issuecomment-745393893

> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter, so it's trying to double-encode.

Indeed. And it appears that these lines are the culprits:

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471

```python
ipdb> v
<xarray.Variable (x: 3)>
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482

Zarr appears to ignore the fact that the encoding from xarray already contains the filter, and proceeds to extract its own. As a result, we end up with two filters:

```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2,
 'shape': (3,),
 'chunks': (3,),
 'dtype': dtype('O'),
 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1},
 'fill_value': None,
 'order': 'C',
 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```
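To see why the stacked duplicate is a problem, here is a toy illustration in plain Python (a hypothetical stand-in, not zarr's actual codec machinery): the first filter already turns the sequence of strings into bytes, so a duplicate filter receives input of the wrong type.

```python
def vlen_utf8_encode(values):
    # Stand-in for a VLenUTF8-style object codec: expects a sequence of str.
    if not all(isinstance(v, str) for v in values):
        raise TypeError("expected a sequence of str")
    return b"\x00".join(v.encode("utf-8") for v in values)

data = ["a", "b", "c"]
once = vlen_utf8_encode(data)  # fine: b'a\x00b\x00c'
try:
    vlen_utf8_encode(once)     # the duplicate filter sees bytes, not str
except TypeError as err:
    print("double encode failed:", err)
```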

As @borispf and @jsadler2 suggested, clearing the filters from the encoding before creating the zarr array works:

```python
ipdb> enc  # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```
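A minimal sketch of that workaround (the helper name and the set of codec ids are my own assumptions; filters are shown as numcodecs-style config dicts, whereas in the real code path they are codec instances): strip object codecs such as vlen-utf8 from the xarray-side encoding so that zarr's own object-codec inference is the only thing that inserts the filter.

```python
# Hypothetical helper: drop object codecs (e.g. vlen-utf8) from the
# xarray-side encoding dict before handing it to zarr, so zarr alone
# decides how to encode object-dtype data.
OBJECT_CODEC_IDS = {"vlen-utf8", "vlen-bytes", "vlen-array"}

def strip_object_filters(encoding):
    """Return a copy of `encoding` with object codecs removed from 'filters'."""
    enc = dict(encoding)
    filters = enc.get("filters") or []
    enc["filters"] = [f for f in filters if f.get("id") not in OBJECT_CODEC_IDS]
    return enc

encoding = {"chunks": (3,), "filters": [{"id": "vlen-utf8"}]}
cleaned = strip_object_filters(encoding)
# cleaned["filters"] is now empty, so zarr would insert a single
# vlen-utf8 filter itself instead of stacking a second one on top.
```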

@jhamman since you are more familiar with the internals of zarr + xarray: should we default to ignoring the filter information from xarray and let zarr take care of extracting the filters itself?

