issue_comments: comment 745393893

andersy005 (MEMBER) commented on 2020-12-15T16:11:39Z on "Error when writing string coordinate variables to zarr" (pydata/xarray#3476):
https://github.com/pydata/xarray/issues/3476#issuecomment-745393893

> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter, so it's trying to double-encode.

Indeed. And it appears that these lines are the culprits:

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471

```python
ipdb> v
<xarray.Variable (x: 3)>
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482

Zarr appears to ignore the fact that the encoding from xarray already contains the filter, and proceeds to extract its own. As a result, we end up with two filters:

```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2,
 'shape': (3,),
 'chunks': (3,),
 'dtype': dtype('O'),
 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1},
 'fill_value': None,
 'order': 'C',
 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```
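To see why the stacked duplicate is a problem, here is a toy illustration in plain Python (a hypothetical stand-in, not zarr's actual codec machinery): the first filter already turns the sequence of strings into bytes, so a duplicate filter receives input of the wrong type.

```python
def vlen_utf8_encode(values):
    # Stand-in for a VLenUTF8-style object codec: expects a sequence of str.
    if not all(isinstance(v, str) for v in values):
        raise TypeError("expected a sequence of str")
    return b"\x00".join(v.encode("utf-8") for v in values)

data = ["a", "b", "c"]
once = vlen_utf8_encode(data)  # fine: b'a\x00b\x00c'
try:
    vlen_utf8_encode(once)     # the duplicate filter sees bytes, not str
except TypeError as err:
    print("double encode failed:", err)
```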

As @borispf and @jsadler2 suggested, clearing the filters from the encoding before creating the zarr array works:

```python
ipdb> enc  # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```
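A minimal sketch of that workaround (the helper name and the set of codec ids are my own assumptions; filters are shown as numcodecs-style config dicts, whereas in the real code path they are codec instances): strip object codecs such as vlen-utf8 from the xarray-side encoding so that zarr's own object-codec inference is the only thing that inserts the filter.

```python
# Hypothetical helper: drop object codecs (e.g. vlen-utf8) from the
# xarray-side encoding dict before handing it to zarr, so zarr alone
# decides how to encode object-dtype data.
OBJECT_CODEC_IDS = {"vlen-utf8", "vlen-bytes", "vlen-array"}

def strip_object_filters(encoding):
    """Return a copy of `encoding` with object codecs removed from 'filters'."""
    enc = dict(encoding)
    filters = enc.get("filters") or []
    enc["filters"] = [f for f in filters if f.get("id") not in OBJECT_CODEC_IDS]
    return enc

encoding = {"chunks": (3,), "filters": [{"id": "vlen-utf8"}]}
cleaned = strip_object_filters(encoding)
# cleaned["filters"] is now empty, so zarr would insert a single
# vlen-utf8 filter itself instead of stacking a second one on top.
```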

@jhamman since you are more familiar with the internals of zarr + xarray: should we default to ignoring the filter information from xarray and let zarr take care of extracting the filters itself?

