
issue_comments


14 rows where issue = 516306758 sorted by updated_at descending


user 11

  • jhamman 3
  • delgadom 2
  • borispf 1
  • Hoeze 1
  • jsadler2 1
  • dnerini 1
  • andersy005 1
  • bluetyson 1
  • saschahofmann 1
  • FlorisCalkoen 1
  • RichardScottOZ 1

author_association 3

  • NONE 6
  • CONTRIBUTOR 4
  • MEMBER 4

issue 1

  • Error when writing string coordinate variables to zarr · 14 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1485362670 https://github.com/pydata/xarray/issues/3476#issuecomment-1485362670 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X85YiNXu saschahofmann 24508496 2023-03-27T15:40:06Z 2023-03-27T15:40:06Z CONTRIBUTOR

My impression is that keeping the zarr encoding leads to a bunch of issues, such as my issue above or the current one. There also seems to be an issue with chunking being preserved because the array encodings aren't overwritten, but I can't find that issue right now. Since all of these issues are resolved by popping the zarr encoding, I am wondering what the downsides of this are and whether it'd be easier to not keep that encoding at all?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
1115045538 https://github.com/pydata/xarray/issues/3476#issuecomment-1115045538 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X85Cdj6i delgadom 3698640 2022-05-02T15:38:11Z 2022-08-08T15:32:52Z CONTRIBUTOR

This has been happening a lot to me lately when writing to zarr. Thanks to @bolliger32 for the tip - this usually works like a charm:

```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds.coords[v] = ds.coords[v].astype("unicode")

for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v] = ds[v].astype("unicode")
```

For whatever reason, clearing the encoding and/or using `.astype(str)` doesn't seem to work as reliably. I don't have a good MRE for this but hope it helps others with the same issue.

Note the flag raised by @FlorisCalkoen below: don't just throw this at all your writes! There are other object types (e.g. CFTime) which you probably don't want to convert to string. This is just a patch to get around this issue for DataArrays with string coords/variables.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
1208280450 https://github.com/pydata/xarray/issues/3476#issuecomment-1208280450 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X85IBOWC delgadom 3698640 2022-08-08T15:30:52Z 2022-08-08T15:31:13Z CONTRIBUTOR

ha - yeah that's a good flag. I definitely didn't intend for that to be a universally applied patch! so probably should have included a buyer beware. but we did find that clearing the encoding doesn't always do the trick for string arrays. So a comprehensive patch will probably need to be more nuanced.
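
One possible direction for a more selective patch, sketched under assumptions: pick out only those object-dtype variables whose elements are all Python `str`, so that cftime objects are left alone. The helper names `all_strings` and `string_like_names` are hypothetical, and the code only assumes a mapping of names to objects exposing `.dtype` and a one-dimensional, iterable `.values` (as `ds.variables` provides in xarray).

```python
def all_strings(values):
    """Return True if the iterable is non-empty and every element is a str."""
    items = list(values)
    return bool(items) and all(isinstance(item, str) for item in items)


def string_like_names(variables):
    """List the names of object-dtype variables that hold only strings.

    `variables` is assumed to map name -> object with `.dtype` and a
    one-dimensional `.values` (a hypothetical interface for this sketch).
    `.values` is only inspected for object-dtype variables.
    """
    return [
        name
        for name, var in variables.items()
        if var.dtype == object and all_strings(var.values)
    ]
```

With xarray this might be called as `string_like_names(ds.variables)`, and only the returned names would then be converted or have their encoding cleared. Multi-dimensional variables would need flattening first (e.g. `.values.ravel()`), which this sketch does not do.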

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
1205346130 https://github.com/pydata/xarray/issues/3476#issuecomment-1205346130 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X85H2B9S FlorisCalkoen 44444001 2022-08-04T14:36:26Z 2022-08-04T14:36:26Z NONE

@delgadom I just noticed that your proposed solution has the side effect of also converting cftime objects (e.g., below) to unicode strings.

```
<xarray.DataArray 'time' (time: 1)>
array([cftime.DatetimeNoLeap(2007, 7, 2, 12, 0, 0, 0, has_year_zero=True)], dtype=object)
```

I updated your lines using @Hoeze's clear function and that seems to work for now.

```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds[v].encoding.clear()

for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v].encoding.clear()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
1012775723 https://github.com/pydata/xarray/issues/3476#issuecomment-1012775723 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X848Xbsr RichardScottOZ 72196131 2022-01-14T04:55:03Z 2022-01-14T04:55:03Z CONTRIBUTOR

I think it worked with one variable; @Hoeze's workaround was necessary for more than one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
1012775218 https://github.com/pydata/xarray/issues/3476#issuecomment-1012775218 https://api.github.com/repos/pydata/xarray/issues/3476 IC_kwDOAMm_X848Xbky bluetyson 17399794 2022-01-14T04:53:36Z 2022-01-14T04:53:36Z NONE

Thanks for that, looks like I just came across this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
841692487 https://github.com/pydata/xarray/issues/3476#issuecomment-841692487 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDg0MTY5MjQ4Nw== Hoeze 1200058 2021-05-15T16:56:00Z 2021-05-15T17:03:00Z NONE

Hi, I also keep running into this issue all the time. Right now, there is no way of round-tripping xr.open_zarr().to_zarr(), also because of https://github.com/pydata/xarray/issues/5219.

The only workaround that seems to help is the following:

```python
to_store = xrds.copy()
for var in to_store.variables:
    to_store[var].encoding.clear()
```

{
    "total_count": 3,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Error when writing string coordinate variables to zarr 516306758
746550766 https://github.com/pydata/xarray/issues/3476#issuecomment-746550766 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDc0NjU1MDc2Ng== jhamman 2443309 2020-12-16T16:09:52Z 2020-12-16T16:09:52Z MEMBER

Thanks @andersy005 for the write up and for digging into this. Doesn't this seem like it could be a bug in Zarr's create method?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
745393893 https://github.com/pydata/xarray/issues/3476#issuecomment-745393893 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDc0NTM5Mzg5Mw== andersy005 13301940 2020-12-15T16:11:39Z 2020-12-15T16:11:39Z MEMBER

> I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter so it's trying to double encode.

Indeed. And it appears that these lines are the culprits:

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L468-L471

```python
ipdb> v
<xarray.Variable (x: 3)>
array(['a', 'b', 'c'], dtype=object)
ipdb> check
False
ipdb> vn
'x'
ipdb> encoding = extract_zarr_variable_encoding(v, raise_on_invalid=check, name=vn)
ipdb> encoding
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': [VLenUTF8()]}
```

https://github.com/pydata/xarray/blob/83706af66c9cb42032dbc5536b30be1da38100c0/xarray/backends/zarr.py#L480-L482

Zarr appears to be ignoring the filter information from xarray. Zarr proceeds to extracting its own filter. As a result, we end up with two filters:

```python
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding)
ipdb> self.ds['x']._meta
{'zarr_format': 2, 'shape': (3,), 'chunks': (3,), 'dtype': dtype('O'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': None, 'order': 'C', 'filters': [{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]}
```

```python
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}, {'id': 'vlen-utf8'}]
```
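
The duplicated filter can also be detected programmatically from a metadata dictionary shaped like the one above. This is a minimal sketch: the helper name `duplicate_filter_ids` is hypothetical, and the `meta` literal is a trimmed copy of the session output with the dtype written as a plain string.

```python
from collections import Counter


def duplicate_filter_ids(meta):
    """Return the sorted filter ids that appear more than once in `meta`."""
    filters = meta.get("filters") or []  # tolerate a missing key or None
    counts = Counter(f["id"] for f in filters)
    return sorted(fid for fid, n in counts.items() if n > 1)


# Trimmed copy of the metadata shown in the debugging session.
meta = {
    "zarr_format": 2,
    "shape": (3,),
    "chunks": (3,),
    "dtype": "|O",
    "filters": [{"id": "vlen-utf8"}, {"id": "vlen-utf8"}],
}

print(duplicate_filter_ids(meta))  # ['vlen-utf8']
```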

As @borispf and @jsadler2 suggested, clearing the filters from encoding before initiating the zarr store creation works:

```python
ipdb> enc  # without filters
{'chunks': (3,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': []}
ipdb> zarr_array = self.ds.create(name, shape=shape, dtype=dtype, fill_value=fill_value, **enc)
ipdb> self.ds['x']._meta['filters']
[{'id': 'vlen-utf8'}]
```

@jhamman since you are more familiar with the internals of zarr + xarray, should we default to ignoring filter information from xarray and let zarr take care of the extraction of filter information?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
630863198 https://github.com/pydata/xarray/issues/3476#issuecomment-630863198 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDYzMDg2MzE5OA== borispf 624352 2020-05-19T14:38:26Z 2020-05-19T14:38:26Z NONE

I ran into the same issue. It seems like zarr is inserting VLenUTF8 as a filter, but the loaded data array already has that as a filter, so it's trying to double encode. So another workaround is to delete the filters on site_code:

```
import xarray as xr

sm_from_zarr = xr.open_zarr('tmp/test_sm_zarr')
del sm_from_zarr.site_code.encoding["filters"]
sm_from_zarr.to_zarr('tmp/test_sm_zarr_from', mode='w')
```
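
The same idea can be generalized to every object-dtype variable while leaving the rest of the encoding (chunks, compressor) in place. The following is only a sketch: `drop_filters` is a hypothetical helper, and it assumes a mapping of names to objects exposing `.dtype` and a mutable `.encoding` dict, which is what `ds.variables` is expected to provide in xarray.

```python
def drop_filters(variables, only_object_dtype=True):
    """Remove the 'filters' entry from each variable's encoding, in place.

    `variables` is assumed to map name -> object with `.dtype` and a
    mutable `.encoding` dict (a hypothetical interface for this sketch).
    Returns the names whose encoding held a non-None 'filters' value.
    """
    changed = []
    for name, var in variables.items():
        if only_object_dtype and var.dtype != object:
            continue  # leave numeric variables untouched
        if var.encoding.pop("filters", None) is not None:
            changed.append(name)
    return changed
```

Under those assumptions, calling `drop_filters(sm_from_zarr.variables)` before `to_zarr` would play the role of the `del` statement above for all string-like variables at once.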

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
620090096 https://github.com/pydata/xarray/issues/3476#issuecomment-620090096 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDYyMDA5MDA5Ng== dnerini 11967971 2020-04-27T16:22:35Z 2020-04-27T16:22:35Z NONE

I'm experiencing the same issue, which seems to be also related to one of my coordinates having object as datatype. Luckily, the workaround proposed by @jsadler2 works in my case, too.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
550058641 https://github.com/pydata/xarray/issues/3476#issuecomment-550058641 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDU1MDA1ODY0MQ== jhamman 2443309 2019-11-05T22:48:59Z 2019-11-05T22:48:59Z MEMBER

Thanks @jsadler2 - I think this is a bug in xarray. We should be able to round trip the site_coordinate variable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
550044335 https://github.com/pydata/xarray/issues/3476#issuecomment-550044335 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDU1MDA0NDMzNQ== jsadler2 6943441 2019-11-05T22:07:20Z 2019-11-05T22:07:20Z NONE

Sure

```
In [5]: print(ds)
<xarray.Dataset>
Dimensions:     (datetime: 20, site_code: 20)
Coordinates:
  * datetime    (datetime) datetime64[ns] 1970-01-01 ... 1970-01-01T19:00:00
  * site_code   (site_code) object '01302020' '01303000' ... '01315000'
Data variables:
    streamflow  (datetime, site_code) float64 dask.array<shape=(20, 20), chunksize=(20, 20)>
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758
548929993 https://github.com/pydata/xarray/issues/3476#issuecomment-548929993 https://api.github.com/repos/pydata/xarray/issues/3476 MDEyOklzc3VlQ29tbWVudDU0ODkyOTk5Mw== jhamman 2443309 2019-11-01T19:58:25Z 2019-11-01T19:58:34Z MEMBER

Hi @jsadler2 - Can you show us what `sm_from_zarr` looks like? `print(sm_from_zarr)` will do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when writing string coordinate variables to zarr 516306758


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);