issues: 414641120

id: 414641120
node_id: MDU6SXNzdWU0MTQ2NDExMjA=
number: 2789
title: Appending to zarr with string dtype
user: 4711805
state: open
locked: 0
comments: 2
created_at: 2019-02-26T14:31:42Z
updated_at: 2022-04-09T02:18:05Z
author_association: CONTRIBUTOR
assignee, milestone, closed_at, active_lock_reason, draft, pull_request: (empty)
body:

```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```

The code above prints ['foo'] (string type). The encoding chosen by zarr is "dtype": "|S3", which is a fixed-width bytes type, but the values seem to be decoded back to strings on read, which is what we want.

    $ cat ds/da/.zarray
    {
        "chunks": [
            1
        ],
        "compressor": {
            "blocksize": 0,
            "clevel": 5,
            "cname": "lz4",
            "id": "blosc",
            "shuffle": 1
        },
        "dtype": "|S3",
        "fill_value": null,
        "filters": null,
        "order": "C",
        "shape": [
            1
        ],
        "zarr_format": 2
    }

The problem is that if I want to append to the zarr archive, like so:

```python
import xarray as xr
import zarr

# Open the existing store with zarr directly and append to the 'da' array
ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)

ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints ['foo' 'bar']. Indeed, the encoding was kept as "dtype": "|S3", which is fine for a string of 3 characters but truncates a string of 6.
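The truncation can be reproduced with NumPy alone, without zarr or xarray. This is a minimal sketch, assuming the append ends up casting the new values to the array's fixed-width `|S3` dtype; it is not a trace of what zarr does internally.

```python
import numpy as np

# A fixed-width bytes dtype holds at most 3 bytes per element.
stored = np.array(['foo']).astype('|S3')

# Casting a longer string to the same dtype silently drops the extra bytes.
new = np.array(['barbar']).astype('|S3')

combined = np.concatenate([stored, new])
print(combined)

# Decoding the bytes back to strings shows the shortened value.
print(np.char.decode(combined, 'utf-8'))
```

No error or warning is raised by the cast, which matches the silent loss seen above.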

If I specify the encoding with the maximum length, e.g.:

    ds.to_zarr('ds', encoding={'da': {'dtype': '|S6'}})

It solves the length problem, but now my strings are kept as bytes: [b'foo' b'barbar']. If I specify a Unicode encoding:

    ds.to_zarr('ds', encoding={'da': {'dtype': 'U6'}})

It is not taken into account. The zarr encoding is "dtype": "|S3" and I am back to my length problem: ['foo' 'bar'].
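For the `|S6` variant, the width does not have to be hard-coded. Here is a small sketch that derives it from the strings, assuming every value that will ever be appended is known up front. `bytes_dtype_for` is a hypothetical helper, not part of xarray or zarr.

```python
def bytes_dtype_for(strings, encoding='utf-8'):
    """Return a fixed-width bytes dtype string wide enough for every input string."""
    # Measure the encoded length, since non-ASCII characters take more than one byte.
    width = max((len(s.encode(encoding)) for s in strings), default=1)
    return f'|S{width}'


dtype = bytes_dtype_for(['foo', 'barbar'])
print(dtype)
```

The result can then be passed as `encoding={'da': {'dtype': dtype}}` in the `to_zarr` call.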

The solution with 'dtype': '|S6' is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.
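To make the inconvenience concrete, here is a sketch of the two ways I can see to index bytes values, using a plain NumPy array as a stand-in for what comes back from a store written with `|S6` (an assumption; the real values would come from `xr.open_zarr`).

```python
import numpy as np

# Stand-in for the values read back from a store written with dtype '|S6'.
values = np.array([b'foo', b'barbar'], dtype='|S6')

# Option 1: encode the lookup key to bytes before every comparison.
key = 'barbar'
mask = values == key.encode('utf-8')
print(mask)

# Option 2: decode once after loading, then compare with plain strings.
decoded = np.char.decode(values, 'utf-8')
print(decoded == key)
```

Option 2 keeps the rest of the code free of `.encode()` calls, at the cost of an extra pass over the data after each load.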

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app, state_reason: (empty)
repo: 13221727
type: issue
