
issues: 733201109


id: 733201109
node_id: MDU6SXNzdWU3MzMyMDExMDk=
number: 4556
title: quick overview example not working with `to_zarr` function with gcs store
user: 8398696
state: closed
locked: 0
comments: 4
created_at: 2020-10-30T13:54:43Z
updated_at: 2021-04-19T03:18:50Z
closed_at: 2021-04-19T03:18:50Z
author_association: NONE

Hello,

Consider the following code:

```py
import os

import xarray as xr
import numpy as np
import zarr
import gcsfs

from .helpers import project, credentials, bucketname  # project specific


def make_store(key):
    if key == "memory":
        return zarr.MemoryStore()
    if key == "disc":
        return zarr.DirectoryStore("example.zarr")
    if key == "gcs":
        gcs = gcsfs.GCSFileSystem(project=project(), token=credentials())
        root = os.path.join(bucketname, "xarray-testing")
        return gcsfs.GCSMap(root, gcs=gcs, check=False)

    raise Exception(f"{key} not supported")


data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "baz": np.pi})
ds.to_zarr(make_store("gcs"), consolidated=True, mode="w")
```

The example dataset is from the quick overview example.

The above code works fine for both MemoryStore and DirectoryStore. When run with the 'gcs' key, it raises a rather long exception, but the important detail is:

```py
/home/sandeep/.venv/valkyrie/lib/python3.8/site-packages/gcsfs/core.py(1004)_pipe_file()
   1002         consistency = consistency or self.consistency
   1003         bucket, key = self.split_path(path)
-> 1004         size = len(data)
   1005         out = None
   1006         if size < 5 * 2 ** 20:

ipdb> p data
array(3.14159265)
```

pi is the value associated with the `baz` key.
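To illustrate why `size = len(data)` fails at that point, here is a minimal sketch using only NumPy. It assumes the value reaching the store is a zero-dimensional array, as the `p data` output above suggests; the variable names are illustrative.

```python
import numpy as np

# A scalar such as np.pi becomes a zero-dimensional array (shape == ()).
data = np.array(np.pi)

# len() is undefined for zero-dimensional arrays, so any code that
# expects a bytes-like payload and calls len() on it will raise.
try:
    len(data)
    message = None
except TypeError as exc:
    message = str(exc)

print(data.shape, data.nbytes, message)

# Serialising first gives an object that supports len().
payload = data.tobytes()
print(len(payload))
```

The array still holds 8 bytes of data (`nbytes`), but it only behaves like a sized buffer once it has been converted with `tobytes()`.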

I have also implemented a custom zarr store (details are in this zarr issue), which gives more insight into the problem:

```py
~/.venv/valkyrie/lib/python3.8/site-packages/zarr/core.py in set_basic_selection(self, selection, value, fields)
   1212         # handle zero-dimensional arrays
   1213         if self._shape == ():
-> 1214             return self._set_basic_selection_zd(selection, value, fields=fields)
   1215         else:
   1216             return self._set_basic_selection_nd(selection, value, fields=fields)

~/.venv/valkyrie/lib/python3.8/site-packages/zarr/core.py in _set_basic_selection_zd(self, selection, value, fields)
   1497         # encode and store
   1498         cdata = self._encode_chunk(chunk)
-> 1499         self.chunk_store[ckey] = cdata
   1500
   1501     def _set_basic_selection_nd(self, selection, value, fields=None):

~gcsstore.py in __setitem__(self, key, value)
     30         name = self._full_name(key)
     31         blob = self.bucket.blob(name, chunk_size=human_size("1gib"))
---> 32         blob.upload_from_string(value, content_type="application/octet-stream")

~/.venv/valkyrie/lib/python3.8/site-packages/google/cloud/storage/blob.py in upload_from_string(self, data, content_type, client, predefined_acl, if_generation_match, if_generation_not_match, if_metageneration_match, if_metageneration_not_match, timeout, checksum)
   2437             "md5", "crc32c" and None. The default is None.
   2438         """
-> 2439         data = _to_bytes(data, encoding="utf-8")
   2440         string_buffer = BytesIO(data)
   2441         self.upload_from_file(

~/.venv/valkyrie/lib/python3.8/site-packages/google/cloud/_helpers.py in _to_bytes(value, encoding)
    368         return result
    369     else:
--> 370         raise TypeError("%r could not be converted to bytes" % (value,))
    371
    372

TypeError: array(3.14159265) could not be converted to bytes
```

It seems to me that zarr is not converting the data into its serialized representation (via its codec library) and is instead passing the array object directly to the MutableMapping store. This results in an exception, since the Google libraries don't know how to convert the passed value (np.pi) into bytes.
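One possible workaround on the store side, sketched under assumptions, is to coerce any non-bytes value to bytes before handing it to the backend. `BytesCoercingStore` is a hypothetical name, not part of zarr, gcsfs, or xarray, and a plain `dict` stands in for the bucket here. Writing the raw buffer is only appropriate if an array arrives because no compressor or filter was applied to the chunk, which is an assumption rather than something confirmed above.

```python
from collections.abc import MutableMapping

import numpy as np


class BytesCoercingStore(MutableMapping):
    """Hypothetical wrapper that converts array-like values to bytes on write."""

    def __init__(self, inner):
        # `inner` is any MutableMapping, e.g. a dict or a bucket-backed store.
        self._inner = inner

    def __setitem__(self, key, value):
        if isinstance(value, (bytes, bytearray, memoryview)):
            value = bytes(value)
        else:
            # Assumption: the value is an unencoded NumPy array (possibly
            # zero-dimensional), so its raw buffer is the chunk payload.
            value = np.asarray(value).tobytes()
        self._inner[key] = value

    def __getitem__(self, key):
        return self._inner[key]

    def __delitem__(self, key):
        del self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


# Usage: a dict stands in for the remote bucket.
store = BytesCoercingStore({})
store["baz/0"] = np.array(np.pi)
print(type(store["baz/0"]), len(store["baz/0"]))
```

This only hides the symptom inside the store; whether zarr should hand bytes to every MutableMapping in the first place is the question raised above.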

```py
ipdb> u

gcsstore.py(32)__setitem__()
     30         name = self._full_name(key)
     31         blob = self.bucket.blob(name, chunk_size=human_size("1gib"))
---> 32         blob.upload_from_string(value, content_type="application/octet-stream")
     33
     34     def __len__(self):

ipdb> p key
'baz/0'
ipdb> p value
array(3.14159265)
```

Please let me know if you think I should raise this issue in the zarr project rather than here.

Versions of xarray and zarr:

xarray 0.16.1
zarr 2.5.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4556/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
