issues

2 rows where state = "closed" and user = 8398696 sorted by updated_at descending

issue 4556: quick overview example not working with `to_zarr` function with gcs store
id: 733201109 · node_id: MDU6SXNzdWU3MzMyMDExMDk= · user: skgbanga (8398696)
state: closed · locked: 0 · comments: 4 · author_association: NONE
created_at: 2020-10-30T13:54:43Z · updated_at: 2021-04-19T03:18:50Z · closed_at: 2021-04-19T03:18:50Z

body:

Hello,

Consider the following code:

```py
import os

import xarray as xr
import numpy as np
import zarr
import gcsfs

from .helpers import project, credentials, bucketname  # project specific


def make_store(key):
    if key == "memory":
        return zarr.MemoryStore()
    if key == "disc":
        return zarr.DirectoryStore("example.zarr")
    if key == "gcs":
        gcs = gcsfs.GCSFileSystem(project=project(), token=credentials())
        root = os.path.join(bucketname, "xarray-testing")
        return gcsfs.GCSMap(root, gcs=gcs, check=False)

    raise Exception(f"{key} not supported")


data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "baz": np.pi})
ds.to_zarr(make_store("gcs"), consolidated=True, mode="w")
```

The example dataset is from the quick overview example.

The above code works fine for both MemoryStore and DirectoryStore. When run with the 'gcs' key, it generates a rather long exception, but the important detail is:

```py
/home/sandeep/.venv/valkyrie/lib/python3.8/site-packages/gcsfs/core.py(1004)_pipe_file()
   1002         consistency = consistency or self.consistency
   1003         bucket, key = self.split_path(path)
-> 1004         size = len(data)
   1005         out = None
   1006         if size < 5 * 2 ** 20:

ipdb> p data
array(3.14159265)
```

pi is the value associated with the `baz` key.
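The failure point shown above is `size = len(data)` inside `_pipe_file`, with `data` being the zero-dimensional array holding the `baz` scalar. A zero-dimensional NumPy array has no length, so that call raises; a minimal standalone illustration (not taken from the report):

```py
import numpy as np

data = np.array(np.pi)  # zero-dimensional array, like the `baz` chunk above
print(data.shape)       # prints (), i.e. no dimensions
len(data)               # raises TypeError: len() of unsized object
```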

I have also implemented a custom zarr store (details of which are present in this zarr issue), which gives more insight into the problem:

```py
~/.venv/valkyrie/lib/python3.8/site-packages/zarr/core.py in set_basic_selection(self, selection, value, fields)
   1212         # handle zero-dimensional arrays
   1213         if self._shape == ():
-> 1214             return self._set_basic_selection_zd(selection, value, fields=fields)
   1215         else:
   1216             return self._set_basic_selection_nd(selection, value, fields=fields)

~/.venv/valkyrie/lib/python3.8/site-packages/zarr/core.py in _set_basic_selection_zd(self, selection, value, fields)
   1497         # encode and store
   1498         cdata = self._encode_chunk(chunk)
-> 1499         self.chunk_store[ckey] = cdata
   1500
   1501     def _set_basic_selection_nd(self, selection, value, fields=None):

~gcsstore.py in __setitem__(self, key, value)
     30         name = self._full_name(key)
     31         blob = self.bucket.blob(name, chunk_size=human_size("1gib"))
---> 32         blob.upload_from_string(value, content_type="application/octet-stream")

~/.venv/valkyrie/lib/python3.8/site-packages/google/cloud/storage/blob.py in upload_from_string(self, data, content_type, client, predefined_acl, if_generation_match, if_generation_not_match, if_metageneration_match, if_metageneration_not_match, timeout, checksum)
   2437             "md5", "crc32c" and None. The default is None.
   2438         """
-> 2439         data = _to_bytes(data, encoding="utf-8")
   2440         string_buffer = BytesIO(data)
   2441         self.upload_from_file(

~/.venv/valkyrie/lib/python3.8/site-packages/google/cloud/_helpers.py in _to_bytes(value, encoding)
    368         return result
    369     else:
--> 370         raise TypeError("%r could not be converted to bytes" % (value,))
    371
    372

TypeError: array(3.14159265) could not be converted to bytes
```

It seems to me that zarr is not converting the data into its serialized representation (via its codec library) and is instead passing the raw value directly to the MutableMapping store, which results in an exception because the Google libraries don't know how to convert the passed data (np.pi) into bytes.

```py
ipdb> u
gcsstore.py(32)__setitem__()
     30         name = self._full_name(key)
     31         blob = self.bucket.blob(name, chunk_size=human_size("1gib"))
---> 32         blob.upload_from_string(value, content_type="application/octet-stream")
     33
     34     def __len__(self):

ipdb> p key
'baz/0'
ipdb> p value
array(3.14159265)
```
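The `value` printed above is the raw zero-dimensional array, which is exactly what the GCS client refuses to convert. As a hedged sketch (illustrative only, not the reporter's `gcsstore.py` and not part of zarr or gcsfs), a MutableMapping-style zarr v2 store backed by a bytes-only store can coerce such values before storing them:

```py
from collections.abc import MutableMapping

import numpy as np


class BytesCoercingStore(MutableMapping):
    """Toy in-memory zarr-style store keyed by paths such as 'baz/0'."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        # Coerce bytes, bytearray, or NumPy arrays (e.g. array(3.14159265))
        # into raw bytes, mimicking what a bytes-only backend would need.
        if not isinstance(value, (bytes, bytearray)):
            value = np.ascontiguousarray(value).tobytes()
        self._data[key] = bytes(value)

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = BytesCoercingStore()
store["baz/0"] = np.array(np.pi)  # stored as 8 raw bytes
print(len(store["baz/0"]))        # 8
```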

Please let me know if you think I should raise this issue in the zarr project rather than here.

Versions of xarray and zarr:

- xarray 0.16.1
- zarr 2.5.0

reactions:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4556/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
issue 5120: DataArray to Zarr
id: 851622923 · node_id: MDU6SXNzdWU4NTE2MjI5MjM= · user: skgbanga (8398696)
state: closed · locked: 1 · comments: 2 · author_association: NONE
created_at: 2021-04-06T17:02:41Z · updated_at: 2021-04-07T19:31:59Z · closed_at: 2021-04-06T21:02:02Z

body:

Currently, a Dataset can be stored in zarr, as described at: http://xarray.pydata.org/en/stable/io.html?highlight=zarr#zarr

Is there a way to store a DataArray directly into zarr as well?

PS: One can always accomplish the above by creating a Dataset with a single element:

`xr.Dataset({"bar": foo})`

where `foo` is a DataArray.

I was wondering if this is the recommended way.
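A minimal sketch of the single-variable wrapping described in the PS, using `DataArray.to_dataset` so the variable gets a name on the way in (the store path and variable name below are illustrative):

```py
import numpy as np
import xarray as xr

foo = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})

# Wrap the DataArray in a single-variable Dataset, then write that to zarr.
foo.to_dataset(name="bar").to_zarr("example_dataarray.zarr", mode="w")

# Reading back gives a Dataset; select the variable to recover the DataArray.
roundtripped = xr.open_zarr("example_dataarray.zarr")["bar"]
```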

reactions:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5120/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
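The filtered view above ("2 rows where state = 'closed' and user = 8398696 sorted by updated_at descending") corresponds to a simple query against this schema. A minimal sketch using Python's sqlite3, assuming a local copy of the database saved as `github.db` (filename illustrative):

```py
import sqlite3

conn = sqlite3.connect("github.db")  # local copy of this Datasette database

rows = conn.execute(
    """
    select [number], [title], [state], [created_at], [updated_at]
    from [issues]
    where [state] = 'closed' and [user] = 8398696
    order by [updated_at] desc
    """
).fetchall()

for number, title, state, created_at, updated_at in rows:
    print(number, title, state, created_at, updated_at)
```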