issue_comments

9 rows where author_association = "MEMBER", issue = 253136694 and user = 306380 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
364801395 https://github.com/pydata/xarray/pull/1528#issuecomment-364801395 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgwMTM5NQ== mrocklin 306380 2018-02-11T23:40:18Z 2018-02-11T23:40:18Z MEMBER

Does the to_zarr method suffice: http://xarray.pydata.org/en/latest/generated/xarray.Dataset.to_zarr.html#xarray.Dataset.to_zarr ?

On Sun, Feb 11, 2018 at 6:35 PM, Martin Durant notifications@github.com wrote:

Question: how would one build a zarr-xarray dataset?

With zarr you can open an array that contains no data, and use set-slice notation to fill in the values (which is what dask's store essentially does).

If I have some pre-known coordinates and bigger-than-memory data arrays, how would I go about getting the values into the zarr structure? If this can't be done directly with the xarray interface, is there a way to call zarr's open/create/zeros such that the corresponding array will appear as a variable when the same dataset is opened with xarray?

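A minimal sketch of the `to_zarr` route for bigger-than-memory data, assuming the variables are already dask-backed; the names, shapes, and store path below are illustrative, not from this thread:

```python
import dask.array as da
import xarray as xr

# A lazy, larger-than-memory variable: nothing is materialized here.
data = da.zeros((100_000, 1_000), chunks=(1_000, 1_000))  # hypothetical shape/chunks

ds = xr.Dataset(
    {"foo": (("time", "x"), data)},
    coords={"time": range(100_000), "x": range(1_000)},
)

# to_zarr streams the dataset into the store chunk by chunk, so the full
# arrays never need to fit in memory at once.
ds.to_zarr("example_store.zarr")
```

Filling a pre-created zarr array by slice assignment, as described in the question, is still possible through zarr directly; `to_zarr` covers the case where the data already lives in xarray/dask objects.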

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
350343117 https://github.com/pydata/xarray/pull/1528#issuecomment-350343117 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDM0MzExNw== mrocklin 306380 2017-12-08T18:55:35Z 2017-12-08T18:55:35Z MEMBER

Not as far as I know.

On Fri, Dec 8, 2017 at 1:53 PM, Ryan Abernathey notifications@github.com wrote:

@rabernat commented on this pull request.

In xarray/backends/common.py https://github.com/pydata/xarray/pull/1528#discussion_r155848074:

```diff
@@ -184,7 +185,7 @@ def sync(self):
         import dask.array as da
         import dask
         if LooseVersion(dask.__version__) > LooseVersion('0.8.1'):
-            da.store(self.sources, self.targets, lock=GLOBAL_LOCK)
+            da.store(self.sources, self.targets, lock=self.lock)
```

There is no reason that a task run on the distributed system will not show up on the dashboard. My first guess is that somehow you're using a local scheduler.

I was not using a local scheduler. After digging further, I can see the tasks on the distributed dashboard using a regular zarr.DirectoryStore, but not when I pass a gcsfs.mapping.GCSMap to to_zarr. Is there any reason these two should behave differently?

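For reference, a hedged sketch of the lock parameterization being discussed: passing an explicit distributed lock to `da.store` (store path, shapes, and lock name here are illustrative):

```python
import dask.array as da
import zarr
from distributed import Client, Lock

client = Client()  # connect to (or start) a distributed scheduler

source = da.random.random((100, 200), chunks=(50, 40))
target = zarr.open(
    "example.zarr", mode="w",
    shape=source.shape, chunks=(50, 40), dtype=source.dtype,
)

# da.store accepts a lock argument, so writers can coordinate through a
# distributed.Lock shared across workers rather than a process-global lock.
da.store([source], [target], lock=Lock("zarr-write"))
```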

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
349488598 https://github.com/pydata/xarray/pull/1528#issuecomment-349488598 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0OTQ4ODU5OA== mrocklin 306380 2017-12-06T00:30:21Z 2017-12-06T00:30:21Z MEMBER

We tried this out on a cloud-deployed cluster on GCE and things worked pleasantly. Some conversation here: https://github.com/pangeo-data/pangeo/issues/19

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347983854 https://github.com/pydata/xarray/pull/1528#issuecomment-347983854 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0Nzk4Mzg1NA== mrocklin 306380 2017-11-29T20:19:37Z 2017-11-29T20:19:37Z MEMBER

FWIW I think the best option at the moment is to make sure you add either a Pickle or MsgPack filter for any zarr array with an object dtype.

Is it possible to add one of these filters to XArray's default use of Zarr?
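A hedged sketch of what such a filter looks like when creating a zarr array by hand with the numcodecs MsgPack codec; how xarray should wire this in by default is exactly the open question here:

```python
import numpy as np
import zarr
import numcodecs

# Object-dtype data (e.g. variable-length strings) needs an explicit codec
# so zarr knows how to serialize each element.
data = np.array(["a", "bc", "def"], dtype=object)

z = zarr.array(data, dtype=object, object_codec=numcodecs.MsgPack())
# On older zarr releases the codec is supplied as a filter instead, e.g.
# filters=[numcodecs.MsgPack()], which is the "filter" suggested above.
```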

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347981682 https://github.com/pydata/xarray/pull/1528#issuecomment-347981682 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0Nzk4MTY4Mg== mrocklin 306380 2017-11-29T20:11:25Z 2017-11-29T20:11:25Z MEMBER

FWIW my vote is for msgpack over pickle, for both performance and cross-language reasons.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345778844 https://github.com/pydata/xarray/pull/1528#issuecomment-345778844 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTc3ODg0NA== mrocklin 306380 2017-11-20T18:05:25Z 2017-11-20T18:05:25Z MEMBER

This is, of course, by design :)

It's so nice when well-designed things come together and just work as planned :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345575240 https://github.com/pydata/xarray/pull/1528#issuecomment-345575240 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTU3NTI0MA== mrocklin 306380 2017-11-20T02:28:07Z 2017-11-20T02:28:07Z MEMBER

That is, indeed, quite exciting. Also exciting is that I was able to look at and compute on your data easily.

```python
In [1]: import zarr

In [2]: import gcsfs

In [3]: fs = gcsfs.GCSFileSystem(project='pangeo-181919')

In [4]: gcsmap = gcsfs.mapping.GCSMap('zarr_store_test', gcs=fs, check=True, create=False)

In [5]: import xarray as xr

In [6]: ds_gcs = xr.open_zarr(gcsmap, mode='r')

In [7]: ds_gcs
Out[7]:
<xarray.Dataset>
Dimensions:  (x: 200, y: 100)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    bar      (x) float64 dask.array<shape=(200,), chunksize=(40,)>
    foo      (y, x) float32 dask.array<shape=(100, 200), chunksize=(50, 40)>
Attributes:
    array_atr:  [1, 2]
    some_attr:  copana

In [8]: ds_gcs.sum()
Out[8]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 dask.array<shape=(), chunksize=()>
    foo      float32 dask.array<shape=(), chunksize=()>

In [9]: ds_gcs.sum().compute()
Out[9]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 0.0
    foo      float32 20000.0
```

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345104713 https://github.com/pydata/xarray/pull/1528#issuecomment-345104713 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTEwNDcxMw== mrocklin 306380 2017-11-17T00:12:01Z 2017-11-17T00:12:01Z MEMBER

Hooray for standard interfaces!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345101150 https://github.com/pydata/xarray/pull/1528#issuecomment-345101150 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTEwMTE1MA== mrocklin 306380 2017-11-16T23:52:48Z 2017-11-16T23:52:48Z MEMBER

The gcsfs library also provides a MutableMapping for Google Cloud Storage.

The dask.distributed library now also provides a distributed lock for synchronization if necessary, though in practice we should just rechunk the dask.array before writing.
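A hedged sketch of how those two pieces combine (project, bucket, file name, and chunk sizes are illustrative):

```python
import gcsfs
import xarray as xr

# GCSMap exposes a bucket path as a MutableMapping, which zarr accepts as a store.
fs = gcsfs.GCSFileSystem(project="my-project")                # hypothetical project
store = gcsfs.mapping.GCSMap("my-bucket/store.zarr", gcs=fs)  # hypothetical bucket

ds = xr.open_dataset("input.nc", chunks={"time": 1})  # hypothetical source file

# Rechunk to larger, uniform chunks before writing so each zarr chunk is
# written by exactly one dask chunk and no write lock is needed.
ds = ds.chunk({"time": 100})
ds.to_zarr(store)
```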

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);