issue_comments

6 rows where issue = 280626621 (pydata/xarray#1770, "slow performance when storing datasets in gcsfs-backed zarr stores") and user = 306380 (mrocklin, MEMBER), sorted by updated_at descending

351106809 · mrocklin (MEMBER) · created 2017-12-12T16:33:00Z · updated 2017-12-12T16:33:00Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351106809

https://github.com/dask/gcsfs/pull/49

```python
import gcsfs
fs = gcsfs.GCSFileSystem(project='pangeo-181919')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/test997', gcs=fs, check=True, create=True)

import dask.array as dsa
shape = (30, 50, 1080, 2160)
chunkshape = (1, 1, 1080, 2160)
ar = dsa.random.random(shape, chunks=chunkshape)

import zarr
za = zarr.create(ar.shape, chunks=chunkshape, dtype=ar.dtype, store=gcsmap)

In [2]: import cloudpickle
In [3]: %time len(cloudpickle.dumps(gcsmap))
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 560 µs
Out[3]: 213
```

351104996 · mrocklin (MEMBER) · created 2017-12-12T16:27:27Z · updated 2017-12-12T16:27:27Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351104996

t

351104967 · mrocklin (MEMBER) · created 2017-12-12T16:27:23Z · updated 2017-12-12T16:27:23Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351104967

It looks like serializing GCSFileSystem.dirs can be quite expensive. I think this is only here for caching and efficiency; is that correct?
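
A quick way to test that, sketched below; it assumes `.dirs` is a plain dict cache on the filesystem object (as in gcsfs at the time), and the project/bucket names are placeholders:

```python
import pickle

import gcsfs

fs = gcsfs.GCSFileSystem(project='pangeo-181919')
fs.ls('pangeo-data')  # populate the directory-listing cache

# Compare pickle sizes with and without the cache: if .dirs dominates
# the payload, the two numbers will differ dramatically.
with_cache = len(pickle.dumps(fs))
fs.dirs.clear()  # assumption: .dirs is a plain dict used only as a cache
without_cache = len(pickle.dumps(fs))
print(with_cache, without_cache)
```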

351101352 · mrocklin (MEMBER) · created 2017-12-12T16:17:11Z · updated 2017-12-12T16:17:11Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351101352

Ah, we can just serialize the .gcs object and leave the rest to the GCSFileSystem.

Perhaps the MutableMapping base class from collections.abc defines __getstate__/__setstate__ differently. I'll play around.
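
A minimal sketch of that idea, with a hypothetical `BucketMap` standing in for gcsfs's real mapping class: explicit __getstate__/__setstate__ pickle only the root path and the filesystem handle, never the keys and values the mapping fronts.

```python
from collections.abc import MutableMapping

class BucketMap(MutableMapping):
    """Hypothetical key/value view over a bucket path."""

    def __init__(self, root, gcs):
        self.root = root
        self.gcs = gcs  # filesystem handle that knows how to reconnect

    def __getstate__(self):
        # Serialize only the connection information, never the contents.
        return {'root': self.root, 'gcs': self.gcs}

    def __setstate__(self, state):
        self.__dict__.update(state)

    # MutableMapping's abstract methods; every call hits the backing store.
    def __getitem__(self, key):
        return self.gcs.cat(self.root + '/' + key)

    def __setitem__(self, key, value):
        with self.gcs.open(self.root + '/' + key, 'wb') as f:
            f.write(value)

    def __delitem__(self, key):
        self.gcs.rm(self.root + '/' + key)

    def __iter__(self):
        return iter(self.gcs.ls(self.root))

    def __len__(self):
        return len(self.gcs.ls(self.root))
```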

351098929 · mrocklin (MEMBER) · created 2017-12-12T16:09:37Z · updated 2017-12-12T16:09:37Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351098929

When pickling the GCS mapping, it looks like we're actually pulling down all of the data within it (Zarr has already placed some metadata) instead of serializing just the connection information.

@martindurant what information can we safely pass around when serializing? These tasks would need to remain valid for longer than the standard hour-long short-lived token.
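
One way to see where the bytes are going, sketched with the standard-library pickletools (bucket and project names follow the snippet above):

```python
import pickletools

import cloudpickle
import gcsfs

fs = gcsfs.GCSFileSystem(project='pangeo-181919')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/test997', gcs=fs)

payload = cloudpickle.dumps(gcsmap)
print(len(payload))       # large if the store's contents were captured
pickletools.dis(payload)  # opcode-level view of what was actually serialized
```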

350387325 · mrocklin (MEMBER) · created 2017-12-08T22:24:35Z · updated 2017-12-09T21:32:00Z
https://github.com/pydata/xarray/issues/1770#issuecomment-350387325

The threading locks in your profile are likely due to using the dask threaded scheduler. I recommend using the single-threaded scheduler (dask.get) when profiling.
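
On a current dask that looks roughly like the sketch below; at the time of this comment the spelling was `.compute(get=dask.get)`, newer versions use the scheduler setting instead:

```python
import dask
import dask.array as dsa

# Same shape and chunking as the example above.
ar = dsa.random.random((30, 50, 1080, 2160), chunks=(1, 1, 1080, 2160))

# Run everything in the calling thread so the profiler sees real work
# rather than lock contention from the threaded scheduler's worker pool.
with dask.config.set(scheduler='synchronous'):
    ar.sum().compute()
```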


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);