issue_comments


8 rows where author_association = "MEMBER" and issue = 280626621 ("slow performance when storing datasets in gcsfs-backed zarr stores"), sorted by updated_at descending


453799889 · jhamman · MEMBER · 2019-01-13T03:52:46Z
https://github.com/pydata/xarray/issues/1770#issuecomment-453799889

Closing. I think our fixes in xarray and zarr last winter addressed most of the problems here. If others feel differently, please reopen.

351106809 · mrocklin · MEMBER · 2017-12-12T16:33:00Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351106809

https://github.com/dask/gcsfs/pull/49

```python
import gcsfs
fs = gcsfs.GCSFileSystem(project='pangeo-181919')
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/test997', gcs=fs, check=True, create=True)

import dask.array as dsa
shape = (30, 50, 1080, 2160)
chunkshape = (1, 1, 1080, 2160)
ar = dsa.random.random(shape, chunks=chunkshape)

import zarr
za = zarr.create(ar.shape, chunks=chunkshape, dtype=ar.dtype, store=gcsmap)

In [2]: import cloudpickle

In [3]: %time len(cloudpickle.dumps(gcsmap))
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 560 µs
Out[3]: 213
```
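
(The sub-millisecond dump and 213-byte pickle suggest that, with this PR applied, only the connection information is being serialized rather than the mapping's contents.)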

351104996 · mrocklin · MEMBER · 2017-12-12T16:27:27Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351104996

t

351104967 · mrocklin · MEMBER · 2017-12-12T16:27:23Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351104967

It looks like serializing GCSFileSystem.dirs can be quite expensive. I think this is just here for caching and efficiency; is that correct?
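
For illustration, a minimal sketch of why that would matter, using a hypothetical `Listing` class (not gcsfs's actual implementation) whose `dirs` attribute plays the role of the cache:

```python
import pickle

class Listing:
    """Toy stand-in for an object that carries a directory-listing cache.

    Hypothetical; not gcsfs's real GCSFileSystem."""
    def __init__(self):
        self.project = "my-project"
        # analogous in spirit to GCSFileSystem.dirs: a pure cache
        self.dirs = {f"bucket/key-{i}": {"size": i} for i in range(100_000)}

obj = Listing()
print(len(pickle.dumps(obj)))   # large: the whole cache travels with the object

obj.dirs = {}                   # clearing the cache first...
print(len(pickle.dumps(obj)))   # ...makes the pickle tiny
```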

351101352 · mrocklin · MEMBER · 2017-12-12T16:17:11Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351101352

Ah, we can just serialize the .gcs object and leave the rest to the GCSFileSystem.

Perhaps the MutableMapping collections class defines `__getstate__`/`__setstate__` differently. I'll play around.
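
A minimal sketch of the pattern under discussion, assuming a hypothetical mapping class rather than gcsfs's real `GCSMap`: implement `__getstate__`/`__setstate__` so that only the root path and the filesystem handle are pickled, and let caches be rebuilt on the receiving side:

```python
import pickle
from collections.abc import MutableMapping

class KeyValueMap(MutableMapping):
    """Hypothetical gcsmap-like mapping; only connection info survives pickling."""
    def __init__(self, root, fs):
        self.root = root      # bucket/prefix the mapping points at
        self.fs = fs          # connection-holding filesystem object
        self._cache = {}      # local state we do not want to ship

    # pickle consults these instead of serializing __dict__ wholesale
    def __getstate__(self):
        return {"root": self.root, "fs": self.fs}

    def __setstate__(self, state):
        self.__init__(state["root"], state["fs"])

    # MutableMapping requirements (toy in-memory versions)
    def __getitem__(self, key): return self._cache[key]
    def __setitem__(self, key, value): self._cache[key] = value
    def __delitem__(self, key): del self._cache[key]
    def __iter__(self): return iter(self._cache)
    def __len__(self): return len(self._cache)

m = KeyValueMap("pangeo-data/test", fs=None)
m["0.0.0.0"] = b"\x00" * 100_000          # bulky chunk data
m2 = pickle.loads(pickle.dumps(m))        # round-trip stays small
print(len(pickle.dumps(m)), len(m2))      # cache is not shipped: len(m2) == 0
```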

351098929 · mrocklin · MEMBER · 2017-12-12T16:09:37Z
https://github.com/pydata/xarray/issues/1770#issuecomment-351098929

When pickling the GCS mapping, it looks like we're actually pulling down all of the data within it (Zarr has already placed some metadata) instead of serializing the connection information.

@martindurant, what information can we safely pass around when serializing? These tasks would need to remain valid for longer than the standard hour-long short-lived token.
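
A rough way to check any mapping-like store for this behaviour (illustrative sketch; `pickled_growth` is a made-up helper): compare the pickle size before and after writing data. If it grows with the stored data, the contents are being serialized rather than just the connection info:

```python
import pickle

def pickled_growth(store, value=b"\x00" * 1_000_000):
    """Compare pickle size before and after writing ~1 MB into `store`."""
    before = len(pickle.dumps(store))
    store["probe"] = value
    after = len(pickle.dumps(store))
    return before, after

# a plain dict serializes its contents, so `after` is ~1 MB larger:
print(pickled_growth({}))
```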

350387325 · mrocklin · MEMBER · created 2017-12-08T22:24:35Z · updated 2017-12-09T21:32:00Z
https://github.com/pydata/xarray/issues/1770#issuecomment-350387325

The threading locks in your profile are likely due to using the dask threaded scheduler. I recommend using the single-threaded scheduler (dask.get) when profiling.
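
In current dask the same advice looks roughly like this (a sketch; at the time the idiom was passing `get=dask.get` to `.compute()`):

```python
import cProfile
import dask
import dask.array as da

x = da.random.random((2000, 2000), chunks=(500, 500))

# single-threaded scheduler: the profile shows the real work instead of
# threads blocking on locks
with dask.config.set(scheduler="synchronous"):
    cProfile.runctx("x.sum().compute()", globals(), locals(), sort="cumulative")
```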

350381719 · rabernat · MEMBER · 2017-12-08T21:54:01Z
https://github.com/pydata/xarray/issues/1770#issuecomment-350381719

Does `dask.array.store(..., lock=None)` do the same thing as `dask.array.store(..., lock=False)`?
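
One way to sidestep the ambiguity (a sketch, not a definitive answer; `lock=None` semantics may differ across dask versions) is to pass `lock=False` explicitly and check the installed signature rather than assuming the two are equivalent:

```python
import inspect
import numpy as np
import dask.array as da

x = da.random.random((100, 100), chunks=(50, 50))
target = np.empty((100, 100))

# explicit and unambiguous: no inter-task lock around writes
da.store(x, target, lock=False)

# check the installed default instead of assuming lock=None == lock=False
print(inspect.signature(da.store))
```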


Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
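
For reference, the rows above correspond to a query along these lines (a sketch; the `github.db` filename is an assumption):

```python
import sqlite3

# "github.db" is an assumed filename for the datasette-backed database
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER' AND issue = ?
    ORDER BY updated_at DESC
    """,
    (280626621,),
).fetchall()
for comment_id, user, created, updated, body in rows:
    print(comment_id, user, updated)
```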