

issue_comments


1 row where user = 2489879 sorted by updated_at descending




  • id: 417419321
  • html_url: https://github.com/pydata/xarray/issues/2314#issuecomment-417419321
  • issue_url: https://api.github.com/repos/pydata/xarray/issues/2314
  • node_id: MDEyOklzc3VlQ29tbWVudDQxNzQxOTMyMQ==
  • user: lmadaus (2489879)
  • created_at: 2018-08-30T18:22:31Z
  • updated_at: 2018-08-30T18:22:31Z
  • author_association: NONE

body:

Thanks for all the suggestions!

An update since I originally posted this: following @shoyer's, @darothen's, and @scottyhq's comments, we've tested the code with both cloud-optimized GeoTIFFs and regular GeoTIFFs. It does perform better with the cloud-optimized form, though it still appears to "over-eagerly" load more than just the data being worked on. With the cloud-optimized form, performance is much better when we specify the chunking strategy in the initial open_rasterio call so that it aligns with the files' chunk sizes, e.g.

    rasterlist = [xr.open_rasterio(x, chunks={'x': 256, 'y': 256}) for x in filelist]

vs.

    rasterlist = [xr.open_rasterio(x, chunks={'x': None, 'y': None}) for x in filelist]
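To make the comparison concrete, here is a minimal sketch of the aligned-chunk pattern (the file names and the concatenation along a new "file" dimension are illustrative assumptions, not details from our actual code):

    import xarray as xr

    filelist = ["tile_0.tif", "tile_1.tif"]  # hypothetical input GeoTIFFs

    # Open each file lazily with dask chunks matching the 256x256 tiling
    rasterlist = [xr.open_rasterio(f, chunks={'x': 256, 'y': 256}) for f in filelist]

    # Combine into a single lazy array along a new dimension
    combined = xr.concat(rasterlist, dim='file')
    print(combined.chunks)  # inspect the dask chunk layout before computing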

The result is a larger task graph (and much more time spent building the task graph), but more cases where we don't run into memory problems. There still appears to be a lot more memory used than I expect, but I'm actively exploring options.

We've also noticed better performance using a k8s Dask cluster distributed across multiple "independent" workers, as opposed to using a LocalCluster on a single large machine. Specifically, with the distributed cluster the "myfunction" (fit) operation starts running on chunks well before the entire dataset is loaded, whereas with the LocalCluster it still tends not to begin until all chunks have been loaded. I'm not exactly sure what would cause that...
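For reference, a rough sketch of the two setups being compared (the worker counts and scheduler address are placeholders, not values from the actual deployment):

    from dask.distributed import Client, LocalCluster

    # Setup 1: LocalCluster on a single large machine
    local_client = Client(LocalCluster(n_workers=4, threads_per_worker=2))

    # Setup 2: connect to an existing distributed (e.g. Kubernetes) Dask cluster
    # through its scheduler address (placeholder shown here)
    remote_client = Client("tcp://dask-scheduler:8786")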

I'm intrigued by @shoyer's last suggestion of an "intermediate" chunking step. I'll test that, and potentially the manual iteration over the tiles. Thanks for all the suggestions and thoughts!
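For illustration only, here is roughly what I understand an intermediate rechunking step to look like; this is my reading of the idea rather than @shoyer's actual suggestion, and myfunction is a placeholder for the real fit:

    import xarray as xr
    import dask.array as da

    filelist = ["tile_0.tif", "tile_1.tif"]  # hypothetical input GeoTIFFs
    rasters = [xr.open_rasterio(f, chunks={'x': 256, 'y': 256}) for f in filelist]
    combined = xr.concat(rasters, dim='file')

    # Rechunk to larger intermediate blocks before running the fit,
    # so each task works on a bigger contiguous region
    intermediate = combined.chunk({'x': 1024, 'y': 1024})

    def myfunction(block):
        # placeholder for the real per-chunk fit; shape-preserving on purpose
        return block - block.mean()

    fitted = da.map_blocks(myfunction, intermediate.data, dtype=intermediate.dtype)
    result = fitted.compute()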

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Chunked processing across multiple raster (geoTIF) files (344621749)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);