issues
1 row where type = "issue" and user = 12278765 sorted by updated_at descending
| field | value |
|---|---|
| id | 345715825 |
| node_id | MDU6SXNzdWUzNDU3MTU4MjU= |
| number | 2329 |
| title | Out-of-core processing with dask not working properly? |
| user | lrntct 12278765 |
| state | closed |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 16 |
| created_at | 2018-07-30T11:19:41Z |
| updated_at | 2019-01-13T01:57:12Z |
| closed_at | 2019-01-13T01:57:12Z |
| author_association | NONE |
| active_lock_reason | |
| draft | |
| pull_request | |
| reactions | { "url": "https://api.github.com/repos/pydata/xarray/issues/2329/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| performed_via_github_app | |
| state_reason | completed |
| repo | xarray 13221727 |
| type | issue |
| body | (quoted below) |

body:

Hi, I have a bunch of GRIB files that amount to ~250 GB. I want to concatenate them and save to zarr. I concatenated them with CDO and saved to netCDF, so I now have a ~500 GB netCDF file that I want to convert to zarr. I want to convert to zarr because:

- I plan to run the analysis on a cluster, and I understand that zarr is better suited for that.
- By using float16 and lz4 compression, I believe I can reduce the size to ~100 GB and get faster access (I think the analysis will be I/O bound).

The netcdf:

The code: (see the sketch below)

Problem description: I left my code to run over the weekend. After 63 hours of processing, the zarr store was only 1 GB in size, and the system monitor indicated that the Python process had read 17 TB from disk. At that rate it would have taken months to finish. Is there something I can do to increase the processing speed? I run Ubuntu 18.04 on a Core i7-6700 with 16 GB of RAM, and the disk is an HDD with a throughput of ~100 MB/s.

Output of `xr.show_versions()`:
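The reporter's original conversion code did not survive the export. The following is a minimal sketch of the netCDF-to-zarr workflow the body describes, assuming hypothetical file names, a hypothetical chunk size, and that every data variable gets the same float16/LZ4 encoding; it is not the reporter's actual code.

```python
import numcodecs
import xarray as xr

# Hypothetical paths; the reporter's actual file names are unknown.
SRC = "combined.nc"     # the ~500 GB netCDF produced by CDO
DST = "combined.zarr"

# Open lazily with dask chunks so data streams through memory one
# chunk at a time instead of loading ~500 GB into 16 GB of RAM.
# The chunk size here is an assumption and usually needs tuning.
ds = xr.open_dataset(SRC, chunks={"time": 1000})

# Downcast to float16 and compress each variable with LZ4, the two
# size-saving steps the issue body mentions (~500 GB -> ~100 GB).
ds = ds.astype("float16")
encoding = {name: {"compressor": numcodecs.LZ4()} for name in ds.data_vars}

ds.to_zarr(DST, encoding=encoding)
```

Note that dask chunks which cut across the netCDF file's internal layout can force the same bytes to be read repeatedly, which may be one explanation for the 17 TB of disk reads the reporter observed.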
```sql
CREATE TABLE [issues] (
  [id] INTEGER PRIMARY KEY,
  [node_id] TEXT,
  [number] INTEGER,
  [title] TEXT,
  [user] INTEGER REFERENCES [users]([id]),
  [state] TEXT,
  [locked] INTEGER,
  [assignee] INTEGER REFERENCES [users]([id]),
  [milestone] INTEGER REFERENCES [milestones]([id]),
  [comments] INTEGER,
  [created_at] TEXT,
  [updated_at] TEXT,
  [closed_at] TEXT,
  [author_association] TEXT,
  [active_lock_reason] TEXT,
  [draft] INTEGER,
  [pull_request] TEXT,
  [body] TEXT,
  [reactions] TEXT,
  [performed_via_github_app] TEXT,
  [state_reason] TEXT,
  [repo] INTEGER REFERENCES [repos]([id]),
  [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
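As a quick illustration of querying this schema, the sketch below reproduces the filter described at the top of the page (one row where type = "issue" and user = 12278765, sorted by updated_at descending) using Python's sqlite3 module; the database file name is an assumption.

```python
import sqlite3

# Hypothetical file name for the SQLite database behind this page.
conn = sqlite3.connect("github.db")

# The same filter the page describes: rows where type = "issue" and
# user = 12278765, sorted by updated_at descending.
query = """
    SELECT id, number, title, state, updated_at
    FROM issues
    WHERE type = 'issue' AND user = 12278765
    ORDER BY updated_at DESC
"""
for row in conn.execute(query):
    print(row)
conn.close()
```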