issue_comments
2 rows where user = 5509356 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 980840643 | https://github.com/pydata/xarray/issues/6033#issuecomment-980840643 | https://api.github.com/repos/pydata/xarray/issues/6033 | IC_kwDOAMm_X846dnDD | adair-kovac 5509356 | 2021-11-28T04:57:48Z | 2021-11-28T04:57:48Z | NONE | @max-sixty Okay, yeah, that's the problem: it's re-downloading the data every time the values are accessed. Apparently this is the default behavior given that zarr is a chunked format. Adding My data archive can't normally be usefully read without open_mfdataset and it's small enough to easily fit in memory, so this behavior isn't ideal. I guess I had assumed that the data would get stored on disk temporarily even if it wasn't in memory, so it's an unexpected limitation that the choices are to either cache it in memory or re-read from S3 every time you access the data. It also seems odd that the default caching logic only takes into account whether the data is chunked, not how big (or small) it is, how slow accessing the store is, or whether the data is being repeatedly accessed. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | Threadlocking in DataArray calculations for zarr data depending on where it's loaded from (S3 vs local) 1064837571 |
| 980477705 | https://github.com/pydata/xarray/issues/6033#issuecomment-980477705 | https://api.github.com/repos/pydata/xarray/issues/6033 | IC_kwDOAMm_X846cOcJ | adair-kovac 5509356 | 2021-11-27T00:49:25Z | 2021-11-27T00:50:28Z | NONE | @max-sixty There shouldn't be any download happening by the time I'm seeing this issue. If you check the notebook (also here if it's easier to read), ~~I check that the data is downloaded (by looking at the dataset nbytes) before attempting the computation and verify it hasn't changed afterward.~~ Wait, never mind, that doesn't actually work: I just verified that nbytes returns the same size even when I've only just opened the dataset. Is there a way to check what is and isn't downloaded? But in any case, I call .values on the data beforehand, and it has the same issue if I run the method a second (third, fourth, fifth) time. Unless it's repeatedly re-downloading the same data for some reason, download doesn't seem to be the problem. The dataset is about 350 MB and has 48 x 150 x 150 chunks. I haven't tried creating smaller or larger datasets and posting them to S3 to see if it happens with them, too. |
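The first comment above turns on zarr's lazy default: with a chunked store, xarray keeps dask-backed arrays and re-reads chunks from S3 on each access unless the data is explicitly loaded. A minimal sketch of the in-memory workaround the comment is weighing, assuming a hypothetical store path (`s3://example-bucket/data.zarr` is a placeholder, not the archive from the issue):

```python
import fsspec
import xarray as xr

# Hypothetical store path; anonymous access assumed for the sketch.
mapper = fsspec.get_mapper("s3://example-bucket/data.zarr", anon=True)

ds = xr.open_zarr(mapper)  # lazy: each .values access re-reads chunks from S3
ds = ds.load()             # eager: fetch once, then every access is in-memory
```

The same applies when the store is opened through open_mfdataset; the trade-off the comment describes is exactly the choice between calling .load() (cache everything in memory) and leaving the arrays lazy (re-read from S3 each time).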
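The second comment asks how to tell what has and hasn't been downloaded, after finding nbytes unhelpful. That matches how xarray computes it: nbytes is derived from each variable's shape and dtype, so it reports the same figure whether the chunks are in RAM or still on S3. For a dask-backed dataset (which a chunked zarr open produces), one rough check, sketched here as a hypothetical helper, is whether each variable's underlying array is still a dask array:

```python
import dask.array
import xarray as xr

def report_load_state(ds: xr.Dataset) -> None:
    # nbytes comes from shape and dtype alone, so it is identical
    # before and after the bytes have actually been fetched.
    print(f"logical size: {ds.nbytes / 1e6:.1f} MB")
    for name, var in ds.variables.items():
        state = "lazy (dask)" if isinstance(var.data, dask.array.Array) else "in memory (numpy)"
        print(f"  {name}: {state}")
```

After ds.load(), every variable should report numpy. The check is only meaningful for the chunked case discussed in the issue, since touching .data on a non-dask lazily indexed variable can itself trigger a read.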
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
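Given this schema, the page's caption ("2 rows where user = 5509356 sorted by updated_at descending") corresponds to a straightforward query. A sketch using Python's sqlite3, with a hypothetical database filename:

```python
import sqlite3

# "github.db" is a placeholder filename for the SQLite database
# holding the issue_comments table above.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    select id, html_url, created_at, updated_at, body, issue
    from issue_comments
    where [user] = 5509356
    order by updated_at desc
    """
).fetchall()
for row in rows:
    print(row[0], row[3])  # id, updated_at
```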