issue_comments
7 rows where issue = 864249974 (Make creating a MultiIndex in stack optional), sorted by updated_at descending
Each comment below lists the author (association), timestamp(s), and comment id, followed by the permalink, the body, and reactions.
benbovy (MEMBER) · 2021-10-06T07:50:59Z · comment 935683056
https://github.com/pydata/xarray/issues/5202#issuecomment-935683056

From https://github.com/pydata/xarray/pull/5692#issuecomment-925718593:

> cc @pydata/xarray as this is an improvement regarding this issue but also a sensible change. To ensure a smoother transition we could maybe add a
> We can default to

Reactions: none
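For a concrete picture of the optional-index behavior discussed here, a minimal sketch, assuming a recent xarray release where `stack` accepts a `create_index` argument:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": [10, 20], "y": [1, 2, 3]},
    dims=("x", "y"),
)

# Default behavior: stacking builds a pandas.MultiIndex for the new "z" dim.
stacked = arr.stack(z=("x", "y"))
print(stacked.indexes["z"])

# Opting out: "x" and "y" become plain (non-indexed) coordinates along "z",
# and no MultiIndex is allocated, which is the cheap path requested here.
stacked_cheap = arr.stack(z=("x", "y"), create_index=False)
print(list(stacked_cheap.indexes))  # no index for "z"
```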
benbovy (MEMBER) · 2021-06-07T22:10:15Z · comment 856296662
https://github.com/pydata/xarray/issues/5202#issuecomment-856296662

Reactions: none
martinitus (NONE) · 2021-06-07T10:49:49Z · comment 855822204
https://github.com/pydata/xarray/issues/5202#issuecomment-855822204

Besides the CPU requirements, IMHO the memory consumption is even worse. Imagine you want to hold a 1000x1000x1000 int64 array: that is ~7.5 GiB and still fits into RAM on most machines. Assume float coordinates for all three axes; their memory consumption of 3000*8 bytes is negligible. But if you stack the array, you end up with three additional ~7.5 GiB coordinate arrays, and with higher dimensions the situation gets even worse. That said, while it generally should be possible to create the coordinates of the stacked array on the fly, I don't have a solution for it.

Side note: I stumbled over this when combining xarray with PyTorch, where I want to evaluate a model on a large Cartesian grid. For that I stacked the array and batched the stacked coordinates to feed them to PyTorch, which makes iterating over the Cartesian space really nice and smooth in code.

Reactions: none
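The arithmetic above checks out; a quick back-of-the-envelope sketch (nothing here allocates the actual cube):

```python
n = 1000

data = n**3 * 8         # one int64 cube: 8e9 bytes, ~7.45 GiB
coords = 3 * n * 8      # three 1-D float64 axes: 24 kB, negligible
stacked = 3 * n**3 * 8  # three length-n**3 stacked coordinate arrays

print(f"cube:           {data / 2**30:.2f} GiB")
print(f"axis coords:    {coords / 2**10:.1f} KiB")
print(f"stacked coords: {stacked / 2**30:.2f} GiB total, {data / 2**30:.2f} GiB each")
```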
max-sixty (MEMBER) · 2021-04-23T08:30:55Z · comment 825494167
https://github.com/pydata/xarray/issues/5202#issuecomment-825494167

Great, this seems like a good idea — at the very least an

Reactions: none
Hoeze (NONE) · created 2021-04-23T08:21:44Z, updated 2021-04-23T08:22:34Z · comment 825488224
https://github.com/pydata/xarray/issues/5202#issuecomment-825488224

It's a large problem when working with Dask/Zarr:
- First, it loads all indices into memory.
- Then, it computes the MultiIndex in a single thread.

I had cases where stacking the dimensions took ~15 minutes while computing and saving the dataset was done in under a minute.

Reactions: +1 × 1
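A sketch of the pattern being described, with illustrative sizes (the shapes and chunking are assumptions): the values stay lazy through `stack`, but the MultiIndex is built eagerly in memory.

```python
import dask.array as da
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"v": (("x", "y"), da.zeros((4_000, 4_000), chunks=1_000))},
    coords={"x": np.arange(4_000), "y": np.arange(4_000)},
)

stacked = ds.stack(z=("x", "y"))

print(type(stacked["v"].data))     # dask Array: the values are still lazy
print(type(stacked.indexes["z"]))  # pandas MultiIndex: 16 million rows, eager
```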
shoyer (MEMBER) · 2021-04-22T00:57:56Z · comment 824459878
https://github.com/pydata/xarray/issues/5202#issuecomment-824459878

It depends, but it can easily be 50% to nearly 100% of the runtime. If we use Fortran order arrays, we can get a rough lower bound on the time for MultiIndex creation, e.g., consider:
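The code block from this comment was not captured in this table; as a stand-in, a sketch that times the MultiIndex construction directly against a full `stack` call rather than using the Fortran-order trick (sizes and approach are assumptions, not the original snippet):

```python
from timeit import timeit

import numpy as np
import pandas as pd
import xarray as xr

n = 2_000
arr = xr.DataArray(
    np.zeros((n, n)),
    coords={"x": np.arange(n), "y": np.arange(n)},
    dims=("x", "y"),
)

# Full stack(), which reshapes the data and builds the MultiIndex.
t_stack = timeit(lambda: arr.stack(z=("x", "y")), number=3) / 3

# MultiIndex construction alone, for the same n*n product of labels.
t_index = timeit(
    lambda: pd.MultiIndex.from_product(
        [np.arange(n), np.arange(n)], names=["x", "y"]
    ),
    number=3,
) / 3

print(f"full stack():     {t_stack:.3f} s")
print(f"MultiIndex alone: {t_index:.3f} s ({t_index / t_stack:.0%} of stack)")
```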
Pandas does delay creating the actual hash-table behind a MultiIndex until it's needed, so I guess the main expense here is just allocating the new coordinate arrays.

Reactions: none
max-sixty (MEMBER) · 2021-04-21T22:05:53Z · comment 824388578
https://github.com/pydata/xarray/issues/5202#issuecomment-824388578

Do we have any ideas on how expensive the MultiIndex creation is as a share of

Reactions: none
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
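For reference, pulling the rows above out of the underlying SQLite database; a minimal sketch (the database filename is an assumption):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # filename is an assumption
rows = conn.execute(
    """
    SELECT id, user, created_at, substr(body, 1, 60)
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (864249974,),
).fetchall()
for row in rows:
    print(row)
conn.close()
```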