issue_comments


7 rows where issue = 864249974 (pydata/xarray#5202: "Make creating a MultiIndex in stack optional"), sorted by updated_at descending


Comment 935683056 · benbovy (MEMBER) · 2021-10-06T07:50:59Z
https://github.com/pydata/xarray/issues/5202#issuecomment-935683056

From https://github.com/pydata/xarray/pull/5692#issuecomment-925718593:

One change is that a multi-index is not always created with stack. It is created only if each of the dimensions to stack has one and only one coordinate with a pandas index (which may be a non-dimension coordinate).

This could maybe address #5202, since we could simply drop the indexes before stacking the dimensions in order to avoid the creation of a multi-index. I don't think it's a big breaking change either, unless there are users who rely on the default multi-indexes with range (0, 1, 2...) levels. Looking at #5202, however, those default multi-indexes seem more problematic than genuinely useful, though I might be wrong here. Also, range-based indexes can still be created explicitly before stacking the dimensions if needed.

Another consequence is that stack is not always reversible, since unstack still requires a pandas multi-index (one and only one multi-index per dimension to unstack).

cc @pydata/xarray: this is an improvement with regard to this issue, but also a sensitive change. To ensure a smoother transition we could maybe add a create_index option to stack which accepts these values:

  • True: always create a multi-index
  • False: never create a multi-index
  • None: create a multi-index only if we can unambiguously pick one index for each of the dimensions to stack

We can default to True now to avoid breaking changes and maybe later default to None. If we eventually add support for custom (non-pandas backed) indexes, we could also allow passing an xarray.Index class.
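
A rough sketch of how the proposed option would look from the user's side (a create_index keyword along these lines did later land in xarray's stack, but the behavior shown here should be read as illustrative rather than authoritative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temp": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20], "y": [1, 2, 3]},
)

# Proposed semantics:
#   create_index=True  -> always create a pandas MultiIndex (old behavior)
#   create_index=False -> never create one; "z" ends up with no index
#   create_index=None  -> create one only if each stacked dimension has
#                         exactly one unambiguous index
stacked = ds.stack(z=("x", "y"), create_index=False)
print("z" in stacked.indexes)  # False: no MultiIndex was built
```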

Reactions: none
Comment 856296662 · benbovy (MEMBER) · 2021-06-07T22:10:15Z
https://github.com/pydata/xarray/issues/5202#issuecomment-856296662

> it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as ds.stack(index=False)

Dataset.stack might eventually accept any custom index (that supports it) if that makes sense. Would index=None be slightly better than index=False in that case? (considering that the default value would be index=PandasMultiIndex or something like that).
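
A hypothetical sketch of that keyword design, purely for illustration (xarray ultimately exposed separate create_index and index_cls arguments instead; none of the names below are real xarray API):

```python
from typing import Optional, Type

class Index:
    """Stand-in for a generic xarray index class (hypothetical)."""

class PandasMultiIndex(Index):
    """Stand-in for the default pandas-backed multi-index (hypothetical)."""

def stack(index: Optional[Type[Index]] = PandasMultiIndex, **dims):
    # index=None       -> skip index creation for the stacked dimension
    # index=SomeIndex  -> build an instance of the requested index class
    if index is None:
        ...  # reshape the data only
    else:
        ...  # reshape the data and construct index(...)
```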

Reactions: none
Comment 855822204 · martinitus (NONE) · 2021-06-07T10:49:49Z
https://github.com/pydata/xarray/issues/5202#issuecomment-855822204

Besides the CPU requirements, IMHO the memory consumption is even worse.

Imagine you want to hold a 1000×1000×1000 int64 array. That is ~7.5 GiB (10⁹ × 8 bytes) and still fits into RAM on most machines. Let's assume float coordinates for all three axes; their memory consumption of 3000 × 8 bytes is negligible.

Now if you stack that, you end up with three additional ~7.5 GiB arrays: one stacked coordinate per original dimension, each with 10⁹ entries. With higher dimensions the situation gets even worse.

That said, while it should generally be possible to create the coordinates of the stacked array on the fly, I don't have a solution for it.

Side note: I stumbled over this when combining xarray with PyTorch, where I want to evaluate a model on a large Cartesian grid. For that I stacked the array and batched the stacked coordinates to feed them to PyTorch, which makes iterating over the Cartesian space really nice and smooth in code.
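
A small sketch of the blow-up described above, using a 100³ cube so it runs quickly (the arithmetic scales directly to the 1000³ example; sizes here are illustrative only):

```python
import numpy as np
import xarray as xr

n = 100  # use 1000 for the ~7.5 GiB example above
da = xr.DataArray(
    np.ones((n, n, n), dtype=np.int64),
    dims=["x", "y", "z"],
    coords={d: np.linspace(0.0, 1.0, n) for d in ("x", "y", "z")},
)
print(da.nbytes)          # n**3 * 8 bytes of data

stacked = da.stack(points=["x", "y", "z"])
# After stacking, each of x, y, z is materialized once per stacked
# element, so the coordinate overhead is of the same order as the data:
print(stacked["x"].size)  # n**3 entries, vs. n before stacking
```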

Reactions: none
Comment 825494167 · max-sixty (MEMBER) · 2021-04-23T08:30:55Z
https://github.com/pydata/xarray/issues/5202#issuecomment-825494167

Great, this seems like a good idea — at the very least an index=False option

Reactions: none
Comment 825488224 · Hoeze (NONE) · 2021-04-23T08:21:44Z (edited 2021-04-23T08:22:34Z)
https://github.com/pydata/xarray/issues/5202#issuecomment-825488224

It's a large problem when working with Dask/Zarr:

  • First, it loads all indices into memory
  • Then, it computes the MultiIndex in a single thread

I had cases where stacking the dimensions took ~15 minutes while computing+saving the dataset was done in < 1min.
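
A minimal sketch of the pattern that triggers this, assuming dask is installed (sizes are illustrative; the key point, per the discussion below, is that the MultiIndex is built eagerly even while the data itself stays lazy):

```python
import numpy as np
import xarray as xr

# Illustrative sizes; .chunk() makes the data variable dask-backed
ds = xr.Dataset(
    {"v": (("x", "y"), np.ones((4000, 4000)))},
    coords={"x": np.arange(4000), "y": np.arange(4000)},
).chunk({"x": 500})

# The reshape of the lazy data is deferred, but the pandas MultiIndex
# for "z" is created eagerly, in a single thread, right here:
stacked = ds.stack(z=("x", "y"))
```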

Reactions: +1 × 1
Comment 824459878 · shoyer (MEMBER) · 2021-04-22T00:57:56Z
https://github.com/pydata/xarray/issues/5202#issuecomment-824459878

> Do we have any ideas on how expensive the MultiIndex creation is as a share of stack?

It depends, but it can easily be 50% to nearly 100% of the runtime. stack() uses reshape() on data variables, which is either free (for arrays that are still contiguous and can use views) or can be delayed until compute-time (with dask). In contrast, the MultiIndex is always created eagerly.

If we use Fortran-order arrays, we can get a rough lower bound on the time for MultiIndex creation, e.g., consider:

```python
import xarray
import numpy as np

a = xarray.DataArray(np.ones((5000, 5000), order='F'), dims=['x', 'y'])
%prun a.stack(z=['x', 'y'])
```

Not surprisingly, making the multi-index takes about half the runtime here.

Pandas does delay creating the actual hash-table behind a MultiIndex until it's needed, so I guess the main expense here is just allocating the new coordinate arrays.

Reactions: none
Comment 824388578 · max-sixty (MEMBER) · 2021-04-21T22:05:53Z
https://github.com/pydata/xarray/issues/5202#issuecomment-824388578

Do we have any ideas on how expensive the MultiIndex creation is as a share of stack?

Reactions: none

Table schema
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);