issue_comments


5 rows where issue = 1479121713 and user = 5821660 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1340771454 https://github.com/pydata/xarray/issues/7363#issuecomment-1340771454 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P6ox- kmuehlbauer 5821660 2022-12-07T10:50:28Z 2022-12-07T10:50:28Z MEMBER

Does this more or less represent your Dataset?

```python
import numpy as np
import xarray as xr
import datetime

# create two timeseries, the second is for reindex
itime = np.arange(0, 3208464).astype("<M8[s]")
itime2 = np.arange(0, 4000000).astype("<M8[s]")

# create two datasets with the time only
ds1 = xr.Dataset({"time": itime})
ds2 = xr.Dataset({"time": itime2})

# add random data to ds1
ds1 = ds1.expand_dims("station")
ds1 = ds1.assign({"test": (["station", "time"], np.random.rand(106, 3208464))})
```

Now we reindex with the longer timeseries; it only takes a couple of seconds on my machine:

```python
%%time
ds3 = ds1.reindex(time=ds2.time)
```

```
CPU times: user 3.16 s, sys: 649 ms, total: 3.81 s
Wall time: 3.81 s
```

Data is unchanged after reindex:

```python
xr.testing.assert_equal(ds1.test, ds3.test.isel(time=slice(0, 3208464)))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1340712532 https://github.com/pydata/xarray/issues/7363#issuecomment-1340712532 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P6aZU kmuehlbauer 5821660 2022-12-07T10:20:40Z 2022-12-07T10:20:40Z MEMBER

@jerabaul29 Concerning the possible slowness of reindex, I think it uses some sorting inside. But isn't a timeseries sorted anyway? Nevertheless, you have a point that reindex might not be the right tool for this use case. It would be nice if we could create your dataset in memory with some random data and check the different proposed solutions for their performance.
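
For comparison, a minimal sketch (with made-up, much smaller sizes and hypothetical variable names, not the original data) of the two candidate approaches: plain reindex versus the re-allocate-and-block-copy idea from the issue title:

```python
import numpy as np
import xarray as xr

# stand-in dataset with a short time axis (sizes are illustrative only)
old_time = np.arange(0, 100_000).astype("<M8[s]")
new_time = np.arange(0, 150_000).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(10, old_time.size))},
    coords={"time": old_time},
)

# approach 1: reindex onto the longer time axis (new slots become NaN)
ds_reindexed = ds.reindex(time=new_time)

# approach 2: allocate the larger array up front, block-copy the old data,
# and fill the newly created entries with a default value
buf = np.full((10, new_time.size), np.nan)
buf[:, : old_time.size] = ds["test"].values
ds_prealloc = xr.Dataset(
    {"test": (["station", "time"], buf)},
    coords={"time": new_time},
)

# both approaches should agree everywhere
xr.testing.assert_equal(ds_reindexed["test"], ds_prealloc["test"])
```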

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1340482904 https://github.com/pydata/xarray/issues/7363#issuecomment-1340482904 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P5iVY kmuehlbauer 5821660 2022-12-07T06:59:11Z 2022-12-07T06:59:11Z MEMBER

@jerabaul29 Does your Dataset with the 3 million time points fit into your machine's memory? Are the arrays dask-backed? Unfortunately that can't be seen from the screenshots. Calculating from the sizes, this is 106 x 3_208_464 single measurements -> 340_097_184 values. Assuming float64 (8 bytes) this leads to 2_720_777_472 bytes, roughly 2.7 GB, which should fit in most setups. I'm not really sure, but there is a good chance that reindex creates a completely new Dataset, which means the computer has to hold the original as well as the new Dataset (which is roughly 3.2 GB). This adds up to almost 6 GB of RAM. Depending on your machine and other tasks this might run into RAM issues. But the xarray devs will know better.
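
As a quick check of those numbers (the 4,000,000-step target length is taken from the reindex example above, and the figures are decimal GB, so they differ slightly from GiB):

```python
# back-of-the-envelope memory estimate for the sizes quoted above
n_station, n_time_old, n_time_new = 106, 3_208_464, 4_000_000

values_old = n_station * n_time_old      # 340_097_184 single measurements
bytes_old = values_old * 8               # float64 -> 2_720_777_472 bytes (~2.7 GB)
bytes_new = n_station * n_time_new * 8   # reindexed copy (~3.4 GB)

print(f"original: {bytes_old / 1e9:.1f} GB")
print(f"reindexed: {bytes_new / 1e9:.1f} GB")
print(f"peak if both are held in RAM: {(bytes_old + bytes_new) / 1e9:.1f} GB")
```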

@keewis' suggestion of creating and concatenating a new array with predefined values, which is file-backed, could resolve the issues you are currently facing.
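
A minimal sketch of that concatenation idea (in-memory here for brevity; the variable names, sizes and NaN default are assumptions, and the extension block could just as well be file-backed):

```python
import numpy as np
import xarray as xr

# existing data (stand-in sizes)
time_old = np.arange(0, 1_000).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(4, time_old.size))},
    coords={"time": time_old},
)

# a new block covering only the additional times, pre-filled with a default value
time_extra = np.arange(1_000, 1_500).astype("<M8[s]")
extension = xr.Dataset(
    {"test": (["station", "time"], np.full((4, time_extra.size), np.nan))},
    coords={"time": time_extra},
)

# concatenate along time; only the small extension block has to be allocated
ds_extended = xr.concat([ds, extension], dim="time")
```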

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1339450779 https://github.com/pydata/xarray/issues/7363#issuecomment-1339450779 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P1mWb kmuehlbauer 5821660 2022-12-06T14:13:30Z 2022-12-06T14:13:30Z MEMBER

You could take the exact times you have and just add the additional times. You might even create those additional ones by giving a time interval and the number of steps. I'd need to look it up, but I'm currently only on my phone.
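
A minimal sketch of that idea (the one-second interval and the number of extra steps are made-up values):

```python
import numpy as np

# the exact times you already have (stand-in values)
time_old = np.arange(0, 1_000).astype("<M8[s]")

# create the additional times from a time interval and a count, appended at the end
step = np.timedelta64(1, "s")
n_extra = 500
time_extra = time_old[-1] + step * np.arange(1, n_extra + 1)

# the extended coordinate to feed into reindex (or concat)
time_new = np.concatenate([time_old, time_extra])
```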

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1339403307 https://github.com/pydata/xarray/issues/7363#issuecomment-1339403307 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P1awr kmuehlbauer 5821660 2022-12-06T13:39:06Z 2022-12-06T13:39:33Z MEMBER

Would xarray.Dataset.reindex do what you want?

You would need to extend your time array/coordinate appropriately and feed it to reindex. Maybe you also need to provide the fill_value keyword to get the new portions filled with the correct fill value.
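
A minimal sketch of that suggestion (the dataset, sizes and the -9999.0 fill value are made up; fill_value is the reindex keyword referred to above):

```python
import numpy as np
import xarray as xr

# stand-in dataset and an appropriately extended time coordinate
time_old = np.arange(0, 1_000).astype("<M8[s]")
time_new = np.arange(0, 1_500).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(4, time_old.size))},
    coords={"time": time_old},
)

# reindex onto the extended coordinate; fill_value controls what the new slots get
ds_ext = ds.reindex(time=time_new, fill_value=-9999.0)
```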

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);