issue_comments


8 rows where author_association = "MEMBER" and issue = 1479121713 sorted by updated_at descending


Commenters: kmuehlbauer (5), keewis (3)

Issue: expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension)
**keewis** (MEMBER) commented on 2022-12-07T11:09:03Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1340809916

> implementing a "grow_coordinate" function to grow / reallocate larger arrays copying the previous chunk along a coordinate

this sounds a lot like `pad` with `mode="constant"`?

> is it possible that xarray makes no assumptions of this kind

xarray uses pandas indexes for alignment and indexing (if you have a recent version of xarray you should see the "Indexes" section in the HTML repr), so yes, it will always use a search that is more efficient than a linear search, as long as the data is sorted. This was also the reason why you had to use `swap_dims` / `set_index` to create an index along the coordinate you wanted to reindex.
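For illustration, a minimal sketch of growing an array along a dimension with `pad` in constant mode; the array and sizes here are invented:

```python
import numpy as np
import xarray as xr

# illustrative array with a "time" dimension
da = xr.DataArray(np.arange(5.0), dims="time")

# append 3 entries at the end of "time", filled with NaN
grown = da.pad(time=(0, 3), mode="constant", constant_values=np.nan)
print(dict(grown.sizes))  # {'time': 8}
```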

Reactions: hooray 1
**kmuehlbauer** (MEMBER) commented on 2022-12-07T10:50:28Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1340771454

Does this more or less represent your Dataset?

```python
import numpy as np
import xarray as xr

# create two timeseries; the second one is for the reindex
itime = np.arange(0, 3208464).astype("<M8[s]")
itime2 = np.arange(0, 4000000).astype("<M8[s]")

# create two datasets with the time only
ds1 = xr.Dataset({"time": itime})
ds2 = xr.Dataset({"time": itime2})

# add random data to ds1
ds1 = ds1.expand_dims("station")
ds1 = ds1.assign({"test": (["station", "time"], np.random.rand(106, 3208464))})
```

Now we reindex with the longer timeseries; it only takes a couple of seconds on my machine:

```python
%%time
ds3 = ds1.reindex(time=ds2.time)
```
```
CPU times: user 3.16 s, sys: 649 ms, total: 3.81 s
Wall time: 3.81 s
```

Data is unchanged after reindex:

```python
xr.testing.assert_equal(ds1.test, ds3.test.isel(time=slice(0, 3208464)))
```

**kmuehlbauer** (MEMBER) commented on 2022-12-07T10:20:40Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1340712532

@jerabaul29 Concerning the possible slowness of reindex, I think it does some sorting internally. But isn't a timeseries sorted anyway? Nevertheless you have a point that reindex might not be the right tool for this use case. It would be nice if we could create your dataset in memory with some random data and check the different proposed solutions for their performance.

**kmuehlbauer** (MEMBER) commented on 2022-12-07T06:59:11Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1340482904

@jerabaul29 Does your Dataset with the 3 million time points fit into your machine's memory? Are the arrays dask-backed? Unfortunately that can't be seen from the screenshots. Calculating from the sizes, this is 106 x 3_208_464 single measurements -> 340_097_184 values. Assuming float64 (8 bytes), this leads to 2_720_777_472 bytes, roughly 2.7 GB, which should fit in most setups. I'm not really sure, but there is a good chance that reindex creates a completely new Dataset, which means the computer has to hold the original as well as the new Dataset (which is roughly 3.2 GB). This adds up to almost 6 GB of RAM. Depending on your machine and other tasks this might run into RAM issues. But the xarray devs will know better.

@keewis' suggestion of creating and concatenating a new array with predefined values, which is file-backed, could resolve the issues you are currently facing.

**keewis** (MEMBER) commented on 2022-12-06T16:55:40Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1339675640

I'm a bit surprised. Could you post a repr of `timestamps_extended_basis`? That might help figure out what exactly happened.

If everything else fails, you might also create a new xarray object with just the new values, and then use `xr.concat` to combine the two?
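For illustration, a minimal sketch of that approach; the arrays, sizes, and fill value here are invented:

```python
import numpy as np
import xarray as xr

# existing data along "time" (illustrative)
old = xr.DataArray(
    np.random.rand(4), dims="time", coords={"time": np.arange(4)}
)

# a new block covering only the appended time steps, pre-filled with NaN
new = xr.DataArray(
    np.full(3, np.nan), dims="time", coords={"time": np.arange(4, 7)}
)

# combine both along "time"
combined = xr.concat([old, new], dim="time")
```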

**keewis** (MEMBER) commented on 2022-12-06T15:39:20Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1339568566

I think this is because you don't have an index along the dimension. Try any of
```python
previous_observations.set_coords(["timestamps"]).swap_dims({"time": "timestamps"}).reindex(...)
previous_observations.set_index({"time": "timestamps"}).reindex(...)
```
(the only difference is the name of the dimension / coordinate you end up with)
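To make the second variant concrete, a self-contained sketch with invented names and sizes:

```python
import numpy as np
import xarray as xr

# dataset whose "timestamps" variable is not an index (illustrative)
ds = xr.Dataset(
    {
        "obs": ("time", np.random.rand(5)),
        "timestamps": ("time", np.arange(0, 5).astype("<M8[s]")),
    }
)

# promote "timestamps" to the index of the "time" dimension, then reindex;
# entries beyond the old range are filled with NaN
target = np.arange(0, 8).astype("<M8[s]")
extended = ds.set_index({"time": "timestamps"}).reindex(time=target)
```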

**kmuehlbauer** (MEMBER) commented on 2022-12-06T14:13:30Z · https://github.com/pydata/xarray/issues/7363#issuecomment-1339450779

You could take the exact times you have and just append the additional times. You might even create those additional ones by giving a time interval and a count. I'd have to look it up, but I'm currently only on my phone.
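For illustration, a sketch of building such an extended time coordinate from an interval and a count; all names and values here are invented:

```python
import numpy as np

# existing times (illustrative), one value per second
times = np.arange(0, 10).astype("<M8[s]")

# append `count` new steps spaced `interval` apart after the last time
interval = np.timedelta64(1, "s")
count = 5
extra = times[-1] + interval * np.arange(1, count + 1)
extended_times = np.concatenate([times, extra])
```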

**kmuehlbauer** (MEMBER) commented on 2022-12-06T13:39:06Z (edited 2022-12-06T13:39:33Z) · https://github.com/pydata/xarray/issues/7363#issuecomment-1339403307

Would `xarray.Dataset.reindex` do what you want?

You would need to extend your time array/coordinate appropriately and feed it to reindex. You may also need to provide the `fill_value` keyword to get the new portions filled with the correct fill value.
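For illustration, a minimal sketch of that approach; the dataset, sizes, and fill value here are invented:

```python
import numpy as np
import xarray as xr

# illustrative dataset indexed by "time"
ds = xr.Dataset(
    {"obs": ("time", np.random.rand(5))},
    coords={"time": np.arange(0, 5).astype("<M8[s]")},
)

# extended time coordinate; entries beyond the old range get the fill value
new_time = np.arange(0, 8).astype("<M8[s]")
extended = ds.reindex(time=new_time, fill_value=-9999.0)
```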

