issue_comments


5 rows where issue = 1479121713 and user = 5821660 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1340771454 https://github.com/pydata/xarray/issues/7363#issuecomment-1340771454 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P6ox- kmuehlbauer 5821660 2022-12-07T10:50:28Z 2022-12-07T10:50:28Z MEMBER

Does this more or less represent your Dataset?

```python
import numpy as np
import xarray as xr
import datetime

# create two timeseries, the second is for reindex
itime = np.arange(0, 3208464).astype("<M8[s]")
itime2 = np.arange(0, 4000000).astype("<M8[s]")

# create two datasets with the time only
ds1 = xr.Dataset({"time": itime})
ds2 = xr.Dataset({"time": itime2})

# add random data to ds1
ds1 = ds1.expand_dims("station")
ds1 = ds1.assign({"test": (["station", "time"], np.random.rand(106, 3208464))})
```

Now we reindex with the longer timeseries; it only takes a couple of seconds on my machine:

```python
%%time
ds3 = ds1.reindex(time=ds2.time)
```

```
CPU times: user 3.16 s, sys: 649 ms, total: 3.81 s
Wall time: 3.81 s
```

Data is unchanged after reindex:

```python
xr.testing.assert_equal(ds1.test, ds3.test.isel(time=slice(0, 3208464)))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1340712532 https://github.com/pydata/xarray/issues/7363#issuecomment-1340712532 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P6aZU kmuehlbauer 5821660 2022-12-07T10:20:40Z 2022-12-07T10:20:40Z MEMBER

@jerabaul29 Concerning the possible slowness of reindex, I think it uses some sorting inside. But isn't a timeseries sorted anyway? Nevertheless, you have a point that reindex might not be the right tool for this use case. It would be nice if we could create your dataset in memory with some random data and check the different proposed solutions for their performance.
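
For comparison, a minimal sketch (with made-up, much smaller sizes and hypothetical variable names, not the original data) of the two candidate approaches: plain reindex versus the re-allocate-and-block-copy idea from the issue title:

```python
import numpy as np
import xarray as xr

# stand-in dataset with a short time axis (sizes are illustrative only)
old_time = np.arange(0, 100_000).astype("<M8[s]")
new_time = np.arange(0, 150_000).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(10, old_time.size))},
    coords={"time": old_time},
)

# approach 1: reindex onto the longer time axis (new slots become NaN)
ds_reindexed = ds.reindex(time=new_time)

# approach 2: allocate the larger array up front, block-copy the old data,
# and fill the newly created entries with a default value
buf = np.full((10, new_time.size), np.nan)
buf[:, : old_time.size] = ds["test"].values
ds_prealloc = xr.Dataset(
    {"test": (["station", "time"], buf)},
    coords={"time": new_time},
)

# both approaches should agree everywhere
xr.testing.assert_equal(ds_reindexed["test"], ds_prealloc["test"])
```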

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1340482904 https://github.com/pydata/xarray/issues/7363#issuecomment-1340482904 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P5iVY kmuehlbauer 5821660 2022-12-07T06:59:11Z 2022-12-07T06:59:11Z MEMBER

@jerabaul29 Does your Dataset with the 3 million time points fit into your machine's memory? Are the arrays dask-backed? Unfortunately that can't be seen from the screenshots. Calculating from the sizes, this is 106 x 3_208_464 single measurements -> 340_097_184 values. Assuming float64 (8 bytes) this leads to 2_720_777_472 bytes, roughly 2.7 GB, which should fit in most setups. I'm not really sure, but there is a good chance that reindex creates a completely new Dataset, which means the computer has to hold the original as well as the new Dataset (which is roughly 3.2 GB). This adds up to almost 6 GB of RAM. Depending on your machine and other tasks this might run into RAM issues. But the xarray devs will know better.
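
As a quick check of those numbers (the 4,000,000-step target length is taken from the reindex example above, and the figures are decimal GB, so they differ slightly from GiB):

```python
# back-of-the-envelope memory estimate for the sizes quoted above
n_station, n_time_old, n_time_new = 106, 3_208_464, 4_000_000

values_old = n_station * n_time_old      # 340_097_184 single measurements
bytes_old = values_old * 8               # float64 -> 2_720_777_472 bytes (~2.7 GB)
bytes_new = n_station * n_time_new * 8   # reindexed copy (~3.4 GB)

print(f"original: {bytes_old / 1e9:.1f} GB")
print(f"reindexed: {bytes_new / 1e9:.1f} GB")
print(f"peak if both are held in RAM: {(bytes_old + bytes_new) / 1e9:.1f} GB")
```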

@keewis' suggestion of creating and concatenating a new array with predefined values, which is file-backed, could resolve the issues you are currently facing.
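
A minimal sketch of that concatenation idea (in-memory here for brevity; the variable names, sizes and NaN default are assumptions, and the extension block could just as well be file-backed):

```python
import numpy as np
import xarray as xr

# existing data (stand-in sizes)
time_old = np.arange(0, 1_000).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(4, time_old.size))},
    coords={"time": time_old},
)

# a new block covering only the additional times, pre-filled with a default value
time_extra = np.arange(1_000, 1_500).astype("<M8[s]")
extension = xr.Dataset(
    {"test": (["station", "time"], np.full((4, time_extra.size), np.nan))},
    coords={"time": time_extra},
)

# concatenate along time; only the small extension block has to be allocated
ds_extended = xr.concat([ds, extension], dim="time")
```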

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1339450779 https://github.com/pydata/xarray/issues/7363#issuecomment-1339450779 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P1mWb kmuehlbauer 5821660 2022-12-06T14:13:30Z 2022-12-06T14:13:30Z MEMBER

You could take the exact times you have and just add the additional times. You might even create those additional ones by giving a time interval and the number of steps. I'd need to look it up, but I'm currently only on my phone.
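
A minimal sketch of that idea (the one-second interval and the number of extra steps are made-up values):

```python
import numpy as np

# the exact times you already have (stand-in values)
time_old = np.arange(0, 1_000).astype("<M8[s]")

# create the additional times from a time interval and a count, appended at the end
step = np.timedelta64(1, "s")
n_extra = 500
time_extra = time_old[-1] + step * np.arange(1, n_extra + 1)

# the extended coordinate to feed into reindex (or concat)
time_new = np.concatenate([time_old, time_extra])
```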

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713
1339403307 https://github.com/pydata/xarray/issues/7363#issuecomment-1339403307 https://api.github.com/repos/pydata/xarray/issues/7363 IC_kwDOAMm_X85P1awr kmuehlbauer 5821660 2022-12-06T13:39:06Z 2022-12-06T13:39:33Z MEMBER

Would xarray.Dataset.reindex do what you want?

You would need to extend your time array/coordinate appropriately and feed it to reindex. Maybe you also need to provide the fill_value keyword to get the new portions filled with the correct fill value.
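
A minimal sketch of that suggestion (the dataset, sizes and the -9999.0 fill value are made up; fill_value is the reindex keyword referred to above):

```python
import numpy as np
import xarray as xr

# stand-in dataset and an appropriately extended time coordinate
time_old = np.arange(0, 1_000).astype("<M8[s]")
time_new = np.arange(0, 1_500).astype("<M8[s]")
ds = xr.Dataset(
    {"test": (["station", "time"], np.random.rand(4, time_old.size))},
    coords={"time": time_old},
)

# reindex onto the extended coordinate; fill_value controls what the new slots get
ds_ext = ds.reindex(time=time_new, fill_value=-9999.0)
```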

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 1479121713

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);