issue_comments


8 rows where issue = 267628781 sorted by updated_at descending


user 6

  • alimanfoo 3
  • rabernat 1
  • shoyer 1
  • d70-t 1
  • fmaussion 1
  • stale[bot] 1

author_association 3

  • CONTRIBUTOR 4
  • MEMBER 3
  • NONE 1

issue 1

  • Low memory/out-of-core index? · 8
id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue
824207037 https://github.com/pydata/xarray/issues/1650#issuecomment-824207037 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDgyNDIwNzAzNw== d70-t 6574622 2021-04-21T16:46:54Z 2021-06-15T16:18:54Z CONTRIBUTOR

I'd be interested in this kind of thing as well. :+1:

We have long time-series data that we would like to access via opendap or zarr over HTTP. The time coordinate variable alone is already more than 1 GB in size, which makes loading the dataset very slow or even impossible given the limitations of the opendap server and my home internet connection. Nonetheless, we know that the timestamps are in order and reasonably close to equidistant, so a binary search, or even an interpolation search, should be a quick way to find the right indices.
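A minimal sketch of the idea in the comment above: a binary search that touches only O(log n) elements of a sorted time coordinate, where `read_value(i)` stands in for a hypothetical single-element remote read (e.g. one zarr chunk fetch or an opendap range request); it is not an actual xarray or zarr API.

```python
# Sketch: locate a timestamp in a large, sorted time coordinate without
# loading the whole array. `read_value(i)` is a hypothetical accessor
# that fetches a single element from remote storage.
import numpy as np

def binary_search(read_value, n, target):
    """Return the smallest index i in [0, n) with read_value(i) >= target."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if read_value(mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Demo against an in-memory stand-in for the remote coordinate:
times = np.arange("2000-01-01", "2000-04-10", dtype="datetime64[D]")
reads = []
def read_value(i):
    reads.append(i)          # count how many elements were actually fetched
    return times[i]

idx = binary_search(read_value, len(times), np.datetime64("2000-03-15"))
assert times[idx] == np.datetime64("2000-03-15")
# Only ~log2(n) elements were read rather than the whole coordinate.
```

An interpolation search would replace the midpoint with a position estimated from the near-equidistant spacing, cutting the number of remote reads further.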

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
544816352 https://github.com/pydata/xarray/issues/1650#issuecomment-544816352 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDU0NDgxNjM1Mg== stale[bot] 26384082 2019-10-22T05:58:59Z 2019-10-22T05:58:59Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338786761 https://github.com/pydata/xarray/issues/1650#issuecomment-338786761 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODc4Njc2MQ== alimanfoo 703554 2017-10-23T20:29:41Z 2017-10-23T20:29:41Z CONTRIBUTOR

Index API sounds good.

Also, I was just looking at dask.dataframe indexing: there, .loc is implemented using information about the index values at the boundaries of each partition (chunk). I'm not sure xarray should use the same strategy for chunked datasets, but it is another approach to avoiding loading indexes into memory.
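The dask.dataframe strategy described above can be sketched as follows: keep only the index value at each partition boundary in memory (dask calls these "divisions") and consult that small list to decide which single chunk to load for a label lookup. The names here are illustrative, not the actual dask API.

```python
# Sketch: partition-boundary lookup. Only the small `divisions` list
# lives in memory; the full per-chunk index is never loaded.
import bisect

divisions = [0, 100, 250, 400, 500]   # index values at chunk edges (4 chunks)

def chunk_for_label(label):
    """Return the chunk number whose [divisions[k], divisions[k+1]) range
    holds `label`, consulting only the boundary values."""
    if not divisions[0] <= label <= divisions[-1]:
        raise KeyError(label)
    k = bisect.bisect_right(divisions, label) - 1
    return min(k, len(divisions) - 2)   # the last division closes the final chunk

assert chunk_for_label(120) == 1       # only chunk 1 would need loading
```

The trade-off is that this only answers "which chunk", so one chunk of the coordinate must still be read to finish the lookup, but that is a bounded, single-chunk read rather than the whole index.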

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338779368 https://github.com/pydata/xarray/issues/1650#issuecomment-338779368 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODc3OTM2OA== shoyer 1217238 2017-10-23T20:02:12Z 2017-10-23T20:02:12Z MEMBER

This should be easier after the index/coordinates separation envisioned in https://github.com/pydata/xarray/issues/1603. We could potentially define a basic index API (based on what we currently use from pandas) and allow alternative index implementations. There are certainly other use cases where going beyond pandas makes sense -- a KDTree for indexing geospatial data is one obvious example.
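A hedged sketch of what such a pluggable index API could look like: anything that maps labels to integer positions can back label-based selection. The class names and the `get_loc` method are illustrative only (loosely modelled on pandas), not an actual xarray API.

```python
# Two interchangeable index implementations behind one hypothetical
# `get_loc(label) -> int` protocol.
import numpy as np

class MonotonicIndex:
    """Exact label lookup via searchsorted on a sorted 1-D coordinate."""
    def __init__(self, values):
        self.values = np.asarray(values)

    def get_loc(self, label):
        i = int(np.searchsorted(self.values, label))
        if i == len(self.values) or self.values[i] != label:
            raise KeyError(label)
        return i

class KDTreeIndex:
    """Nearest-neighbour lookup for 2-D points, the geospatial case."""
    def __init__(self, points):
        from scipy.spatial import cKDTree   # optional dependency
        self.tree = cKDTree(points)

    def get_loc(self, point):
        _, i = self.tree.query(point)
        return int(i)

idx = MonotonicIndex([10, 20, 30])
assert idx.get_loc(20) == 1
```

Under such a protocol, an out-of-core index would just be another implementation of the same interface, e.g. one whose `get_loc` binary-searches chunked storage instead of an in-memory array.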

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338687376 https://github.com/pydata/xarray/issues/1650#issuecomment-338687376 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODY4NzM3Ng== alimanfoo 703554 2017-10-23T14:58:59Z 2017-10-23T14:58:59Z CONTRIBUTOR

It looks like #1017 is about having no index at all. I want indexes, but I want to avoid loading all coordinate values into memory.

On Mon, Oct 23, 2017 at 1:47 PM, Fabien Maussion notifications@github.com wrote:

Has anyone considered implementing an index for monotonic data that does not require loading all values into main memory?

But this is already the case? #1017 https://github.com/pydata/xarray/pull/1017

For datasets on disk, I think it is sufficient to pass drop_variables when opening the dataset so that the coordinates are not parsed:

ds = xr.open_dataset(f, drop_variables=['lon', 'lat'])



{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338662290 https://github.com/pydata/xarray/issues/1650#issuecomment-338662290 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODY2MjI5MA== rabernat 1197350 2017-10-23T13:40:53Z 2017-10-23T13:40:53Z MEMBER

This is related to the performance issue documented in #1385.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338647540 https://github.com/pydata/xarray/issues/1650#issuecomment-338647540 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODY0NzU0MA== fmaussion 10050469 2017-10-23T12:47:02Z 2017-10-23T12:47:11Z MEMBER

Has anyone considered implementing an index for monotonic data that does not require loading all values into main memory?

But this is already the case? See https://github.com/pydata/xarray/pull/1017

For datasets on disk, I think it is sufficient to pass drop_variables when opening the dataset so that the coordinates are not parsed:

ds = xr.open_dataset(f, drop_variables=['lon', 'lat'])

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338627454 https://github.com/pydata/xarray/issues/1650#issuecomment-338627454 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODYyNzQ1NA== alimanfoo 703554 2017-10-23T11:19:30Z 2017-10-23T11:19:30Z CONTRIBUTOR

Just to add a further thought: the upper levels of the binary search tree could be cached to speed up repeated searches.
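One way to read the caching suggestion above: sample the coordinate once at a coarse stride (the "upper levels" of the search tree), keep those samples in memory, and restrict every later binary search to the single stride the cache points at. A minimal sketch, assuming a strictly increasing coordinate; `fetch(i)` is a hypothetical single-element remote read, not a real API.

```python
# Sketch: cache coarse samples of a sorted coordinate so repeated
# searches only probe within one stride. Assumes strictly increasing values.
import bisect

class CachedSearch:
    def __init__(self, fetch, n, stride=1024):
        self.fetch, self.n, self.stride = fetch, n, stride
        # one upfront pass over ~n/stride samples, reused by every search
        self.samples = [fetch(i) for i in range(0, n, stride)]

    def find(self, target):
        """Smallest index i with fetch(i) >= target, probing one stride."""
        k = max(bisect.bisect_right(self.samples, target) - 1, 0)
        lo = k * self.stride
        hi = min(lo + self.stride, self.n)
        while lo < hi:                      # binary search inside one stride
            mid = (lo + hi) // 2
            if self.fetch(mid) < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

values = list(range(0, 2000, 2))            # strictly increasing stand-in
cs = CachedSearch(values.__getitem__, len(values), stride=100)
assert cs.find(500) == 250                  # leftmost i with values[i] >= 500
```

After the upfront sampling pass, each lookup costs at most log2(stride) remote reads, and the cache itself is stride times smaller than the full coordinate.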

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);