issue_comments


10 rows where author_association = "CONTRIBUTOR", issue = 1475567394 and user = 90008 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1362322800 https://github.com/pydata/xarray/pull/7356#issuecomment-1362322800 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85RM2Vw hmaarrfk 90008 2022-12-22T02:40:59Z 2022-12-22T02:40:59Z CONTRIBUTOR

Any chance of a release? This is quite breaking for large datasets that can only be handled out of memory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1346924547 https://github.com/pydata/xarray/pull/7356#issuecomment-1346924547 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85QSHAD hmaarrfk 90008 2022-12-12T17:27:47Z 2022-12-12T17:27:47Z CONTRIBUTOR

👍🏾

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339624818 https://github.com/pydata/xarray/pull/7356#issuecomment-1339624818 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P2Q1y hmaarrfk 90008 2022-12-06T16:19:19Z 2022-12-06T16:19:19Z CONTRIBUTOR

Yes, without chunks or anything.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339624418 https://github.com/pydata/xarray/pull/7356#issuecomment-1339624418 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P2Qvi hmaarrfk 90008 2022-12-06T16:18:59Z 2022-12-06T16:18:59Z CONTRIBUTOR

Very smart test!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339457617 https://github.com/pydata/xarray/pull/7356#issuecomment-1339457617 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P1oBR hmaarrfk 90008 2022-12-06T14:18:11Z 2022-12-06T14:18:11Z CONTRIBUTOR

The data is loaded from a NetCDF store through open_dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
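
The comment above describes the usage pattern at stake. Here is a minimal sketch of it (the file name is hypothetical): open a NetCDF file lazily with xarray.open_dataset and ask for nbytes, which with the fix under discussion should be answered from array metadata rather than by loading the data.

import xarray as xr

# Hypothetical path; any NetCDF file too large for memory would do.
ds = xr.open_dataset("large_file.nc")

# With the change proposed in this PR, this should be computed from each
# variable's size and itemsize instead of pulling the values into memory.
print(ds.nbytes)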
1339452942 https://github.com/pydata/xarray/pull/7356#issuecomment-1339452942 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P1m4O hmaarrfk 90008 2022-12-06T14:14:57Z 2022-12-06T14:14:57Z CONTRIBUTOR

No explicit test was added to ensure that the data wasn't loaded. I just experienced this bug enough (we would accidentally load 100 GB files in our code base) that I knew exactly how to fix it.

If you want, I can add a test to ensure that future optimizations to nbytes do not trigger a data load.

I was hoping the one-line fix would be a shoo-in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
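
A rough sketch of the kind of regression test mentioned above (illustrative only, not the test actually added to xarray): back a DataArray with dask and check that asking for nbytes leaves the underlying data uncomputed.

import dask.array as dsk
import xarray as xr

def test_nbytes_does_not_load_data():
    # 1000 x 1000 float64 -> 8_000_000 bytes, known from metadata alone.
    lazy = dsk.zeros((1000, 1000), dtype="float64", chunks=(100, 100))
    arr = xr.DataArray(lazy, dims=("x", "y"))

    assert arr.nbytes == 8_000_000

    # The underlying data must still be a dask array, i.e. nothing was computed.
    assert isinstance(arr.variable.data, dsk.Array)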
1336731702 https://github.com/pydata/xarray/pull/7356#issuecomment-1336731702 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrOg2 hmaarrfk 90008 2022-12-05T04:20:08Z 2022-12-05T04:20:08Z CONTRIBUTOR

It seems that checking hasattr on the _data variable achieves both purposes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
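
A minimal sketch of the hasattr pattern described above (a toy class, not xarray's actual implementation): use the wrapped array's own nbytes when it provides one, and otherwise fall back to size * itemsize so a lazy backend never has to materialize its values.

import numpy as np

class LazyWrapper:
    """Toy stand-in for a variable that keeps its duck array in _data."""

    def __init__(self, data):
        self._data = data

    @property
    def nbytes(self):
        # NumPy, dask and sparse arrays all expose nbytes; anything else is
        # measured from metadata without touching the values.
        if hasattr(self._data, "nbytes"):
            return self._data.nbytes
        return self._data.size * self._data.dtype.itemsize

print(LazyWrapper(np.zeros((10, 10), dtype="float32")).nbytes)  # 400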
1336711830 https://github.com/pydata/xarray/pull/7356#issuecomment-1336711830 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrJqW hmaarrfk 90008 2022-12-05T03:58:50Z 2022-12-05T03:58:50Z CONTRIBUTOR

I think that, at the very least, the current implementation works as well as the old one for arrays that are defined by the sparse package.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336700669 https://github.com/pydata/xarray/pull/7356#issuecomment-1336700669 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrG79 hmaarrfk 90008 2022-12-05T03:36:31Z 2022-12-05T03:36:31Z CONTRIBUTOR

Looking into the history a little more, I seem to be proposing to revert: https://github.com/pydata/xarray/commit/60f8c3d3488d377b0b21009422c6121e1c8f1f70

I think this is important since many users have arrays that are larger than memory. I found this bug when trying to access the number of bytes of a 16 GB dataset that I was trying to load on my wimpy laptop; not fun to start swapping. I feel like others might be hitting this too.

xref: https://github.com/pydata/xarray/pull/6797 https://github.com/pydata/xarray/issues/4842

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336696899 https://github.com/pydata/xarray/pull/7356#issuecomment-1336696899 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrGBD hmaarrfk 90008 2022-12-05T03:30:31Z 2022-12-05T03:30:31Z CONTRIBUTOR

I personally do not even think the hasattr check is really that useful. You might as well use size and itemsize.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
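
As a sketch of the simpler alternative floated above: compute the byte count purely from size and itemsize, with no hasattr check and no access to the values at all (the function name here is illustrative).

import numpy as np

def nbytes_from_metadata(arr) -> int:
    # size and dtype.itemsize are plain metadata on NumPy arrays and on most
    # lazy or duck arrays, so this never forces a load.
    return arr.size * arr.dtype.itemsize

a = np.ones((4, 4), dtype="float32")
print(nbytes_from_metadata(a), a.nbytes)  # 64 64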

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);