issue_comments

3 rows where author_association = "MEMBER" and issue = 1676561243 sorted by updated_at descending


Issue: Process getting killed due to high memory consumption of xarray's nbytes method · 1676561243
dcherian (2448579) · MEMBER · 2023-04-21T23:56:26Z
https://github.com/pydata/xarray/issues/7772#issuecomment-1518429926

I cannot reproduce this on main. What version are you running?

```
(xarray-tests) 17:55:11 [cgdm-caguas] {~/python/xarray/devel}
──────> python lazy-nbytes.py
8582842640
Filename: /Users/dcherian/work/python/xarray/devel/lazy-nbytes.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4    101.5 MiB    101.5 MiB           1   @profile
     5                                         def get_dataset_size():
     6    175.9 MiB     74.4 MiB           1       dataset = xa.open_dataset("test_1.nc")
     7    175.9 MiB      0.0 MiB           1       print(dataset.nbytes)
```

The BackendArray types define `shape` and `dtype`, so we can calculate the size without loading the data.
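The lazy size computation described above can be sketched as follows; `LazyBackendArray` and its fields are hypothetical stand-ins for illustration, not xarray's actual classes:

```python
import math

class LazyBackendArray:
    """Toy stand-in for an xarray BackendArray: it knows its shape and
    dtype item size from file metadata without holding any values."""

    def __init__(self, shape, itemsize):
        self.shape = shape          # e.g. (time, lat, lon)
        self.itemsize = itemsize    # bytes per element, from the dtype

    @property
    def size(self):
        # number of elements, computed from metadata only
        return math.prod(self.shape)

    @property
    def nbytes(self):
        # total size in bytes -- no data is ever read from disk
        return self.size * self.itemsize

arr = LazyBackendArray(shape=(365, 720, 1440), itemsize=8)
print(arr.nbytes)  # 3027456000 bytes (~2.8 GiB), without loading the file
```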

keewis (14808389) · MEMBER · 2023-04-21T11:05:40Z
https://github.com/pydata/xarray/issues/7772#issuecomment-1517659721

That's a numpy array with sparse data. What @TomNicholas was talking about is an array of type `sparse.COO` (from the sparse package).

And as far as I can tell, our wrapper class (which is the reason you don't get the memory error on open) does not define `nbytes`, so at the moment there's no way to do that. You could try using dask, though, which does allow working with bigger-than-memory data.
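Why a COO-format array's `nbytes` differs from the dense estimate can be shown with the standard library alone; `TinyCOO` is a toy illustration of the idea, not the sparse package's implementation:

```python
import array

class TinyCOO:
    """Toy COO-format sparse array: stores only the nonzero entries as
    (flat index, value) pairs, as sparse.COO does conceptually."""

    def __init__(self, shape, coords, values):
        self.shape = shape
        self.coords = array.array("q", coords)   # int64 flat indices
        self.values = array.array("d", values)   # float64 values

    @property
    def nbytes(self):
        # memory actually held: indices + values,
        # not shape[0] * shape[1] * itemsize
        return (len(self.coords) * self.coords.itemsize
                + len(self.values) * self.values.itemsize)

# a 10_000 x 10_000 matrix with only 3 nonzero entries
m = TinyCOO((10_000, 10_000), coords=[5, 42, 99], values=[1.0, 2.0, 3.0])
dense_bytes = 10_000 * 10_000 * 8
print(m.nbytes, dense_bytes)  # 48 vs 800000000
```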

TomNicholas (35968931) · MEMBER · 2023-04-20T18:58:48Z
https://github.com/pydata/xarray/issues/7772#issuecomment-1516802286

Thanks for raising this @dabhicusp!

> So why have that if block at line 396?

Because xarray can wrap many different types of numpy-like arrays, and for some of those types the `self.size * self.dtype.itemsize` approach may not return the correct size. Think of a sparse matrix, for example - its size in memory is designed to be much smaller than the shape of the matrix would suggest. That's why, in general, we defer to the underlying array itself to tell us its size if it can (i.e. if it has a `.nbytes` attribute).
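The deferral described here can be sketched as a simple fallback; this is an illustration of the pattern, not xarray's exact code, and `Dense`/`Sparse` are hypothetical stand-ins:

```python
def nbytes(data):
    """Prefer the wrapped array's own accounting; otherwise estimate
    from element count and dtype item size (the dense assumption)."""
    if hasattr(data, "nbytes"):
        return data.nbytes          # sparse, dask, etc. know best
    return data.size * data.dtype.itemsize

class Dense:
    # duck-typed array with no nbytes: fall back to size * itemsize
    size = 1000
    class dtype:
        itemsize = 4

class Sparse:
    # array that reports only the memory its stored entries occupy
    nbytes = 48

print(nbytes(Dense()), nbytes(Sparse()))  # 4000 48
```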

But you're not using an unusual type of array; you're just opening a netCDF file as a numpy array, in theory lazily. The memory usage you're seeing is not desired, so something weird must be happening in the `.nbytes` call. Going deeper into the stack at that point would be helpful.
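One stdlib way to go deeper and measure what a single call allocates is `tracemalloc` (memory_profiler, used earlier in the thread, needs a separate install); `suspect_call` here is a hypothetical stand-in for the `.nbytes` access, not xarray code:

```python
import tracemalloc

def suspect_call():
    # stand-in for dataset.nbytes; allocates ~8 MB,
    # as an eager load of the data would
    return bytearray(8_000_000)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
data = suspect_call()
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"allocated ~{(after - before) / 1e6:.1f} MB (peak {peak / 1e6:.1f} MB)")
```

`tracemalloc.take_snapshot()` can additionally attribute those allocations to source lines, which is the "deeper into the stack" part.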


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);