issue_comments


6 rows where issue = 1676561243 sorted by updated_at descending

Issue: Process getting killed due to high memory consumption of xarray's nbytes method (pydata/xarray#7772)
dabhicusp (NONE) · 2023-04-24T10:51:16Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1519897098

Thank you @dcherian. I cannot reproduce this on main.

dcherian (MEMBER) · 2023-04-21T23:56:26Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1518429926

I cannot reproduce this on main. What version are you running?

```
(xarray-tests) 17:55:11 [cgdm-caguas] {~/python/xarray/devel}
──────> python lazy-nbytes.py
8582842640
Filename: /Users/dcherian/work/python/xarray/devel/lazy-nbytes.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4    101.5 MiB    101.5 MiB           1   @profile
     5                                         def get_dataset_size():
     6    175.9 MiB     74.4 MiB           1       dataset = xa.open_dataset("test_1.nc")
     7    175.9 MiB      0.0 MiB           1       print(dataset.nbytes)
```

The BackendArray types define shape and dtype so we can calculate size without loading the data.
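
To make that concrete, here is a minimal sketch (not xarray's actual BackendArray code; the LazyArray name is hypothetical) of how exposing only shape and dtype is enough to compute nbytes without touching the data:

```
# Sketch only: a lazy wrapper that knows its shape and dtype can report
# nbytes from metadata alone, with no values read from disk.
import numpy as np

class LazyArray:  # hypothetical stand-in for a backend array wrapper
    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)

    @property
    def size(self):
        return int(np.prod(self.shape))

    @property
    def nbytes(self):
        # elements times bytes per element, derived purely from metadata
        return self.size * self.dtype.itemsize

arr = LazyArray((5, 7210, 7440), "float64")
print(arr.nbytes)  # 2145696000; no data was loaded
```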

keewis (MEMBER) · 2023-04-21T11:05:40Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1517659721

That's a numpy array with sparse data. What @TomNicholas was talking about is an array of type sparse.COO (from the sparse package).

And as far as I can tell, our wrapper class (which is the reason why you don't get the memory error on open) does not define nbytes, so at the moment there's no way to do that. You could try using dask, though, which does allow working with bigger-than-memory data.
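
For concreteness, a small example of the distinction keewis draws (assumes the sparse package is installed; the shapes here are arbitrary):

```
import numpy as np
import sparse

# A dense numpy array of mostly zeros still occupies size * itemsize bytes.
dense = np.zeros((5, 1000, 1000))
dense[:, 0, :] = 1.0
print(dense.nbytes)  # 40000000, independent of how sparse the values are

# sparse.COO stores only the nonzero entries plus their coordinates.
coo = sparse.COO.from_numpy(dense)
print(coo.nbytes)    # far smaller, roughly proportional to the nonzero count
```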

dabhicusp (NONE) · 2023-04-21T10:57:28Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1517649648

The first point that you mentioned does not seem to be correct. Please see the code below (we took a sparse matrix) and its output:

```
import xarray as xa
import numpy as np

def get_data():
    lat_dim = 7210
    lon_dim = 7440

    lat = [0] * lat_dim
    lon = [0] * lon_dim
    time = [0] * 5

    nlats = lat_dim; nlons = lon_dim; ntimes = 5

    var_1 = np.empty((ntimes, nlats, nlons))
    var_2 = np.empty((ntimes, nlats, nlons))
    var_3 = np.empty((ntimes, nlats, nlons))
    var_4 = np.empty((ntimes, nlats, nlons))

    data_arr = np.random.uniform(low=0, high=0, size=(ntimes, nlats, nlons))
    data_arr[:, 0, :] = 1
    data_arr[:, :, 1] = 1

    var_1[:, :, :] = data_arr
    var_2[:, :, :] = data_arr
    var_3[:, :, :] = data_arr
    var_4[:, :, :] = data_arr

    dataset = xa.Dataset(
        data_vars={
            'var_1': (('time', 'lat', 'lon'), var_1),
            'var_2': (('time', 'lat', 'lon'), var_2),
            'var_3': (('time', 'lat', 'lon'), var_3),
            'var_4': (('time', 'lat', 'lon'), var_4)},
        coords={
            'lat': lat,
            'lon': lon,
            'time': time})

    print(sum(v.size * v.dtype.itemsize for v in dataset.variables.values()))
    print(dataset.nbytes)

if __name__ == "__main__":
    get_data()
```

```
8582901240
8582901240
```

As we can observe, both nbytes and self.size * self.dtype.itemsize give the same size.

And for the second point: can you share any solution for nbytes for a netCDF or GRIB file? It takes too much memory and kills the process.
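
One possible route here is the dask suggestion keewis made above; a hedged sketch (assumes dask is installed and reuses the test_1.nc file from earlier in the thread):

```
import xarray as xa

# Opening with `chunks` wraps each variable in a lazy dask array, so
# .nbytes is derived from shape and dtype and no values are read eagerly.
dataset = xa.open_dataset("test_1.nc", chunks={})
print(dataset.nbytes)  # computed from metadata; the data stays on disk
```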

TomNicholas (MEMBER) · 2023-04-20T18:58:48Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1516802286

Thanks for raising this @dabhicusp!

> So why have that if block at line 396?

Because xarray can wrap many different types of numpy-like arrays, and for some of those types the self.size * self.dtype.itemsize approach may not return the correct size. Think of a sparse matrix, for example: its size in memory is designed to be much smaller than the shape of the matrix would suggest. That's why, in general, we defer to the underlying array itself to tell us its size if it can (i.e. if it has a .nbytes attribute).

But you're not using an unusual type of array; you're just opening a netCDF file as a numpy array, in theory lazily. The memory usage you're seeing is not desired, so something weird must be happening in the .nbytes call. Going deeper into the stack at that point would be helpful.
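
A minimal sketch of the dispatch described above (illustrative, not xarray's actual source):

```
# Defer to the wrapped array's own .nbytes when it defines one (a
# sparse.COO, say, reports its true memory footprint); otherwise fall
# back to the dense estimate computed from metadata.
def nbytes(array):
    if hasattr(array, "nbytes"):
        return array.nbytes
    return array.size * array.dtype.itemsize
```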

welcome[bot] (NONE) · 2023-04-20T11:46:04Z · https://github.com/pydata/xarray/issues/7772#issuecomment-1516188394

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!


Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
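
The query behind this page presumably looks like the following (reconstructed from the row count shown above, not copied from the page's own SQL):

```
select * from issue_comments
where issue = 1676561243
order by updated_at desc;
```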