html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7772#issuecomment-1519897098,https://api.github.com/repos/pydata/xarray/issues/7772,1519897098,IC_kwDOAMm_X85al8oK,123355381,2023-04-24T10:51:16Z,2023-04-24T10:51:16Z,NONE,Thank you @dcherian. I cannot reproduce this on `main`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
https://github.com/pydata/xarray/issues/7772#issuecomment-1518429926,https://api.github.com/repos/pydata/xarray/issues/7772,1518429926,IC_kwDOAMm_X85agWbm,2448579,2023-04-21T23:56:26Z,2023-04-21T23:56:26Z,MEMBER,"I cannot reproduce this on `main`. What version are you running?

```
(xarray-tests) 17:55:11 [cgdm-caguas] {~/python/xarray/devel}
──────> python lazy-nbytes.py
8582842640
Filename: /Users/dcherian/work/python/xarray/devel/lazy-nbytes.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4    101.5 MiB    101.5 MiB           1   @profile
     5                                         def get_dataset_size() :
     6    175.9 MiB     74.4 MiB           1       dataset = xa.open_dataset(""test_1.nc"")
     7    175.9 MiB      0.0 MiB           1       print(dataset.nbytes)
```

The BackendArray types define `shape` and `dtype`, so we can calculate the size without loading the data.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
https://github.com/pydata/xarray/issues/7772#issuecomment-1517659721,https://api.github.com/repos/pydata/xarray/issues/7772,1517659721,IC_kwDOAMm_X85adaZJ,14808389,2023-04-21T11:05:40Z,2023-04-21T11:05:40Z,MEMBER,"That's a numpy array with sparse data. What @TomNicholas was talking about is an array of type `sparse.COO` (from the [sparse](https://github.com/pydata/sparse/) package).
And as far as I can tell, our wrapper class (which is the reason you don't get the memory error on open) does not define `nbytes`, so at the moment there's no way to do that. You could try using `dask`, though, which does allow working with bigger-than-memory data.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
https://github.com/pydata/xarray/issues/7772#issuecomment-1517649648,https://api.github.com/repos/pydata/xarray/issues/7772,1517649648,IC_kwDOAMm_X85adX7w,123355381,2023-04-21T10:57:28Z,2023-04-21T10:57:28Z,NONE,"The first point that you mentioned does not seem to be correct. Please see the code below (we used a sparse matrix) and its output:

```
import xarray as xa
import numpy as np

def get_data():
    lat_dim = 7210
    lon_dim = 7440
    lat = [0] * lat_dim
    lon = [0] * lon_dim
    time = [0] * 5
    nlats = lat_dim; nlons = lon_dim; ntimes = 5
    var_1 = np.empty((ntimes, nlats, nlons))
    var_2 = np.empty((ntimes, nlats, nlons))
    var_3 = np.empty((ntimes, nlats, nlons))
    var_4 = np.empty((ntimes, nlats, nlons))
    data_arr = np.random.uniform(low=0, high=0, size=(ntimes, nlats, nlons))
    data_arr[:, 0, :] = 1
    data_arr[:, :, 1] = 1
    var_1[:, :, :] = data_arr
    var_2[:, :, :] = data_arr
    var_3[:, :, :] = data_arr
    var_4[:, :, :] = data_arr
    dataset = xa.Dataset(
        data_vars={
            'var_1': (('time', 'lat', 'lon'), var_1),
            'var_2': (('time', 'lat', 'lon'), var_2),
            'var_3': (('time', 'lat', 'lon'), var_3),
            'var_4': (('time', 'lat', 'lon'), var_4)},
        coords={'lat': lat, 'lon': lon, 'time': time})
    print(sum(v.size * v.dtype.itemsize for v in dataset.variables.values()))
    print(dataset.nbytes)

if __name__ == ""__main__"":
    get_data()
```

```
8582901240
8582901240
```

As we can observe, both `nbytes` and `self.size * self.dtype.itemsize` give the same size here. And for the second point, can you share any solution for `nbytes` for the `netCDF` or `grib` file, as it takes too much memory and the process gets killed?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
https://github.com/pydata/xarray/issues/7772#issuecomment-1516802286,https://api.github.com/repos/pydata/xarray/issues/7772,1516802286,IC_kwDOAMm_X85aaJDu,35968931,2023-04-20T18:58:48Z,2023-04-20T18:58:48Z,MEMBER,"Thanks for raising this @dabhicusp !

> So why have that if block at line 396?

Because xarray can wrap many different types of numpy-like arrays, and for some of those types the `self.size * self.dtype.itemsize` approach may not return the correct size. Think of a sparse matrix, for example - its size in memory is designed to be much smaller than the size of the matrix would suggest. That's why in general we defer to the underlying array itself to tell us its size if it can (i.e. if it has a `.nbytes` attribute).

But you're not using an unusual type of array, you're just opening a netCDF file as a numpy array, in theory lazily. The memory usage you're seeing is not desired, so something weird must be happening in the `.nbytes` call. Going deeper into the stack at that point would be helpful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
https://github.com/pydata/xarray/issues/7772#issuecomment-1516188394,https://api.github.com/repos/pydata/xarray/issues/7772,1516188394,IC_kwDOAMm_X85aXzLq,30606887,2023-04-20T11:46:04Z,2023-04-20T11:46:04Z,NONE,"Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the [Contributing Guide](https://docs.xarray.dev/en/latest/contributing.html) for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1676561243
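The `size * itemsize` fallback discussed in this thread can be sketched with plain numpy. This is a minimal illustration using scaled-down versions of the dimensions from the example above; for sparse- or dask-backed arrays `nbytes` can differ from this product, which is the reason given in the thread for preferring the array's own `nbytes` when it is defined:

```python
# Minimal numpy-only sketch of the size arithmetic discussed above:
# for a plain numpy array, `nbytes` is exactly `size * dtype.itemsize`.
import numpy as np

# scaled-down dimensions (the issue uses 7210 x 7440)
ntimes, nlats, nlons = 5, 721, 744
var = np.zeros((ntimes, nlats, nlons))  # float64 -> 8 bytes per element

computed = var.size * var.dtype.itemsize
print(computed)                 # 5 * 721 * 744 * 8 = 21456960
print(var.nbytes == computed)   # True
```

For a dense numpy array the two quantities always agree, so the distinction only matters for array types that store data more compactly than their shape suggests.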