id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 944996552,MDU6SXNzdWU5NDQ5OTY1NTI=,5604,Extremely Large Memory usage for a very small variable ,49487505,closed,0,,,15,2021-07-15T04:52:35Z,2023-09-12T15:31:13Z,2023-09-12T15:31:12Z,NONE,,,," **What happened**: Variable that takes up very little memory in actual data size uses over 1000x the memory **What you expected to happen**: It should take the order of the memory of the data size **Minimal Complete Verifiable Example**: ```python # Put your MCVE code here >>> import xarray as xr >>>a = ['./CLM/u_2021-06-01T00_regridded.nc', './CLM/u_2021-06-01T03_regridded.nc', './CLM/u_2021-06-01T06_regridded.nc', './CLM/u_2021-06-01T09_regridded.nc', './CLM/u_2021-06-01T12_regridded.nc', './CLM/u_2021-06-01T15_regridded.nc', './CLM/u_2021-06-01T18_regridded.nc', './CLM/u_2021-06-01T21_regridded.nc', './CLM/u_2021-06-02T00_regridded.nc', './CLM/u_2021-06-02T03_regridded.nc', './CLM/u_2021-06-02T06_regridded.nc', './CLM/u_2021-06-02T09_regridded.nc', './CLM/u_2021-06-02T12_regridded.nc', './CLM/u_2021-06-02T15_regridded.nc', './CLM/u_2021-06-02T18_regridded.nc', './CLM/u_2021-06-02T21_regridded.nc', './CLM/u_2021-06-03T00_regridded.nc', './CLM/u_2021-06-03T03_regridded.nc', './CLM/u_2021-06-03T06_regridded.nc', './CLM/u_2021-06-03T09_regridded.nc', './CLM/u_2021-06-03T12_regridded.nc', './CLM/u_2021-06-03T15_regridded.nc', './CLM/u_2021-06-03T18_regridded.nc', './CLM/u_2021-06-03T21_regridded.nc', './CLM/u_2021-06-04T00_regridded.nc', './CLM/u_2021-06-04T03_regridded.nc', './CLM/u_2021-06-04T06_regridded.nc', './CLM/u_2021-06-04T09_regridded.nc', './CLM/u_2021-06-04T12_regridded.nc', './CLM/u_2021-06-04T15_regridded.nc', './CLM/u_2021-06-04T18_regridded.nc', './CLM/u_2021-06-04T21_regridded.nc', './CLM/u_2021-06-05T00_regridded.nc', 
'./CLM/u_2021-06-05T03_regridded.nc', './CLM/u_2021-06-05T06_regridded.nc', './CLM/u_2021-06-05T09_regridded.nc', './CLM/u_2021-06-05T12_regridded.nc', './CLM/u_2021-06-05T15_regridded.nc', './CLM/u_2021-06-05T18_regridded.nc', './CLM/u_2021-06-05T21_regridded.nc', './CLM/u_2021-06-06T00_regridded.nc', './CLM/u_2021-06-06T03_regridded.nc', './CLM/u_2021-06-06T06_regridded.nc', './CLM/u_2021-06-06T09_regridded.nc', './CLM/u_2021-06-06T12_regridded.nc', './CLM/u_2021-06-06T15_regridded.nc', './CLM/u_2021-06-06T18_regridded.nc', './CLM/u_2021-06-06T21_regridded.nc', './CLM/u_2021-06-07T00_regridded.nc', './CLM/u_2021-06-07T03_regridded.nc', './CLM/u_2021-06-07T06_regridded.nc', './CLM/u_2021-06-07T09_regridded.nc', './CLM/u_2021-06-07T12_regridded.nc', './CLM/u_2021-06-07T15_regridded.nc', './CLM/u_2021-06-07T18_regridded.nc', './CLM/u_2021-06-07T21_regridded.nc', './CLM/u_2021-06-08T00_regridded.nc', './CLM/u_2021-06-08T03_regridded.nc', './CLM/u_2021-06-08T06_regridded.nc'] >>> u_file = xr.open_mfdataset(a,data_vars='minimal',combine=""by_coords"",parallel=True,chunks={'eu': 1100, 'xu': 1249,'v2d_time' : 1 , 'v3d_time' : 1 , ""s_rho"" : 1},autoclose=True) >>> u_file Dimensions: (eu: 1100, s_rho: 35, v2d_time: 59, v3d_time: 59, xu: 1249) Coordinates: * v2d_time (v2d_time) timedelta64[ns] 59366 days 00:00:00 ... 59373 days 0... * v3d_time (v3d_time) timedelta64[ns] 59366 days 00:00:00 ... 59373 days 0... 
Dimensions without coordinates: eu, s_rho, xu Data variables: u (v3d_time, s_rho, eu, xu) float64 dask.array ubar (v2d_time, eu, xu) float64 dask.array >>> u_file.ubar.shape (59, 1100, 1249) >>> u_file.ubar.data.shape (59, 1100, 1249) >>> u_file.ubar.data.nbytes 648480800 >>> KeyboardInterrupt >>> u_file.ubar.data dask.array >>> u_file.ubar.data.compute() ``` ![image](https://user-images.githubusercontent.com/49487505/125730438-efbfc153-41f1-4e16-a372-bbd6b7d029ca.png) This image shows the output of the `top` command on Linux after executing the line with `compute()`. **Anything else we need to know?**: The variable `u` can be written to disk with about 22 GB of memory usage, which is the expected behaviour, as the variable has about 22 GB of data stored in those files combined. In fact, since `u` is 35 times the size of `ubar`, the file size of `u_file` should be only about 22 to 23 GB. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.12.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.20.3 scipy: 1.6.3 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.9.0 iris: None bottleneck: None dask: 2021.04.0 distributed: 2021.04.1 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.3 conda: None pytest: None IPython: 7.24.1 sphinx: None
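For reference, the sizes quoted above can be checked with a minimal sketch of the arithmetic. It assumes uncompressed float64 data and the shapes shown in the dataset repr; it only multiplies dimensions and does not open the files or involve dask.

```python
# Back-of-the-envelope check of the in-memory sizes quoted in this issue.
# Shapes are copied from the dataset repr; float64 uses 8 bytes per element.
from math import prod

ITEMSIZE = 8  # bytes per float64 element

ubar_shape = (59, 1100, 1249)      # (v2d_time, eu, xu)
u_shape = (59, 35, 1100, 1249)     # (v3d_time, s_rho, eu, xu)

ubar_bytes = prod(ubar_shape) * ITEMSIZE
u_bytes = prod(u_shape) * ITEMSIZE

print(f'ubar: {ubar_bytes} bytes (~{ubar_bytes / 1e9:.2f} GB)')
print(f'u:    {u_bytes} bytes (~{u_bytes / 1e9:.2f} GB)')
print(f'u / ubar = {u_bytes // ubar_bytes}')
```

The `ubar` figure matches the `nbytes` value reported above (648480800), and the `u` figure is 35 times that, consistent with the 22 to 23 GB estimate.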
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5604/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue