id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
376370028,MDU6SXNzdWUzNzYzNzAwMjg=,2534,to_dataframe() excessive memory usage,1665346,closed,0,,,3,2018-11-01T12:20:39Z,2022-05-01T22:04:51Z,2022-05-01T22:04:43Z,NONE,,,,"#### Code Sample, a copy-pastable example if possible

```python
import os
from glob import glob

import xarray as xr

# This refers to a large multi-file NetCDF dataset.
# glob() does not expand '~' on its own, so expand it explicitly.
file_list = sorted(glob(os.path.expanduser('~/Data/**/**/*.nc')))
dataset = xr.open_mfdataset(file_list,
                            decode_times=True,
                            autoclose=True,
                            decode_cf=True,
                            cache=False,
                            concat_dim='time')
# At this point, the total RAM used by the python process is ~1.4G

# Select a timeseries at a single point.
# This is near instantaneous and uses no additional memory.
ts = dataset.sel({'lat': 10, 'lon': 10}, method='nearest')

# Convert that timeseries to a pandas dataframe.
# This is where the data is actually read from disk into memory.
df = ts.to_dataframe()
# At this point, the total RAM used by the python process is ~10.5G
```

#### Problem description

Although the resulting dataframe holds only a single lat/lon point's worth of data, a huge amount of RAM is used. I can get (what appears to be) an identical pandas DataFrame by changing the final line to:

```python
df = (ts * 1.0).to_dataframe()
```

which reduces the total RAM to ~2.2G (i.e. 0.6G additional RAM for that single line vs 9G additional RAM). No type conversion is taking place (i.e. `ts` and `ts * 1.0` have identical data types).

#### Expected Output

I would expect `to_dataframe()` to require the same amount of memory whether or not the data was first multiplied by 1.0. I'm aware there could be a good reason for the difference, but it took me somewhat by surprise.

#### Output of ``xr.show_versions()``
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.7
pandas: 0.23.1
numpy: 1.13.3
scipy: 0.17.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.19.0
distributed: None
matplotlib: 1.5.1
cartopy: None
seaborn: 0.8.1
setuptools: 20.7.0
pip: 18.0
conda: None
pytest: None
IPython: 2.4.1
sphinx: None
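
#### Measuring the per-step memory (sketch)

For anyone trying to reproduce the RAM figures quoted in the code sample, here is a minimal sketch of one way to record them using only the standard library. It is an assumption about how such numbers could be collected, not necessarily how the figures above were obtained. `peak_rss_mib` and `report` are hypothetical helper names, not part of xarray or pandas. The `resource` module is only available on Unix-like systems, and it reports the peak (high-water mark) resident set size, so it shows growth but never a later release of memory.

```python
import resource
import sys


def peak_rss_mib():
    # Hypothetical helper: peak resident set size of this process in MiB.
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == 'darwin' else 1024
    return peak / divisor


def report(label):
    # Print and return the peak RSS observed so far, tagged with a step label.
    mib = peak_rss_mib()
    print('{}: peak RSS ~{:.0f} MiB'.format(label, mib))
    return mib


# Call report() after each step of the code sample, for example after
# open_mfdataset(), after sel(), and after to_dataframe().
report('baseline')
```

Comparing the value reported after `ts.to_dataframe()` with the one reported after `(ts * 1.0).to_dataframe()` in two separate fresh processes would give the two figures being compared, since a peak value cannot be reset within one process.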
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2534/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue