issues: 1223031600


  • id: 1223031600
  • node_id: I_kwDOAMm_X85I5fsw
  • number: 6561
  • title: Excessive memory consumption by to_dataframe()
  • user: 8419421
  • state: closed
  • state_reason: completed
  • locked: 0
  • comments: 4
  • created_at: 2022-05-02T15:33:33Z
  • updated_at: 2023-12-15T20:47:32Z
  • closed_at: 2023-12-15T20:47:32Z
  • author_association: NONE
  • reactions: 0
  • repo: 13221727 (pydata/xarray)
  • type: issue

What happened?

This is a reincarnation of #2534 with a reproducible example.

A 51 MB netCDF file causes to_dataframe() to request 23 GB of memory.

What did you expect to happen?

I expect to_dataframe() to require much less than 23 GB of memory for this operation.

Minimal Complete Verifiable Example

```Python
import urllib.request

import xarray as xr

url = 'http://people.envsci.rutgers.edu/decker/Surface_METAR_20220501_0000.nc'
fname = 'metar.nc'
urllib.request.urlretrieve(url, filename=fname)
ncdata = xr.open_dataset(fname)
df = ncdata.to_dataframe()
```
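
For context, the blow-up follows from how Dataset.to_dataframe() works: every variable is broadcast to the outer product of all of the dataset's dimensions before being flattened into DataFrame columns. The toy dataset below (illustrative, not the reporter's data) demonstrates this broadcasting with two small 1-D variables that live on different dimensions:

```Python
# Minimal sketch of the broadcasting behind this report: two 1-D
# variables on *different* dimensions are expanded to the full outer
# product of those dimensions when converted to a DataFrame.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "a": ("x", np.arange(3)),  # 3 values along dim "x"
        "b": ("y", np.arange(4)),  # 4 values along dim "y"
    }
)

df = ds.to_dataframe()
print(df.shape)  # (12, 2): both variables broadcast to the 3 x 4 product
```

With wide fixed-width string variables and dimension sizes like 5021 and 127626, the same expansion scales to the tens of gigabytes seen in the traceback below.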

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/chariton/decker/test/bug/xarraymem.py", line 8, in <module>
    df = ncdata.to_dataframe()
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5399, in to_dataframe
    return self._to_dataframe(ordered_dims=ordered_dims)
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5363, in _to_dataframe
    data = [
  File "/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/xarray/core/dataset.py", line 5364, in <listcomp>
    self._variables[k].set_dims(ordered_dims).values.reshape(-1)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 23.3 GiB for an array with shape (5021, 127626) and data type |S39
```
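
As a sanity check (my arithmetic, not from the report), the 23.3 GiB figure matches a (5021, 127626) array of 39-byte fixed-width strings (|S39):

```Python
# Size of a (5021, 127626) array of dtype |S39 (39 bytes per element).
nbytes = 5021 * 127626 * 39
print(f"{nbytes / 2**30:.1f} GiB")  # -> 23.3 GiB
```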

Anything else we need to know?

No response

Environment

```
/home/decker/local/miniconda3/envs/xarraybug/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.62.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.3
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: None
IPython: None
sphinx: None
```
