issue #6457: Memory error after slicing xarray dataset x and y dimensions

repo: pydata/xarray · id: 1197287512 · node_id: I_kwDOAMm_X85HXShY
state: closed (completed) · comments: 0 · author association: NONE
opened by user 42190773 · created: 2022-04-08T12:58:37Z · updated: 2022-04-08T14:50:15Z · closed: 2022-04-08T14:50:15Z

What is your issue?

Hello,

Hope I'm doing this right.

I am opening 20 years of daily precipitation data, from May to October, on an 881x1201 grid. Each file contains one year of data. The setup is the following:

```python
ds = xr.open_mfdataset(filenames, parallel=True)
print(ds)

<xarray.Dataset>
Dimensions:                  (time: 7518, y: 881, x: 1201)
Coordinates:
  * time                     (time) datetime64[ns] 2000-04-01 ... 2020-10-30
    lat                      (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray>
  * y                        (y) float32 1.268e+03 1.267e+03 ... 389.0 388.0
  * x                        (x) float32 1.409e+03 1.41e+03 ... 2.609e+03
    lon                      (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray>
Data variables:
    lambert_conformal_conic  (time) float64 -3.277e+04 -3.277e+04 ... -3.277e+04
    prcp                     (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray>
Attributes:
    start_year:           2000
    source:               Daymet Software Version 4.0
    Version_software:     Daymet Software Version 4.0
    Version_data:         Daymet Data Version 4.0
    Conventions:          CF-1.6
    citation:             Please see http://daymet.ornl.gov/ for current Dayme...
    references:           Please see http://daymet.ornl.gov/ for current infor...
    History:              Translated to CF-1.0 Conventions by Netcdf-Java CDM ...
    geospatial_lat_min:   41.149529826818124
    geospatial_lat_max:   52.67908572537513
    geospatial_lon_min:   -81.22510372987668
    geospatial_lon_max:   -61.64963532370754
```

I need to compute various indices, and I like looking at values from the data to make sure the code is working well. To reduce the computational load during these checks, I select only a specific x, y index, as follows:

```python
ds_small = ds['prcp'].isel(y = 0, x = 0)
print(ds_small)

<xarray.DataArray 'prcp' (time: 7518)>
dask.array<getitem, shape=(7518,), dtype=float32, chunksize=(154,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2000-04-01 2000-04-02 ... 2020-10-30
    lat      (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    y        float32 1.268e+03
    x        float32 1.409e+03
    lon      (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
Attributes:
    long_name:     daily total precipitation
    units:         mm/day
    grid_mapping:  lambert_conformal_conic
    cell_methods:  area: mean time: sum
    _ChunkSizes:   [   1 1000 1000]
```

However, when the time comes to run numpy or pandas manipulations (whether via `.to_numpy()`, `.to_dataframe()`, `.compute()`, or `.load()`), the process runs very slowly and the following memory error appears:

```python
ds_small.compute()
```

```python-traceback
C:\temp\Lib\site-packages\dask\array\numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)

Traceback (most recent call last):
  File "C:\temp\Lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-40-97898f43589b>", line 1, in <module>
    ds_small.compute()
  File "C:\temp\Lib\site-packages\xarray\core\dataarray.py", line 955, in compute
    return new.load(**kwargs)
  File "C:\temp\Lib\site-packages\xarray\core\dataarray.py", line 929, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\temp\Lib\site-packages\xarray\core\dataset.py", line 865, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\temp\Lib\site-packages\dask\base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\temp\Lib\site-packages\dask\threaded.py", line 79, in get
    results = get_async(
  File "C:\temp\Lib\site-packages\dask\local.py", line 517, in get_async
    raise_exception(exc, tb)
  File "C:\temp\Lib\site-packages\dask\local.py", line 325, in reraise
    raise exc
  File "C:\temp\Lib\site-packages\dask\local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "C:\temp\Lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\temp\Lib\site-packages\dask\array\core.py", line 105, in getter
    c = np.asarray(c)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 354, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 518, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 419, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\coding\variables.py", line 70, in __array__
    return self.func(self.array)
  File "C:\temp\Lib\site-packages\xarray\coding\variables.py", line 137, in _apply_mask
    data = np.asarray(data, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 419, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\temp\Lib\site-packages\xarray\backends\netCDF4_.py", line 91, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 710, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\temp\Lib\site-packages\xarray\backends\netCDF4_.py", line 104, in _getitem
    array = getitem(original_array, key)
  File "netCDF4\_netCDF4.pyx", line 4451, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4\_netCDF4.pyx", line 5376, in netCDF4._netCDF4.Variable._get
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 860. MiB for an array with shape (213, 881, 1201) and data type float32
```

But this doesn't make sense to me: why would xarray still think I have an 881x1201 dataset shape after I've selected specific indices (in this case, x = 0 and y = 0)? I feel like I'm not understanding something trivial about xarray and I'd greatly appreciate any help!
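To make my expectation clearer, here is a small self-contained sketch of the same pattern on synthetic data (the sizes and values are made up and have nothing to do with the real Daymet files); on a toy dataset built in memory, selecting a single point before computing behaves exactly as I would expect:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic dataset with the same (time, y, x) layout as the Daymet
# files; sizes and values are invented, purely to illustrate the point
# selection I am trying to do.
times = pd.date_range("2000-04-01", periods=100)
toy = xr.Dataset(
    {"prcp": (("time", "y", "x"), np.random.rand(100, 10, 20).astype("float32"))},
    coords={"time": times, "y": np.arange(10.0), "x": np.arange(20.0)},
).chunk({"time": 10})

# Selecting a single grid point gives a 1-D array over time only...
point = toy["prcp"].isel(y=0, x=0)
print(point.shape)  # (100,)

# ...and computing it only materializes that single time series, which is
# what I expected to also happen with the real dataset.
print(point.compute().values[:5])
```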

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('French_Canada', '1252')
libhdf5: 1.10.6
libnetcdf: 4.7.3
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.1
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: None
matplotlib: 3.4.3
cartopy: 0.19.0.post1
seaborn: 0.11.2
numbagg: None
pint: 0.17
setuptools: 57.4.0
pip: 22.0.4
conda: None
pytest: None
IPython: 7.27.0
sphinx: None
```
