issues: 1197287512
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1197287512 | I_kwDOAMm_X85HXShY | 6457 | Memory error after slicing xarray dataset x and y dimensions | 42190773 | closed | 1 | 0 | 2022-04-08T12:58:37Z | 2022-04-08T14:50:15Z | 2022-04-08T14:50:15Z | NONE | What is your issue? Hello, I hope I'm doing this right. I am opening 20 years of daily precipitation data from May to October on an 881x1201 grid. Each file contains one year of data. The setup is the following: ```python ds = xr.open_mfdataset(filenames, parallel=True) print(ds) <xarray.Dataset> Dimensions: (time: 7518, y: 881, x: 1201) Coordinates: * time (time) datetime64[ns] 2000-04-01 ... 2020-10-30 lat (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray> * y (y) float32 1.268e+03 1.267e+03 ... 389.0 388.0 * x (x) float32 1.409e+03 1.41e+03 ... 2.609e+03 lon (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray> Data variables: lambert_conformal_conic (time) float64 -3.277e+04 -3.277e+04 ... -3.277e+04 prcp (time, y, x) float32 dask.array<chunksize=(1, 881, 1201), meta=np.ndarray> Attributes: start_year: 2000 source: Daymet Software Version 4.0 Version_software: Daymet Software Version 4.0 Version_data: Daymet Data Version 4.0 Conventions: CF-1.6 citation: Please see http://daymet.ornl.gov/ for current Dayme... references: Please see http://daymet.ornl.gov/ for current infor... History: Translated to CF-1.0 Conventions by Netcdf-Java CDM ... geospatial_lat_min: 41.149529826818124 geospatial_lat_max: 52.67908572537513 geospatial_lon_min: -81.22510372987668 geospatial_lon_max: -61.64963532370754 ``` I need to compute various indices, and I like looking at the values from the data to make sure that the code is working well. To reduce the computational load during these verifications, I select only a single x, y index, as follows: ```python ds_small = ds['prcp'].isel(y=0, x=0) print(ds_small) <xarray.DataArray 'prcp' (time: 7518)>
dask.array<getitem, shape=(7518,), dtype=float32, chunksize=(154,), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2000-04-01 2000-04-02 ... 2020-10-30
lat (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
y float32 1.268e+03
x float32 1.409e+03
lon (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
Attributes:
long_name: daily total precipitation
units: mm/day
grid_mapping: lambert_conformal_conic
cell_methods: area: mean time: sum
_ChunkSizes: [ 1 1000 1000]
```
```python-traceback
C:\temp\Lib\site-packages\dask\array\numpy_compat.py:39: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)
Traceback (most recent call last):
  File "C:\temp\Lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-40-97898f43589b>", line 1, in <module>
    ds_small.compute()
  File "C:\temp\Lib\site-packages\xarray\core\dataarray.py", line 955, in compute
    return new.load(**kwargs)
  File "C:\temp\Lib\site-packages\xarray\core\dataarray.py", line 929, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "C:\temp\Lib\site-packages\xarray\core\dataset.py", line 865, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "C:\temp\Lib\site-packages\dask\base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\temp\Lib\site-packages\dask\threaded.py", line 79, in get
    results = get_async(
  File "C:\temp\Lib\site-packages\dask\local.py", line 517, in get_async
    raise_exception(exc, tb)
  File "C:\temp\Lib\site-packages\dask\local.py", line 325, in reraise
    raise exc
  File "C:\temp\Lib\site-packages\dask\local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "C:\temp\Lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\temp\Lib\site-packages\dask\array\core.py", line 105, in getter
    c = np.asarray(c)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 354, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 518, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 419, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\coding\variables.py", line 70, in __array__
    return self.func(self.array)
  File "C:\temp\Lib\site-packages\xarray\coding\variables.py", line 137, in _apply_mask
    data = np.asarray(data, dtype=dtype)
  File "C:\temp\Lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 419, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\temp\Lib\site-packages\xarray\backends\netCDF4_.py", line 91, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "C:\temp\Lib\site-packages\xarray\core\indexing.py", line 710, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\temp\Lib\site-packages\xarray\backends\netCDF4_.py", line 104, in _getitem
    array = getitem(original_array, key)
  File "netCDF4\_netCDF4.pyx", line 4451, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4\_netCDF4.pyx", line 5376, in netCDF4._netCDF4.Variable._get
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 860. MiB for an array with shape (213, 881, 1201) and data type float32
```
But this doesn't make sense to me, why would xarray still think I have a 881,1201 dataset shape after I've selected specific indices (in this case,
`INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('French_Canada', '1252')
libhdf5: 1.10.6
libnetcdf: 4.7.3
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.1
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: None
matplotlib: 3.4.3
cartopy: 0.19.0.post1
seaborn: 0.11.2
numbagg: None
pint: 0.17
setuptools: 57.4.0
pip: 22.0.4
conda: None
pytest: None
IPython: 7.27.0
sphinx: None
`
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6457/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
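
The size reported in the error message can be reproduced by hand, which confirms that the failing read is for a full (213, 881, 1201) float32 block rather than for a single grid cell. The sketch below uses only the shape and dtype printed in the traceback. Reading 213 as the number of days from April 1 through October 30 (the month and day of the first and last entries of the `time` coordinate) is an assumption for illustration, not something the traceback states.

```python
from datetime import date

# Shape and dtype copied from the _ArrayMemoryError message in the traceback.
shape = (213, 881, 1201)
bytes_per_value = 4  # float32 uses 4 bytes per element

n_values = 1
for n in shape:
    n_values *= n

size_bytes = n_values * bytes_per_value
size_mib = size_bytes / 2**20
print(f"{size_bytes} bytes = {size_mib:.1f} MiB")

# Assumption: one file covers April 1 through October 30 of a single year.
# Under that assumption, the first dimension (213) would be one file's time axis.
days_apr1_to_oct30 = (date(2000, 10, 30) - date(2000, 4, 1)).days + 1
print(days_apr1_to_oct30)
```

If that reading is right, the read that fails covers the whole `prcp` variable of one file at once, which is far more than the 7518 values that `ds_small` finally holds.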