id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1368696980,I_kwDOAMm_X85RlKiU,7018,Writing netcdf after running xarray.dataset.reindex to fill gaps in a time series fails due to memory allocation error,64621312,open,0,,,3,2022-09-10T18:21:48Z,2022-09-15T19:59:39Z,,NONE,,,,"# Problem Summary
I am trying to convert a .grib2 file, containing a single day of gridded radar rainfall data for the continental US, to netCDF. When a .grib2 file is missing timesteps, I fill them with NaN values using `xarray.Dataset.reindex` before calling `xarray.Dataset.to_netcdf`. After reindexing, the script fails with a `MemoryError`; it succeeds if I skip the reindex. One clue: the dataset reports a chunk size of `(70, 3500, 7000)`, but when `ds.to_netcdf` is called, the script fails while trying to allocate an array of shape `(210, 3500, 7000)`.
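
As a rough sanity check on those shapes, the sketch below computes the memory footprint of a float32 array on the 3500 x 7000 grid for different numbers of timesteps. The helper names are hypothetical, and the chunk layout assumes the 288-step day is split into uniform 70-step chunks, which matches the reported chunk size but is not confirmed for every chunk.

```python
# Back-of-the-envelope memory check using the shapes reported in this issue.
BYTES_PER_FLOAT32 = 4
N_LAT, N_LON = 3500, 7000
GIB = 1024 ** 3


def array_gib(n_time):
    '''Size in GiB of a float32 array of shape (n_time, N_LAT, N_LON).'''
    return n_time * N_LAT * N_LON * BYTES_PER_FLOAT32 / GIB


def chunk_lengths(n_total, chunk_len):
    '''Split n_total timesteps into consecutive chunks of at most chunk_len.

    Assumption: uniform chunking along time; dask may choose differently.
    '''
    full, rest = divmod(n_total, chunk_len)
    return [chunk_len] * full + ([rest] if rest else [])


print(f'one 70-step chunk : {array_gib(70):.1f} GiB')
print(f'210-step array    : {array_gib(210):.1f} GiB')
print(f'full 288-step day : {array_gib(288):.1f} GiB')
print('time chunks for 288 steps:', chunk_lengths(288, 70))
```

The 210-step array comes out at about 19.2 GiB, which matches the size in the error message and equals exactly three 70-step chunks of roughly 6.4 GiB each.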

# Accessing Full Reproducible Example
The code and data to reproduce my results can be downloaded from [this Dropbox link](https://www.dropbox.com/sh/w31kpx2u13ymg3j/AAB6Gzf6fqetgk1FViRbKm2Ba?dl=0). The code is also shown below, followed by its output and the potentially relevant OS and environment details.

# Code
```python
#%% Import libraries
import time
start_time = time.time()
import xarray as xr
import cfgrib
from glob import glob
import pandas as pd
import dask
dask.config.set(**{'array.slicing.split_large_chunks': False}) # to silence warnings of loading large slice into memory
dask.config.set(scheduler='synchronous') # this forces single threaded computations (netcdfs can only be written serially)
#%% parameters
chnk_sz = ""7000MB""
fl_out_nc = ""out_netcdfs/20010101.nc""
fldr_in_grib = ""in_gribs/20010101.grib2""

#%% loading and exporting dataset
ds = xr.open_dataset(fldr_in_grib, engine=""cfgrib"", chunks={""time"":chnk_sz},
                    backend_kwargs={'indexpath': ''})

# reindex
start_date = pd.to_datetime('2001-01-01')
tstep = pd.Timedelta('0 days 00:05:00')
new_index = pd.date_range(start=start_date, end=start_date + pd.Timedelta(1, ""day""),
                          freq=tstep, inclusive='left')

ds = ds.reindex(indexers={""time"":new_index})
ds = ds.unify_chunks()
ds = ds.chunk(chunks={'time':chnk_sz})

print(""######## INSPECTING DATASET PRIOR TO WRITING TO NETCDF ########"")
print(ds)
print(' ')
print(""######## ERROR MESSAGE ########"")
ds.to_netcdf(fl_out_nc, encoding= {""unknown"":{""zlib"":True}})
```
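
To look at the reindexing step separately from the data volume, the same `reindex` call can be tried on a small synthetic dataset. This is only a scaled-down sketch: the 4 x 5 grid, the 12-step time axis, the choice of missing timesteps, and the names `ds_small` and `ds_filled` are made up for illustration. It does not use dask or cfgrib, so it does not reproduce the memory error itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the radar grid: 12 five-minute steps on a 4 x 5 grid.
full_index = pd.date_range('2001-01-01', periods=12, freq='5min')
present = full_index.delete([3, 4, 9])  # pretend three timesteps are missing

ds_small = xr.Dataset(
    {'unknown': (('time', 'latitude', 'longitude'),
                 np.ones((len(present), 4, 5), dtype='float32'))},
    coords={'time': present},
)

# Reindex onto the complete time axis; absent timesteps are filled with NaN.
ds_filled = ds_small.reindex(indexers={'time': full_index})

# Count the timesteps that consist entirely of NaN after reindexing.
all_nan = ds_filled['unknown'].isnull().all(dim=['latitude', 'longitude'])
n_missing = int(all_nan.sum())
print(ds_filled.sizes['time'], n_missing)
```

If the small case behaves as expected, the failure in the full-size run is more likely tied to how the reindexed dask chunks are evaluated at write time than to `reindex` producing a wrong time axis.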


# Outputs
```
######## INSPECTING DATASET PRIOR TO WRITING TO NETCDF ########
<xarray.Dataset>
Dimensions:     (time: 288, latitude: 3500, longitude: 7000)
Coordinates:
  * time        (time) datetime64[ns] 2001-01-01 ... 2001-01-01T23:55:00
  * latitude    (latitude) float64 54.99 54.98 54.98 54.97 ... 20.03 20.02 20.01
  * longitude   (longitude) float64 230.0 230.0 230.0 ... 300.0 300.0 300.0
    step        timedelta64[ns] ...
    surface     float64 ...
    valid_time  (time) datetime64[ns] dask.array<chunksize=(288,), meta=np.ndarray>
Data variables:
    unknown     (time, latitude, longitude) float32 dask.array<chunksize=(70, 3500, 7000), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             161
    GRIB_centreDescription:  161
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             161
    history:                 2022-09-10T14:50 GRIB to CDM+CF via cfgrib-0.9.1...
 
######## ERROR MESSAGE ########
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
d:\Dropbox\_Sharing\reprex\2022-9-9_writing_ncdf_fails\reprex\exporting_netcdfs_reduced.py in <cell line: 22>()
     160 print(' ')
     161 print(""######## ERROR MESSAGE ########"")
---> 162 ds.to_netcdf(fl_out_nc, encoding= {""unknown"":{""zlib"":True}})

File c:\Users\Daniel\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\core\dataset.py:1882, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1879     encoding = {}
   1880 from ..backends.api import to_netcdf
-> 1882 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   1883     self,
   1884     path,
   1885     mode=mode,
   1886     format=format,
   1887     group=group,
   1888     engine=engine,
   1889     encoding=encoding,
   1890     unlimited_dims=unlimited_dims,
   1891     compute=compute,
   1892     multifile=False,
   1893     invalid_netcdf=invalid_netcdf,
   1894 )

File c:\Users\xxxxx\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\backends\api.py:1219, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
...
    121     return arg

File <__array_function__ internals>:180, in where(*args, **kwargs)

MemoryError: Unable to allocate 19.2 GiB for an array with shape (210, 3500, 7000) and data type float32
```
# Environment
```
Windows 11 Home
xarray 2022.3.0
cfgrib 0.9.10.1
dask 2022.7.0
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7018/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue