id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1340994913,I_kwDOAMm_X85P7fVh,6924,Memory Leakage Issue When Running to_netcdf ,64621312,closed,0,,,2,2022-08-16T23:58:17Z,2023-01-17T18:38:40Z,2023-01-17T18:38:40Z,NONE,,,,"### What is your issue?

I have a zarr file that I'd like to convert to a netcdf which is too large to fit in memory. My computer has 32GB of RAM so writing ~5.5GB chunks shouldn't be a problem. However, within seconds of running this script, my memory usage quickly tops out consuming the available ~20GB and the script fails.

Data: [Dropbox link](https://www.dropbox.com/sh/xmcz93p53n1w3ft/AACjI9EskzwKsA8sp-WmM2BFa?dl=0) to zarr file containing radar rainfall data for 6/28/2014 over the United States that is around 1.8GB in total.

Code:
```python
import xarray as xr
import zarr

fpath_zarr = ""out_zarr_20140628.zarr""

ds_from_zarr = xr.open_zarr(store=fpath_zarr, chunks={'outlat':3500, 'outlon':7000, 'time':30})

ds_from_zarr.to_netcdf(""ds_zarr_to_nc.nc"", encoding= {""rainrate"":{""zlib"":True}})
```

Outputs:
```python
MemoryError: Unable to allocate 5.48 GiB for an array with shape (30, 3500, 7000) and data type float64
```

Package versions:
```
dask                         2022.7.0
xarray                       2022.3.0
zarr                          2.8.1
```

![memory_screenshot](https://user-images.githubusercontent.com/64621312/185004542-7c91bcbc-7e7b-4656-a306-732bc1d2e9c3.jpg)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6924/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1368696980,I_kwDOAMm_X85RlKiU,7018,Writing netcdf after running xarray.dataset.reindex to fill gaps in a time series fails due to memory allocation error,64621312,open,0,,,3,2022-09-10T18:21:48Z,2022-09-15T19:59:39Z,,NONE,,,,"# Problem Summary
I am attempting to convert a.grib2 file representing a single day's worth of gridded radar rainfall data spanning the continental US, into a netcdf. When a .grib2 is missing timesteps, I am attempting to fill them in with NA values using `xarray.Dataset.reindex` before running `xarray.Dataset.to_netcdf`. However, after I've reindexed the dataset, the script fails due to a memory allocation error. It succeeds if I don't reindex. One clue could be in the fact that the dataset chunks are set to `(70, 3500, 7000)`, but when `ds.to_netcdf` is called, the script fails because it's attempting to load a chunk with dimensions `(210, 3500, 7000)`. 

# Accessing Full Reproducible Example
The code and data to reproduce my results can be downloaded from [this Dropbox link](https://www.dropbox.com/sh/w31kpx2u13ymg3j/AAB6Gzf6fqetgk1FViRbKm2Ba?dl=0). The code is also shown below followed by the outputs. Potentially relevant OS and environment information are shown below as well.

# Code
```python
#%% Import libraries
import time
start_time = time.time()
import xarray as xr
import cfgrib
from glob import glob
import pandas as pd
import dask
dask.config.set(**{'array.slicing.split_large_chunks': False}) # to silence warnings of loading large slice into memory
dask.config.set(scheduler='synchronous') # this forces single threaded computations (netcdfs can only be written serially)
#%% parameters
chnk_sz = ""7000MB""
fl_out_nc = ""out_netcdfs/20010101.nc""
fldr_in_grib = ""in_gribs/20010101.grib2""

#%% loading and exporting dataset
ds = xr.open_dataset(fldr_in_grib, engine=""cfgrib"", chunks={""time"":chnk_sz},
                    backend_kwargs={'indexpath': ''})

# reindex
start_date = pd.to_datetime('2001-01-01')
tstep = pd.Timedelta('0 days 00:05:00')
new_index = pd.date_range(start=start_date, end=start_date + pd.Timedelta(1, ""day""),\
                                    freq=tstep, inclusive='left')

ds = ds.reindex(indexers={""time"":new_index})
ds = ds.unify_chunks()
ds = ds.chunk(chunks={'time':chnk_sz})

print(""######## INSPECTING DATASET PRIOR TO WRITING TO NETCDF ########"")
print(ds)
print(' ')
print(""######## ERROR MESSAGE ########"")
ds.to_netcdf(fl_out_nc, encoding= {""unknown"":{""zlib"":True}})
```


# Outputs
```
######## INSPECTING DATASET PRIOR TO WRITING TO NETCDF ########
<xarray.Dataset>
Dimensions:     (time: 288, latitude: 3500, longitude: 7000)
Coordinates:
  * time        (time) datetime64[ns] 2001-01-01 ... 2001-01-01T23:55:00
  * latitude    (latitude) float64 54.99 54.98 54.98 54.97 ... 20.03 20.02 20.01
  * longitude   (longitude) float64 230.0 230.0 230.0 ... 300.0 300.0 300.0
    step        timedelta64[ns] ...
    surface     float64 ...
    valid_time  (time) datetime64[ns] dask.array<chunksize=(288,), meta=np.ndarray>
Data variables:
    unknown     (time, latitude, longitude) float32 dask.array<chunksize=(70, 3500, 7000), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             161
    GRIB_centreDescription:  161
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             161
    history:                 2022-09-10T14:50 GRIB to CDM+CF via cfgrib-0.9.1...
 
######## ERROR MESSAGE ########
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
d:\Dropbox\_Sharing\reprex\2022-9-9_writing_ncdf_fails\reprex\exporting_netcdfs_reduced.py in <cell line: 22>()
     160 print(' ')
     161 print(""######## ERROR MESSAGE ########"")
---> 162 ds.to_netcdf(fl_out_nc, encoding= {""unknown"":{""zlib"":True}})

File c:\Users\Daniel\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\core\dataset.py:1882, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1879     encoding = {}
   1880 from ..backends.api import to_netcdf
-> 1882 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   1883     self,
   1884     path,
   1885     mode=mode,
   1886     format=format,
   1887     group=group,
   1888     engine=engine,
   1889     encoding=encoding,
   1890     unlimited_dims=unlimited_dims,
   1891     compute=compute,
   1892     multifile=False,
   1893     invalid_netcdf=invalid_netcdf,
   1894 )

File c:\Users\xxxxx\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\backends\api.py:1219, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
...
    121     return arg

File <__array_function__ internals>:180, in where(*args, **kwargs)

MemoryError: Unable to allocate 19.2 GiB for an array with shape (210, 3500, 7000) and data type float32
```
# Environment
```python
windows 11 Home
xarray 2022.3.0
cfgrib 0.9.10.1
dask 2022.7.0
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7018/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue
1340474484,I_kwDOAMm_X85P5gR0,6920,Writing a netCDF file is slow,64621312,closed,1,,,3,2022-08-16T14:48:37Z,2022-08-16T17:05:37Z,2022-08-16T17:05:37Z,NONE,,,,"### What is your issue?

This has been discussed in [another thread](https://github.com/pydata/xarray/issues/2912), but the proposed solution there (first `.load()` the dataset into memory before running `to_netcdf`) does not work for me since my dataset is too large to fit into memory. The following code takes around 8 hours to run. You'll notice that I tried both `xr.open_mfdataset` and `xr.concat` in case it would make a difference, but it doesn't. I also tried profiling the code according to [this example](https://docs.dask.org/en/latest/diagnostics-local.html#example). The results are in this [html](https://www.dropbox.com/sh/42gzmne9a06qo8m/AAB6qqiFFQOScg8Ou4hH5GoZa?dl=0) (dropbox link) but I'm not really sure what I'm looking at.

Data: [dropbox link](https://www.dropbox.com/sh/onr9l7g7n254848/AAD9vkvWFg1FbinZ-EHHC7L2a?dl=0) to 717 netcdf files containing radar rainfall data for 6/28/2014 over the United States that is around 1GB in total.

Code:
```python
#%% Import libraries
import xarray as xr
from glob import glob
import pandas as pd
import time
import dask
dask.config.set(**{'array.slicing.split_large_chunks': False})

files =  glob(""data/*.nc"")
#%% functions
def extract_file_timestep(fname):
    fname = fname.split('/')[-1]
    fname = fname.split(""."")
    ftype = fname.pop(-1)
    fname = ''.join(fname)
    str_tstep = fname.split(""_"")[-1]
    if ftype == ""nc"":
        date_format = '%Y%m%d%H%M'
    if ftype == ""grib2"":
        date_format = '%Y%m%d-%H%M%S'

    tstep = pd.to_datetime(str_tstep, format=date_format)

    return tstep

def ds_preprocessing(ds):
    tstamp = extract_file_timestep(ds.encoding['source'])
    ds.coords[""time""] = tstamp
    ds = ds.expand_dims({""time"":1})
    ds = ds.rename({""lon"":""longitude"", ""lat"":""latitude"", ""mrms_a2m"":""rainrate""})
    ds = ds.chunk(chunks={""latitude"":3500, ""longitude"":7000, ""time"":1})
    return ds

#%% Loading and formatting data
lst_ds = []
start_time = time.time()
for f in files:
    ds = xr.open_dataset(f, chunks={""latitude"":3500, ""longitude"":7000})
    ds = ds_preprocessing(ds)
    lst_ds.append(ds)

ds_comb_frm_lst = xr.concat(lst_ds, dim=""time"")
print(""Time to load dataset using concat on list of datasets: {}"".format(time.time() - start_time))

start_time = time.time()
ds_comb_frm_open_mfdataset = xr.open_mfdataset(files, chunks={""latitude"":3500, ""longitude"":7000},
                                               concat_dim = ""time"", preprocess=ds_preprocessing, combine=""nested"")
print(""Time to load dataset using open_mfdataset: {}"".format(time.time() - start_time))
#%% exporting to netcdf
start_time = time.time()
ds_comb_frm_lst.to_netcdf(""ds_comb_frm_lst.nc"", encoding= {""rainrate"":{""zlib"":True}})
print(""Time to export dataset created using concat on list of datasets: {}"".format(time.time() - start_time))

start_time = time.time()
ds_comb_frm_open_mfdataset.to_netcdf(""ds_comb_frm_open_mfdataset.nc"", encoding= {""rainrate"":{""zlib"":True}})
print(""Time to export dataset created using open_mfdataset: {}"".format(time.time() - start_time))
```

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6920/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1332143835,I_kwDOAMm_X85PZubb,6892,2 Dimension Plot Producing Discontinuous Grid,64621312,closed,0,,,1,2022-08-08T16:59:14Z,2022-08-08T17:12:41Z,2022-08-08T17:11:44Z,NONE,,,,"### What is your issue?

**Problem:** I'm expecting a plot that looks like the one [here](https://docs.xarray.dev/en/stable/user-guide/plotting.html#id2) (Plotting-->Two Dimensions-->Simple Example) with a continuous grid, but instead I'm getting the plot below which has a discontinuous grid. This could be due to different spacing in the x and y dimensions (0.005 spacing in the `outlat` dimension and 0.00328768 spacing in the `outlon` dimension), but I don't know what to do about it.
    ![image](https://user-images.githubusercontent.com/64621312/183471078-e2a76231-1f5e-4b13-8ca5-511af22bf792.png)

**Data:** [Dropbox download link for 20 years of monthly rainfall totals covering Norfolk, VA in netcdf format (2.2MB)](https://www.dropbox.com/s/so61kkqosvru9q6/monthly_rainfall.nc?dl=0)

**Reprex:**
```python
import xarray as xr
ds= xr.open_dataset(""monthly_rainfall.nc"")
ds.rainrate.isel(time=100).plot()
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6892/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1308176241,I_kwDOAMm_X85N-S9x,6805,PermissionError: [Errno 13] Permission denied,64621312,closed,0,,,5,2022-07-18T16:05:31Z,2022-07-18T17:58:38Z,2022-07-18T17:58:38Z,NONE,,,,"### What is your issue?

This was raised about a year ago but still seems to be unresolved, so I'm hoping this will bring attention back to the issue. (https://github.com/pydata/xarray/issues/5488) 

**Data**: [dropbox sharing link](https://www.dropbox.com/sh/1jfwpzas0vfqd3o/AAAOaQsgjLBqYIc37ucshOMwa?dl=0)
**Description**: This folder contains 2 files each containing 1 day's worth of 1kmx1km gridded precipitation rate data from the National Severe Storms Laboratory. Each is about a gig (sorry they're so big, but it's what I'm working with!)
**Code**:
```python
import xarray as xr

f_in_ncs = ""data/""
f_in_nc = ""data/20190520.nc""

#%% works
ds = xr.open_dataset(f_in_nc, 
                    chunks={'outlat':3500, 'outlon':7000, 'time':50})
#%% doesn't work
mf_ds = xr.open_mfdataset(f_in_ncs,  concat_dim = ""time"",
            chunks={'outlat':3500, 'outlon':7000, 'time':50},
            combine = ""nested"", engine = 'netcdf4')
```
**Error**:
```Python
Output exceeds the [size limit](command:workbench.action.openSettings?[). Open the full output data [in a text editor](command:workbench.action.openLargeOutput?1f03b506-1f93-46ca-ad53-ff5a1ca1a767)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\file_manager.py:199, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    [198](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/file_manager.py?line=197) try:
--> [199](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/file_manager.py?line=198)     file = self._cache[self._key]
    [200](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/file_manager.py?line=199) except KeyError:

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\lru_cache.py:53, in LRUCache.__getitem__(self, key)
     [52](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/lru_cache.py?line=51) with self._lock:
---> [53](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/lru_cache.py?line=52)     value = self._cache[key]
     [54](file:///c%3A/Users/Daniel/anaconda3/envs/mrms/lib/site-packages/xarray/backends/lru_cache.py?line=53)     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('d:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
Input In [4], in <cell line: 5>()
      1 import xarray as xr
      3 f_in_ncs = ""data/""
----> 5 ds = xr.open_mfdataset(f_in_ncs,  concat_dim = ""time"",
      6             chunks={'outlat':3500, 'outlon':7000, 'time':50},
      7             combine = ""nested"", engine = 'netcdf4')

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\api.py:908, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
...
File src\netCDF4\_netCDF4.pyx:2307, in netCDF4._netCDF4.Dataset.__init__()

File src\netCDF4\_netCDF4.pyx:1925, in netCDF4._netCDF4._ensure_nc_success()

PermissionError: [Errno 13] Permission denied: b'd:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data'
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6805/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue