## Issue #6924: Memory Leakage Issue When Running to_netcdf

*Status: closed (opened 2022-08-16, closed 2023-01-17, 2 comments)*

### What is your issue?

I have a zarr file that is too large to fit in memory, and I'd like to convert it to netCDF. My computer has 32 GB of RAM, so writing ~5.5 GB chunks shouldn't be a problem. However, within seconds of running this script, memory usage tops out, consuming the available ~20 GB, and the script fails.

Data: [Dropbox link](https://www.dropbox.com/sh/xmcz93p53n1w3ft/AACjI9EskzwKsA8sp-WmM2BFa?dl=0) to a zarr file containing radar rainfall data for 6/28/2014 over the United States, around 1.8 GB in total.

Code:

```python
import xarray as xr
import zarr

fpath_zarr = "out_zarr_20140628.zarr"
ds_from_zarr = xr.open_zarr(store=fpath_zarr, chunks={'outlat': 3500, 'outlon': 7000, 'time': 30})
ds_from_zarr.to_netcdf("ds_zarr_to_nc.nc", encoding={"rainrate": {"zlib": True}})
```

Output:

```python
MemoryError: Unable to allocate 5.48 GiB for an array with shape (30, 3500, 7000) and data type float64
```

Package versions:

```
dask    2022.7.0
xarray  2022.3.0
zarr    2.8.1
```

![memory_screenshot](https://user-images.githubusercontent.com/64621312/185004542-7c91bcbc-7e7b-4656-a306-732bc1d2e9c3.jpg)
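One workaround sketch, assuming the store can simply be reopened with smaller chunks: `to_netcdf(..., compute=False)` builds the write as a dask graph, and computing it with the synchronous scheduler processes one chunk at a time instead of pulling many chunks into memory at once. The smaller chunk sizes here are illustrative, not taken from the report above.

```python
import xarray as xr
from dask.diagnostics import ProgressBar

fpath_zarr = "out_zarr_20140628.zarr"
# Quarter the spatial chunks: 30 x 875 x 1750 float64 is ~350 MiB per task
ds = xr.open_zarr(fpath_zarr, chunks={'outlat': 875, 'outlon': 1750, 'time': 30})

# Build the task graph without executing it
delayed_write = ds.to_netcdf("ds_zarr_to_nc.nc",
                             encoding={"rainrate": {"zlib": True}},
                             compute=False)

with ProgressBar():
    # The synchronous scheduler runs one task at a time, so peak memory
    # stays near a single chunk rather than the whole dataset
    delayed_write.compute(scheduler="synchronous")
```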
## Issue #6920: Writing a netCDF file is slow

*Status: closed (opened 2022-08-16, closed 2022-08-16, 3 comments)*

### What is your issue?

This has been discussed in [another thread](https://github.com/pydata/xarray/issues/2912), but the proposed solution there (first `.load()` the dataset into memory before running `to_netcdf`) does not work for me, since my dataset is too large to fit into memory.

The following code takes around 8 hours to run. You'll notice that I tried both `xr.open_mfdataset` and `xr.concat` in case it would make a difference, but it doesn't. I also tried profiling the code according to [this example](https://docs.dask.org/en/latest/diagnostics-local.html#example). The results are in this [html](https://www.dropbox.com/sh/42gzmne9a06qo8m/AAB6qqiFFQOScg8Ou4hH5GoZa?dl=0) (Dropbox link), but I'm not really sure what I'm looking at.

Data: [Dropbox link](https://www.dropbox.com/sh/onr9l7g7n254848/AAD9vkvWFg1FbinZ-EHHC7L2a?dl=0) to 717 netCDF files containing radar rainfall data for 6/28/2014 over the United States, around 1 GB in total.

Code:

```python
#%% Import libraries
import xarray as xr
from glob import glob
import pandas as pd
import time
import dask

dask.config.set(**{'array.slicing.split_large_chunks': False})

files = glob("data/*.nc")

#%% functions
def extract_file_timestep(fname):
    # Pull the trailing timestamp out of a .nc or .grib2 filename
    fname = fname.split('/')[-1]
    fname = fname.split(".")
    ftype = fname.pop(-1)
    fname = ''.join(fname)
    str_tstep = fname.split("_")[-1]
    if ftype == "nc":
        date_format = '%Y%m%d%H%M'
    if ftype == "grib2":
        date_format = '%Y%m%d-%H%M%S'
    tstep = pd.to_datetime(str_tstep, format=date_format)
    return tstep

def ds_preprocessing(ds):
    # Stamp each file's dataset with its time coordinate and harmonize names
    tstamp = extract_file_timestep(ds.encoding['source'])
    ds.coords["time"] = tstamp
    ds = ds.expand_dims({"time": 1})
    ds = ds.rename({"lon": "longitude", "lat": "latitude", "mrms_a2m": "rainrate"})
    ds = ds.chunk(chunks={"latitude": 3500, "longitude": 7000, "time": 1})
    return ds

#%% Loading and formatting data
lst_ds = []
start_time = time.time()
for f in files:
    ds = xr.open_dataset(f, chunks={"latitude": 3500, "longitude": 7000})
    ds = ds_preprocessing(ds)
    lst_ds.append(ds)
ds_comb_frm_lst = xr.concat(lst_ds, dim="time")
print("Time to load dataset using concat on list of datasets: {}".format(time.time() - start_time))

start_time = time.time()
ds_comb_frm_open_mfdataset = xr.open_mfdataset(files, chunks={"latitude": 3500, "longitude": 7000},
                                               concat_dim="time", preprocess=ds_preprocessing,
                                               combine="nested")
print("Time to load dataset using open_mfdataset: {}".format(time.time() - start_time))

#%% exporting to netcdf
start_time = time.time()
ds_comb_frm_lst.to_netcdf("ds_comb_frm_lst.nc", encoding={"rainrate": {"zlib": True}})
print("Time to export dataset created using concat on list of datasets: {}".format(time.time() - start_time))

start_time = time.time()
ds_comb_frm_open_mfdataset.to_netcdf("ds_comb_frm_open_mfdataset.nc", encoding={"rainrate": {"zlib": True}})
print("Time to export dataset created using open_mfdataset: {}".format(time.time() - start_time))
```
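Since the report says the dask profiling output was hard to interpret, here is a minimal profiling sketch in the spirit of the dask diagnostics example linked above. It reuses `ds_comb_frm_open_mfdataset` from the script; the output filename is arbitrary, and `visualize` needs bokeh installed.

```python
from dask.diagnostics import Profiler, ResourceProfiler, ProgressBar, visualize

# Wrap the slow write in task-level and CPU/memory profilers
with Profiler() as prof, ResourceProfiler(dt=1.0) as rprof, ProgressBar():
    ds_comb_frm_open_mfdataset.to_netcdf("ds_comb_frm_open_mfdataset.nc",
                                         encoding={"rainrate": {"zlib": True}})

# Render both timelines into one interactive HTML report; long stretches
# with little task overlap suggest the compressed write is the bottleneck
visualize([prof, rprof], filename="to_netcdf_profile.html", show=False)
```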
## Issue #6892: 2 Dimension Plot Producing Discontinuous Grid

*Status: closed (opened 2022-08-08, closed 2022-08-08, 1 comment)*

### What is your issue?

**Problem:** I'm expecting a plot that looks like the one [here](https://docs.xarray.dev/en/stable/user-guide/plotting.html#id2) (Plotting --> Two Dimensions --> Simple Example) with a continuous grid, but instead I'm getting the plot below, which has a discontinuous grid. This could be due to different spacing in the x and y dimensions (0.005 spacing in the `outlat` dimension and 0.00328768 spacing in the `outlon` dimension), but I don't know what to do about it.

![image](https://user-images.githubusercontent.com/64621312/183471078-e2a76231-1f5e-4b13-8ca5-511af22bf792.png)

**Data:** [Dropbox download link for 20 years of monthly rainfall totals covering Norfolk, VA in netCDF format (2.2 MB)](https://www.dropbox.com/s/so61kkqosvru9q6/monthly_rainfall.nc?dl=0)

**Reprex:**

```python
import xarray as xr

ds = xr.open_dataset("monthly_rainfall.nc")
ds.rainrate.isel(time=100).plot()
```
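A diagnostic sketch, not from the report: for 2-D data, `.plot()` dispatches to matplotlib's `pcolormesh`, which draws one quadrilateral per cell, so gaps usually trace back to uneven or non-monotonic coordinate values rather than to the data itself. Checking the step sizes directly should confirm or rule that out (the `outlat`/`outlon` names come from the spacing figures quoted above).

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("monthly_rainfall.nc")

for dim in ("outlat", "outlon"):
    steps = np.diff(ds[dim].values)
    # A regular grid has uniform steps of one sign; mixed signs or
    # outlier step sizes show up as broken cells in pcolormesh
    print(f"{dim}: min step {steps.min():.6g}, max step {steps.max():.6g}")
```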
## Issue #6805: PermissionError: [Errno 13] Permission denied

*Status: closed (opened 2022-07-18, closed 2022-07-18, 5 comments)*

### What is your issue?

This was raised about a year ago but still seems to be unresolved, so I'm hoping this will bring attention back to the issue (https://github.com/pydata/xarray/issues/5488).

**Data**: [Dropbox sharing link](https://www.dropbox.com/sh/1jfwpzas0vfqd3o/AAAOaQsgjLBqYIc37ucshOMwa?dl=0)

**Description**: This folder contains 2 files, each containing 1 day's worth of 1 km x 1 km gridded precipitation-rate data from the National Severe Storms Laboratory. Each is about a gig (sorry they're so big, but it's what I'm working with!).

**Code**:

```python
import xarray as xr

f_in_ncs = "data/"
f_in_nc = "data/20190520.nc"

#%% works
ds = xr.open_dataset(f_in_nc, chunks={'outlat': 3500, 'outlon': 7000, 'time': 50})

#%% doesn't work
mf_ds = xr.open_mfdataset(f_in_ncs, concat_dim="time",
                          chunks={'outlat': 3500, 'outlon': 7000, 'time': 50},
                          combine="nested", engine='netcdf4')
```

**Error** (traceback abridged; `<...>` marks object reprs stripped by the issue renderer):

```python
KeyError                                  Traceback (most recent call last)
File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\file_manager.py:199, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    198 try:
--> 199     file = self._cache[self._key]
    200 except KeyError:

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\lru_cache.py:53, in LRUCache.__getitem__(self, key)
     52 with self._lock:
---> 53     value = self._cache[key]
     54     self._cache.move_to_end(key)

KeyError: [<...>, ('d:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
Input In [4], in <...>()
      1 import xarray as xr
      3 f_in_ncs = "data/"
----> 5 ds = xr.open_mfdataset(f_in_ncs, concat_dim="time",
      6                        chunks={'outlat': 3500, 'outlon': 7000, 'time': 50},
      7                        combine="nested", engine='netcdf4')

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\api.py:908, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
...
File src\netCDF4\_netCDF4.pyx:2307, in netCDF4._netCDF4.Dataset.__init__()
File src\netCDF4\_netCDF4.pyx:1925, in netCDF4._netCDF4._ensure_nc_success()

PermissionError: [Errno 13] Permission denied: b'd:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data'
```
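The traceback points at a likely cause: the netCDF4 engine ends up trying to open the `data` directory itself as a dataset (`Permission denied: b'd:\\...\\data'`). Assuming the goal is to combine every `.nc` file in that folder, one probable fix is to hand `open_mfdataset` a glob pattern or an explicit list of paths rather than the bare directory.

```python
import xarray as xr

# A glob pattern (or a list of file paths) instead of the bare "data/"
# directory, so the netcdf4 engine only ever sees actual .nc files
mf_ds = xr.open_mfdataset("data/*.nc", concat_dim="time",
                          chunks={'outlat': 3500, 'outlon': 7000, 'time': 50},
                          combine="nested", engine='netcdf4')
```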