github: issues: 1 row where state = "open", type = "issue" and user = 64621312 sorted by updated

1 row where state = "open", type = "issue" and user = 64621312 sorted by updated_at descending

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1368696980	I_kwDOAMm_X85RlKiU	7018	Writing netcdf after running xarray.dataset.reindex to fill gaps in a time series fails due to memory allocation error	lassiterdc 64621312	open	0			3	2022-09-10T18:21:48Z	2022-09-15T19:59:39Z		NONE				Problem Summary

I am attempting to convert a .grib2 file, representing a single day's worth of gridded radar rainfall data spanning the continental US, into a netcdf. When a .grib2 file is missing timesteps, I fill them in with NA values using `xarray.Dataset.reindex` before running `xarray.Dataset.to_netcdf`. However, after I've reindexed the dataset, the script fails due to a memory allocation error. It succeeds if I don't reindex. One clue could be that the dataset chunks are set to `(70, 3500, 7000)`, but when `ds.to_netcdf` is called, the script fails because it attempts to load a chunk with dimensions `(210, 3500, 7000)`.

Accessing Full Reproducible Example

The code and data to reproduce my results can be downloaded from this Dropbox link. The code is also shown below, followed by the outputs. Potentially relevant OS and environment information is shown below as well.

Code

```python
#%% Import libraries
import time
start_time = time.time()
import xarray as xr
import cfgrib
from glob import glob
import pandas as pd
import dask
dask.config.set(**{'array.slicing.split_large_chunks': False}) # to silence warnings of loading large slice into memory
dask.config.set(scheduler='synchronous') # this forces single threaded computations (netcdfs can only be written serially)

#%% parameters
chnk_sz = "7000MB"
fl_out_nc = "out_netcdfs/20010101.nc"
fldr_in_grib = "in_gribs/20010101.grib2"

#%% loading and exporting dataset
ds = xr.open_dataset(fldr_in_grib, engine="cfgrib", chunks={"time":chnk_sz},
                     backend_kwargs={'indexpath': ''})

# reindex
start_date = pd.to_datetime('2001-01-01')
tstep = pd.Timedelta('0 days 00:05:00')
new_index = pd.date_range(start=start_date, end=start_date + pd.Timedelta(1, "day"),\
                          freq=tstep, inclusive='left')

ds = ds.reindex(indexers={"time":new_index})
ds = ds.unify_chunks()
ds = ds.chunk(chunks={'time':chnk_sz})

print("######## INSPECTING DATASET PRIOR TO WRITING TO NETCDF ########")
print(ds)
print(' ')

print("######## ERROR MESSAGE ########")
ds.to_netcdf(fl_out_nc, encoding= {"unknown":{"zlib":True}})
```

Outputs

```
## INSPECTING DATASET PRIOR TO WRITING TO NETCDF
<xarray.Dataset>
Dimensions:     (time: 288, latitude: 3500, longitude: 7000)
Coordinates:
    time        (time) datetime64[ns] 2001-01-01 ... 2001-01-01T23:55:00
  * latitude    (latitude) float64 54.99 54.98 54.98 54.97 ... 20.03 20.02 20.01
  * longitude   (longitude) float64 230.0 230.0 230.0 ... 300.0 300.0 300.0
    step        timedelta64[ns] ...
    surface     float64 ...
    valid_time  (time) datetime64[ns] dask.array<chunksize=(288,), meta=np.ndarray>
Data variables:
    unknown     (time, latitude, longitude) float32 dask.array<chunksize=(70, 3500, 7000), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             161
    GRIB_centreDescription:  161
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             161
    history:                 2022-09-10T14:50 GRIB to CDM+CF via cfgrib-0.9.1...

## ERROR MESSAGE
Output exceeds the size limit. Open the full output data in a text editor
MemoryError                               Traceback (most recent call last)
d:\Dropbox_Sharing\reprex\2022-9-9_writing_ncdf_fails\reprex\exporting_netcdfs_reduced.py in <cell line: 22>()
    160 print(' ')
    161 print("######## ERROR MESSAGE ########")
---> 162 ds.to_netcdf(fl_out_nc, encoding= {"unknown":{"zlib":True}})

File c:\Users\Daniel\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\core\dataset.py:1882, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1879     encoding = {}
   1880 from ..backends.api import to_netcdf
-> 1882 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   1883     self,
   1884     path,
   1885     mode=mode,
   1886     format=format,
   1887     group=group,
   1888     engine=engine,
   1889     encoding=encoding,
   1890     unlimited_dims=unlimited_dims,
   1891     compute=compute,
   1892     multifile=False,
   1893     invalid_netcdf=invalid_netcdf,
   1894 )

File c:\Users\xxxxx\anaconda3\envs\weather_gen_3\lib\site-packages\xarray\backends\api.py:1219, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
...
    121 return arg

File <__array_function__ internals>:180, in where(*args, **kwargs)

MemoryError: Unable to allocate 19.2 GiB for an array with shape (210, 3500, 7000) and data type float32
```

Environment

`python windows 11 Home xarray 2022.3.0 cfgrib 0.9.10.1 dask 2022.7.0`	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7018/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		reopened	xarray 13221727	issue
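
The sizes quoted in the issue can be checked with plain arithmetic. The sketch below is not part of the original report; it uses only the shapes, the float32 dtype, and the 5-minute time step stated above, and the helper name `chunk_nbytes` is made up for illustration. It shows that a `(210, 3500, 7000)` float32 block is exactly three of the `(70, 3500, 7000)` chunks and matches the 19.2 GiB in the `MemoryError`.

```python
import math
from datetime import timedelta

FLOAT32_BYTES = 4   # itemsize of the float32 data variable reported in the issue
GIB = 2**30


def chunk_nbytes(shape, itemsize=FLOAT32_BYTES):
    """Size in bytes of a dense array block with the given shape (hypothetical helper)."""
    return math.prod(shape) * itemsize


# Chunk shape shown in the dataset repr, and the shape named in the MemoryError.
reported_chunk = chunk_nbytes((70, 3500, 7000))
failing_block = chunk_nbytes((210, 3500, 7000))

# Number of 5-minute steps in one day, i.e. the length of the reindexed time axis.
n_steps = timedelta(days=1) // timedelta(minutes=5)
full_array = chunk_nbytes((n_steps, 3500, 7000))

print(f"time steps after reindex: {n_steps}")
print(f"(70, 3500, 7000) chunk:   {reported_chunk / GIB:.1f} GiB")
print(f"(210, 3500, 7000) block:  {failing_block / GIB:.1f} GiB")
print(f"failing block / chunk:    {failing_block // reported_chunk}")
print(f"full variable:            {full_array / GIB:.1f} GiB")
```

The arithmetic only confirms that the failing allocation is three reported chunks wide along `time`. It does not show why that many chunks are combined at write time.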

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
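
The filter in the page heading (`state = "open"`, `type = "issue"`, `user = 64621312`, sorted by `updated_at` descending) can be reproduced against this schema. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. It creates only a subset of the columns above, and the second and third rows are hypothetical values added so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns the query touches.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # The row shown on this page (title shortened).
        (1368696980, 7018, "Writing netcdf after reindex fails", 64621312,
         "open", "2022-09-15T19:59:39Z", "issue"),
        # Hypothetical rows, present only to exercise the WHERE clause.
        (1, 1, "closed issue by the same user", 64621312,
         "closed", "2022-01-01T00:00:00Z", "issue"),
        (2, 2, "open issue by another user", 12345,
         "open", "2022-02-01T00:00:00Z", "issue"),
    ],
)

rows = conn.execute(
    """
    SELECT [number], [updated_at]
    FROM [issues]
    WHERE [state] = ? AND [type] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    ("open", "issue", 64621312),
).fetchall()

print(rows)
```

Because the timestamps are stored as ISO 8601 text, ordering by `[updated_at]` as a string also orders them chronologically.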
