issue_comments: 1460907454


html_url: https://github.com/pydata/xarray/issues/7574#issuecomment-1460907454
issue_url: https://api.github.com/repos/pydata/xarray/issues/7574
id: 1460907454
node_id: IC_kwDOAMm_X85XE62-
user: 127195910
created_at: 2023-03-08T21:34:49Z
updated_at: 2023-03-15T16:54:13Z
author_association: NONE

@jonas-constellr

It's possible that the failure you're experiencing is due to an issue with how the h5netcdf library is interacting with Dask.

One potential solution is to try the netCDF4 library instead of h5netcdf. netCDF4 is another popular library for reading and writing netCDF files, and xarray can use it as an engine for parallel, Dask-backed reads.

To use netCDF4 with xarray, you can simply pass the 'netcdf4' engine to the xr.open_mfdataset function:

```python
import xarray as xr

# Open multiple netCDF files with the netCDF4 engine and parallel I/O
ds = xr.open_mfdataset('path/to/files/*.nc', engine='netcdf4', parallel=True)
```
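Note that `parallel=True` tells `open_mfdataset` to open and preprocess each file through `dask.delayed`, and the resulting dataset is lazy, so nothing is read until a computation is triggered. As a minimal usage sketch (not part of the original suggestion; `my_variable` is a placeholder name, matching the one used further down):

```python
import xarray as xr

# Open the files lazily with the netCDF4 engine; dask coordinates the parallel reads
ds = xr.open_mfdataset('path/to/files/*.nc', engine='netcdf4', parallel=True)

# No data has been loaded yet; .compute() triggers the actual parallel I/O.
# 'my_variable' is a placeholder variable name, not something defined in this issue.
mean_value = ds['my_variable'].mean().compute()
print(mean_value)
```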

If you need to use h5netcdf for some reason, another option is to build the Dask array manually with dask.array.from_delayed: read the data with h5netcdf, wrap the reads in dask.delayed so they can run in parallel, and assemble the pieces into a single Dask array. Here's an example:

```python
import h5netcdf
import dask.array as da
from dask import delayed


# Read a single chunk of data from one netCDF file
@delayed
def read_chunk(filename, varname, start, count):
    with h5netcdf.File(filename, 'r') as f:
        var = f[varname][start[0]:start[0] + count[0], start[1]:start[1] + count[1]]
    return var


# Read the entire dataset using dask.array.from_delayed
def read_data(files, varname):
    chunks = (1000, 1000)  # chunk size
    start = (0, 0)         # offset of the chunk within each file
    data = [read_chunk(f, varname, start, chunks) for f in files]
    data = [da.from_delayed(d, shape=chunks, dtype='float64') for d in data]
    data = da.concatenate(data, axis=0)
    return data


# Open multiple netCDF files with h5netcdf and parallel reads
files = ['path/to/files/file1.nc', 'path/to/files/file2.nc', ...]
varname = 'my_variable'
data = read_data(files, varname)
```

This code reads the data from each file in chunks and returns a Dask array that is the concatenation of all the chunks. The read_chunk function uses h5netcdf.File to read a single chunk from a file and, because it is wrapped in dask.delayed, returns a delayed object representing that load. The read_data function builds one delayed read per file so the loads can run in parallel, converts each delayed object into a Dask array with dask.array.from_delayed, and returns the concatenated Dask array.
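If you want to end up back in xarray rather than working with the raw Dask array, the result of read_data can be wrapped in a DataArray. This is a minimal sketch, not part of the original suggestion: the dimension names ('y', 'x') are hypothetical, and the da.zeros call stands in for the array returned by read_data above:

```python
import dask.array as da
import xarray as xr

# Stand-in for the array returned by read_data(files, varname); the shape,
# chunks and dtype here are placeholders matching the example above.
data = da.zeros((2000, 1000), chunks=(1000, 1000), dtype='float64')

# Wrap the Dask array so downstream work stays in xarray; dimension names are
# hypothetical and should match the layout of the variable in your files.
arr = xr.DataArray(data, dims=('y', 'x'), name='my_variable')

# Everything is still lazy; .compute() runs the chunked reads in parallel
result = arr.mean(dim='x').compute()
```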

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 1605108888