issue_comments: 408860643


Comment by user 1197350 (MEMBER) on pydata/xarray#2329, posted 2018-07-30T13:20:59Z
https://github.com/pydata/xarray/issues/2329#issuecomment-408860643

@lrntct - this sounds like a reasonable way to use zarr. We routinely do this sort of transcoding, and it works reasonably well. Unfortunately, something clearly isn't working in your case. These problems can be hard to debug, but we will try to help you.

You might want to start by reviewing the guide I wrote for Pangeo on preparing zarr datasets.

It would also help to see a bit more detail. You posted a function `netcdf2zarr` that converts a single netCDF file to a single zarr store. How are you invoking that function? Are you trying to create one zarr store for each netCDF file? How many netCDF files are there? If there are many (e.g. one per timestep), my recommendation is to create only one zarr store for the whole dataset: open the netCDF files together using `open_mfdataset`.

If instead you have just one big netCDF file, as in the example you posted above, I think I see your problem: you are calling `.chunk()` after calling `open_dataset()`, rather than calling `open_dataset(nc_path, chunks=chunks)`. This probably means that you are loading the whole dataset in a single task and then re-chunking it, which could be the source of the inefficiency.
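The two patterns side by side, on a small synthetic file (the path, variable name, and chunk sizes are placeholders):

```python
import os
import tempfile

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "example.nc")
xr.Dataset({"v": (("time", "x"), np.zeros((100, 10)))}).to_netcdf(path)

# Pattern to avoid: eager open, then re-chunk.
# The initial read is one big task; chunking happens after the fact.
ds_eager = xr.open_dataset(path).chunk({"time": 25})

# Preferred: chunked (dask-backed) from the very first read.
ds_lazy = xr.open_dataset(path, chunks={"time": 25})
```

Both datasets end up with the same chunk layout, but only the second one reads the file lazily in chunk-sized pieces.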

More ideas:

- explicitly specify the chunks (rather than using `'auto'`)
- eliminate the negative number in your chunk sizes
- make sure you really need `clevel=9`

Another useful piece of advice: use the dask distributed dashboard to monitor what is happening under the hood. You can do this by running

```python
from dask.distributed import Client
client = Client()
client
```

In a notebook, this should give you a link to the scheduler dashboard. Once you call `ds.to_zarr()`, watch the task stream in the dashboard to see what is happening.

Hopefully these ideas can help you move forward.
