# Differences in `to_netcdf` for dask and numpy backed arrays

Issue #7522 · state: open · 7 comments · author association: CONTRIBUTOR
Created 2023-02-11T23:06:37Z · Updated 2023-03-01T23:12:11Z

What is your issue?

I make use of fsspec to quickly open netcdf files in the cloud and pull out slices of data without needing to read the entire file. Quick and dirty is just `ds = xr.open_dataset(fs.open("gs://..."))`.

This works great: a many-GB file can be lazy-loaded as a dataset in a few hundred milliseconds, since only the netcdf headers are parsed via under-the-hood byte-range requests. But that only holds if the netcdf was written from dask-backed arrays. Somehow, writing from numpy-backed arrays produces a different netcdf file that requires reading much deeper into the file before it can be parsed as a dataset.

I spent some time digging into the backends and see that xarray is ultimately passing off the store write to dask.array here. A look at `ncdump` and `Dataset.encoding` didn't reveal any obvious differences between these files, but there is clearly something. Does anyone know why the plain xarray store methods would produce a differently structured netcdf, despite the underlying data and encoding being identical?
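One thing I have not verified, but suspect matters, is write order: when each variable is created and immediately filled, HDF5 object headers end up interleaved with raw data, whereas creating all variables first and filling them afterwards keeps the metadata together. A minimal h5py sketch of that idea (the file names and the interleaving hypothesis are mine, not taken from xarray's code):

```python
# Hypothetical illustration: HDF5 file layout depends on whether datasets
# are written as they are created, or created first and filled afterwards.
import numpy as np
import h5py

data = np.random.random((100, 100))

# Interleaved: create and immediately write each dataset in turn.
with h5py.File("interleaved.h5", "w") as f:
    for name in "abc":
        f.create_dataset(name, data=data)

# Metadata-first: create all datasets empty, then fill them.
with h5py.File("metadata_first.h5", "w") as f:
    for name in "abc":
        f.create_dataset(name, shape=data.shape, dtype=data.dtype)
    for name in "abc":
        f[name][...] = data

# get_offset() reports where each dataset's raw data starts in the file;
# comparing the two files shows how write order changes the on-disk layout.
for path in ("interleaved.h5", "metadata_first.h5"):
    with h5py.File(path, "r") as f:
        print(path, {name: f[name].id.get_offset() for name in f})
```

If the numpy-backed write path interleaves like the first file, a lazy reader would need more scattered byte-range requests to collect all the headers, which could explain the timing gap below.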

This should work as an MCVE:

```python
import os
import string

import fsspec
import numpy as np
import xarray as xr

fs = fsspec.filesystem("gs")
bucket = "gs://<your-bucket>"

# Create a ~160MB dataset with 20 variables
variables = {
    v: (["x", "y"], np.random.random(size=(1000, 1000)))
    for v in string.ascii_letters[:20]
}
ds = xr.Dataset(variables)

# Save one version from numpy-backed arrays and one from dask-backed arrays
ds.compute().to_netcdf("numpy.nc")
ds.chunk().to_netcdf("dask.nc")

# Copy these to a bucket of your choice
fs.put("numpy.nc", bucket)
fs.put("dask.nc", bucket)
```

Then time reading these files back in as datasets with fsspec:

```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "numpy.nc")))
# 2.15 s ± 40.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "dask.nc")))
# 187 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
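For anyone hitting this in the meantime, a workaround consistent with the timings above (my own suggestion, not a confirmed fix) is to route in-memory data through dask before writing, so the file is produced by the fast-to-open dask write path:

```python
import numpy as np
import xarray as xr

# A small in-memory (numpy-backed) dataset, stand-in for the MCVE above.
ds = xr.Dataset({"a": (("x", "y"), np.random.random((10, 10)))})

# .chunk() converts the variables to dask arrays, so to_netcdf takes the
# dask store path that produced the quick-to-open "dask.nc" above.
ds.chunk().to_netcdf("fast_open.nc")
```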

