Comment by user 1486942 (CONTRIBUTOR) on pydata/xarray#7181, posted 2022-10-18T14:03:40Z
https://github.com/pydata/xarray/issues/7181#issuecomment-1282452427

Our tests all pass, but there is roughly a 20× slowdown, and it's almost entirely due to copies. It's plausible we're already doing far too many copies as it is, but this is still concerning.
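
(As an aside, one quick way to check that copies dominate is to profile a deep copy of an in-memory dataset directly, rather than the full test suite. This is just a sketch with cProfile, not what I actually ran:)

```py
import cProfile
import pstats

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"A": xr.DataArray(np.random.randn(50, 450, 400), dims=("T", "X", "Y"))}
)

profiler = cProfile.Profile()
profiler.enable()
ds.copy(deep=True)
profiler.disable()

# Sort by cumulative time; on an affected revision the copy-related frames
# should show up near the top.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```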

I tried adding the following asv benchmark, based on the combine benchmark and the deepcopy tests:

```py
from copy import copy, deepcopy

import numpy as np
import xarray as xr


class Copy:
    def setup(self):
        """Create a dataset with a single 3D variable."""
        t_size, x_size, y_size = 50, 450, 400
        t = np.arange(t_size)
        data = np.random.randn(t_size, x_size, y_size)

        self.ds = xr.Dataset(
            {"A": xr.DataArray(data, coords={"T": t}, dims=("T", "X", "Y"))}
        )

    def time_copy(self) -> None:
        copy(self.ds)

    def time_deepcopy(self) -> None:
        deepcopy(self.ds)

    def time_copy_method(self) -> None:
        self.ds.copy(deep=False)

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)
```

But I didn't see any regressions between v2022.06.0 and HEAD, so this simplistic case is clearly not enough to reproduce the problem.

There are a few differences between our test datasets and the one in the benchmark above (a sketch approximating them follows below):

- 4D vs 3D
- smaller grid: (2, 5, 4, 3) vs (50, 450, 400)
- more variables: ~20 vs 1
- variable attributes vs none
- multiple files read in via open_mfdataset vs entirely in-memory
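
For completeness, a benchmark variant closer to those test datasets might look something like this (a rough sketch with made-up variable names and attributes; it still leaves out the open_mfdataset / multi-file aspect):

```py
import numpy as np
import xarray as xr


class CopySmall4D:
    """Hypothetical benchmark: small 4D grid, ~20 variables, per-variable attrs."""

    def setup(self):
        t_size, z_size, x_size, y_size = 2, 5, 4, 3
        data = np.random.randn(t_size, z_size, x_size, y_size)

        self.ds = xr.Dataset(
            {
                f"var{i}": xr.DataArray(
                    data,
                    coords={"T": np.arange(t_size)},
                    dims=("T", "Z", "X", "Y"),
                    attrs={"units": "m", "long_name": f"variable {i}"},
                )
                for i in range(20)
            }
        )

    def time_copy_method(self) -> None:
        self.ds.copy(deep=False)

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)
```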
