issue_comments: 1352310432

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/7368#issuecomment-1352310432	https://api.github.com/repos/pydata/xarray/issues/7368	1352310432	IC_kwDOAMm_X85Qmp6g	4160723	2022-12-14T22:33:23Z	2022-12-15T01:08:41Z	MEMBER	I did some profiling to find the cause of the performance decrease reported in the benchmarks (dataset creation). In summary, it is explained by a `Coordinates` object (built from the `coords` mapping) that is now included in the objects to align when merging data vars and coordinates. Previously, all non-DataArray objects in the `coords` mapping were excluded from alignment (in `deep_align`). The added overhead comes from a call to `Coordinates._reindex_callback()`, which (I think?) should do no more than shallow copies and/or xarray wrapping stuff. In the benchmark report this is only marked as significant when creating small datasets (1.5-2x slower), and it becomes insignificant for datasets with more data variables. Maybe there's some way to optimize that? I don't know if we can completely avoid it with the solution implemented in this PR, though. Promoting `Coordinates` is pretty clean and future-proof IMO (assuming that we'll further refactor `Coordinates` to actually store variables and indexes, i.e., not as a proxy anymore). Is the (minor? temporary?) performance regression acceptable, and can we just leave it like that for now?
More details about the new workflow implemented in this PR when creating a new Dataset:

- if Dataset's `coords` argument is a "simple" mapping, it is first internally converted into a `Coordinates` object, with the creation of default indexes for dimension coordinates
  - if one or more DataArray objects are given in `coords`, their coordinates (variables + indexes) are extracted and merged with the other input coordinates
  - see the implementation in `xarray.core.coordinates.create_coords_with_default_indexes`
- otherwise, just reuse the `Coordinates` object passed as `coords`
- coordinates are then merged with data variables
  - the `Coordinates` object is aligned with every other "alignable" object found in `data_vars`
  - coordinate indexes (if any) are passed explicitly to `align` so they are used in priority
  - explicitly using a `Coordinates` object skips the creation of default indexes during merging (in `collect_variables_and_indexes()`)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		1485037066
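
The two input paths described in the comment above (a plain `coords` mapping versus an explicit `Coordinates` object) can be sketched from the user side roughly as follows. This is only an illustration under assumptions: it presumes a later xarray release in which the class is exposed publicly as `xr.Coordinates` and accepts `indexes={}` to skip default index creation. The variable names and sample values are made up for the example and are not taken from the PR or its benchmarks.

```python
import numpy as np
import xarray as xr

# Illustrative data only (not from the PR benchmarks).
data_vars = {"a": ("x", np.arange(3))}

# Path 1: a "simple" mapping. Per the comment, it is converted internally
# into a Coordinates object, and a default index is created for the
# dimension coordinate "x".
ds_default = xr.Dataset(data_vars, coords={"x": [10, 20, 30]})

# Path 2: an explicit Coordinates object, which is reused as given.
# Assumption: passing indexes={} asks for no default index on "x".
coords_no_index = xr.Coordinates({"x": [10, 20, 30]}, indexes={})
ds_no_index = xr.Dataset(data_vars, coords=coords_no_index)

# Compare which datasets ended up with an index on "x".
has_default_index = "x" in ds_default.xindexes
has_explicit_index = "x" in ds_no_index.xindexes
print(has_default_index, has_explicit_index)
```

If those assumptions hold, only the dataset built from the plain mapping should carry an index for `x`, while both datasets still expose `x` as a coordinate.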