
Issue #8778: Stricter defaults for concat, combine, open_mfdataset

  • id: 2149485914
  • node_id: I_kwDOAMm_X86AHo1a
  • user: 2448579
  • state: open
  • locked: 0
  • comments: 2
  • created_at: 2024-02-22T16:43:38Z
  • updated_at: 2024-02-23T04:17:40Z
  • author_association: MEMBER

Is your feature request related to a problem?

The defaults for concat, combine, and open_mfdataset are excessively permissive: data_vars="all", coords="different", compat="no_conflicts", join="outer". This comment illustrates why the result can be hard to predict or understand: a seemingly unrelated option, decode_cf, controls whether a variable ends up in data_vars or coords, and so can result in wildly different concatenation behaviour.
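As a rough illustration of that coupling, here is a minimal sketch, with made-up variable names ("temp", "lon"), of how decode_cf moves a variable between data_vars and coords and therefore changes which of the rules below governs it:

```python
import xarray as xr

# Hypothetical file contents: "lon" is a plain variable, but the CF "coordinates"
# attribute on "temp" promotes it to a coordinate once decoded.
raw = xr.Dataset(
    {
        "temp": ("x", [1.0, 2.0], {"coordinates": "lon"}),
        "lon": ("x", [10.0, 20.0]),
    }
)
decoded = xr.decode_cf(raw)

print(list(raw.data_vars))   # ['temp', 'lon'] -> "lon" is governed by data_vars="all"
print(list(decoded.coords))  # ['lon']         -> "lon" is governed by coords="different"
```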

  1. data_vars="all" always concatenates data variables along concat_dim, even if they did not have that dimension to begin with.
  2. coords="different" means that if the same coordinate variable exists in different datasets/files, the copies are sequentially compared for equality to decide whether they get concatenated.
  3. join="outer" (applied along all dimensions that are not concat_dim) can result in very large datasets due to small floating-point differences in the indexes, and also questionable behaviour with staggered-grid datasets.
  4. compat="no_conflicts" basically picks the first non-NaN value after aligning all datasets, but is quite slow (we should be using duck_array_ops.nanfirst here, I think); see the merge sketch after this list.

While "convenient" this really just makes the default experience quite bad with hard-to-understand slowdowns.

Describe the solution you'd like

I propose we migrate to data_vars="minimal", coords="minimal", join="exact", compat="override". This would:

  1. Only concatenate data_vars and coords variables when they already have concat_dim.
  2. Blindly pick any variables that do not have concat_dim from the first file (compat="override").
  3. Prevent ballooning of dimension sizes due to floating-point inequalities (join="exact").
  4. Avoid any data reads entirely unless explicitly requested by the user.
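For reference, a minimal sketch of opting in to the proposed behaviour today by passing these options explicitly; the datasets and the "files-*.nc" glob are placeholders, not from the issue:

```python
import xarray as xr

# Two tiny placeholder datasets with an almost-equal "x" index.
ds1 = xr.Dataset({"temp": ("x", [1.0, 2.0])}, coords={"x": [0.0, 1.0], "time": 0})
ds2 = xr.Dataset({"temp": ("x", [3.0, 4.0])}, coords={"x": [0.0, 1.0 + 1e-12], "time": 1})

strict = dict(data_vars="minimal", coords="minimal", compat="override", join="exact")

try:
    xr.concat([ds1, ds2], dim="time", **strict)
except Exception as err:
    # join="exact" surfaces the floating-point mismatch in "x" instead of
    # silently outer-joining it into a larger index.
    print(type(err).__name__, err)

# The same keywords are accepted by open_mfdataset:
# xr.open_mfdataset("files-*.nc", **strict)   # "files-*.nc" is a placeholder glob
```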

Unfortunately, this has a pretty big blast radius so we'd need a long deprecation cycle.

Describe alternatives you've considered

No response

Additional context

  • xref https://github.com/pydata/xarray/issues/4824
  • xref https://github.com/pydata/xarray/issues/1385
  • xref https://github.com/pydata/xarray/issues/8231
  • xref https://github.com/pydata/xarray/issues/5381
  • xref https://github.com/pydata/xarray/issues/2064
  • xref https://github.com/pydata/xarray/issues/2217

Reactions

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8778/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

repo: 13221727 · type: issue
