id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1966264258,I_kwDOAMm_X851Ms_C,8385,The method to_netcdf does not preserve chunks,40218891,open,0,,,3,2023-10-27T22:29:45Z,2023-10-31T18:51:45Z,,NONE,,,,"### What happened?

Methods ``to_zarr`` and ``to_netcdf`` behave inconsistently for chunked dataset. The latter does not preserve existing chunk information, the chunks must be specified within the ``encoding`` dictionary.

### What did you expect to happen?

I expected the behaviour to be consistent for for all ``to_XXX()`` methods.

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import dask.array as da

rng = da.random.RandomState()
shape = (20, 20)
chunks = [10, 10]
dims = [""x"", ""y""]
z = rng.standard_normal(shape, chunks=chunks)
ds = xr.DataArray(z, dims=dims, name=""z"").to_dataset()
ds.chunks
# This one is rechunked
ds.to_netcdf(""/tmp/test1.nc"", encoding={""z"": {""chunksizes"": (5, 5)}})
# This one is not rechunked, also original chunks are lost
ds.chunk({""x"": 5, ""y"": 5}).to_netcdf(""/tmp/test2.nc"")
# This one is rechunked
ds.chunk({""x"": 5, ""y"": 5}).to_zarr(""/tmp/test2"", mode=""w"")

Frozen({'x': (10, 10), 'y': (10, 10)})
<xarray.backends.zarr.ZarrStore at 0x7f3669f1af80>

xr.open_mfdataset(""/tmp/test1.nc"").chunks
xr.open_mfdataset(""/tmp/test2.nc"").chunks
xr.open_mfdataset(""/tmp/test2"", engine=""zarr"").chunks

Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
Frozen({'x': (20,), 'y': (20,)})
Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
```


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

_No response_

### Anything else we need to know?

I did get the same results for ``h5netcdf`` and ``scipy`` backends, so I am not sure whether this is a bug or not. 
The above code is a modified version of #2198.
A suggestion: the documentation provides only examples of encoding styles. It would be helpful to provide links to a full specification.


### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8385/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1953059418,I_kwDOAMm_X850aVJa,8345,`.stack` produces large chunks,40218891,closed,0,,,4,2023-10-19T21:09:56Z,2023-10-26T21:20:05Z,2023-10-26T21:20:05Z,NONE,,,,"### What happened?

Xarray ``stack``  does not chunk along the last coordinate, producing huge chunks, as described in #5754. Dask, seeing code like this:
```
da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
```
produces warning and suggestion to use context manager:
```
with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
       da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
```
This fails with message ``IndexError: tuple index out of range``.

### What did you expect to happen?

I expect this to work. #5754 is closed. 


### Minimal Complete Verifiable Example

```Python
import dask.array
import numpy as np

import xarray as xr

var = xr.Variable(
    (""t"", ""z"", ""u"", ""x"", ""y""),
    dask.array.random.random((1200, 4, 2, 1000, 100), chunks=(1, 1, -1, -1, -1)),
)
da = xr.DataArray(var)

def sum(ds):
    return ds.sum(dim=""u"")

with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
    da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
da2
```


### MVCE confirmation

- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

```Python
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[21], line 5
      2     return ds.sum(dim=""u"")
      4 with dask.config.set(**{""array.slicing.split_large_chunks"": True}):
----> 5     da2 = da.stack(new=(""z"", ""t"")).groupby(""new"").map(sum).unstack(""new"")
      6 da2

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataarray.py:2855, in DataArray.unstack(self, dim, fill_value, sparse)
   2795 def unstack(
   2796     self,
   2797     dim: Dims = None,
   2798     fill_value: Any = dtypes.NA,
   2799     sparse: bool = False,
   2800 ) -> Self:
   2801     """"""
   2802     Unstack existing dimensions corresponding to MultiIndexes into
   2803     multiple new dimensions.
   (...)
   2853     DataArray.stack
   2854     """"""
-> 2855     ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
   2856     return self._from_temp_dataset(ds)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5500, in Dataset.unstack(self, dim, fill_value, sparse)
   5498 for d in dims:
   5499     if needs_full_reindex:
-> 5500         result = result._unstack_full_reindex(
   5501             d, stacked_indexes[d], fill_value, sparse
   5502         )
   5503     else:
   5504         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:5395, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
   5393 if name not in index_vars:
   5394     if dim in var.dims:
-> 5395         variables[name] = var.unstack({dim: new_dim_sizes})
   5396     else:
   5397         variables[name] = var

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1930, in Variable.unstack(self, dimensions, **dimensions_kwargs)
   1928 result = self
   1929 for old_dim, dims in dimensions.items():
-> 1930     result = result._unstack_once_full(dims, old_dim)
   1931 return result

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/variable.py:1820, in Variable._unstack_once_full(self, dims, old_dim)
   1817 reordered = self.transpose(*dim_order)
   1819 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes
-> 1820 new_data = reordered.data.reshape(new_shape)
   1821 new_dims = reordered.dims[: len(other_dims)] + new_dim_names
   1823 return type(self)(
   1824     new_dims, new_data, self._attrs, self._encoding, fastpath=True
   1825 )

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:2219, in Array.reshape(self, merge_chunks, limit, *shape)
   2217 if len(shape) == 1 and not isinstance(shape[0], Number):
   2218     shape = shape[0]
-> 2219 return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/reshape.py:285, in reshape(x, shape, merge_chunks, limit)
    283             else:
    284                 chunk_plan.append(""auto"")
--> 285         outchunks = normalize_chunks(
    286             chunk_plan,
    287             shape=shape,
    288             limit=limit,
    289             dtype=x.dtype,
    290             previous_chunks=inchunks,
    291         )
    293 x2 = x.rechunk(inchunks)
    295 # Construct graph

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3095, in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
   3092 chunks = tuple(""auto"" if isinstance(c, str) and c != ""auto"" else c for c in chunks)
   3094 if any(c == ""auto"" for c in chunks):
-> 3095     chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   3097 if shape is not None:
   3098     chunks = tuple(c if c not in {None, -1} else s for c, s in zip(chunks, shape))

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in auto_chunks(chunks, shape, limit, dtype, previous_chunks)
   3212 largest_block = math.prod(
   3213     cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto""
   3214 )
   3216 if previous_chunks:
   3217     # Base ideal ratio on the median chunk size of the previous chunks
-> 3218     result = {a: np.median(previous_chunks[a]) for a in autos}
   3220     ideal_shape = []
   3221     for i, s in enumerate(shape):

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/dask/array/core.py:3218, in <dictcomp>(.0)
   3212 largest_block = math.prod(
   3213     cs if isinstance(cs, Number) else max(cs) for cs in chunks if cs != ""auto""
   3214 )
   3216 if previous_chunks:
   3217     # Base ideal ratio on the median chunk size of the previous chunks
-> 3218     result = {a: np.median(previous_chunks[a]) for a in autos}
   3220     ideal_shape = []
   3221     for i, s in enumerate(shape):

IndexError: tuple index out of range
```


### Anything else we need to know?

The most recent traceback entry point to an issue in dask code.

### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8345/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1956383344,I_kwDOAMm_X850nApw,8358,Writing to zarr archive fails on resampled dataset,40218891,closed,0,,,1,2023-10-23T05:30:36Z,2023-10-23T15:46:20Z,2023-10-23T15:46:19Z,NONE,,,,"### What happened?

I am not sure where this belongs: xarray, dask or zarr. When a dataset is resampled to a semi-monthly frequency, the method ``to_zarr`` complains about invalid chunks.

### What did you expect to happen?

I think this should work without having to rechunk the result before writing to the archive.

### Minimal Complete Verifiable Example

```Python
time = pd.date_range(""2001-01-01"", freq=""D"", periods=365)
ds = xr.Dataset({""foo"": (""time"", np.arange(1, 366)), ""time"": time}).chunk(time=5)
dsr = ds.resample(time=""SM"").mean()
dsr.to_zarr('/tmp/foo', mode='w')
```


### MVCE confirmation

- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

```Python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[63], line 4
      2 ds = xr.Dataset({""foo"": (""time"", np.arange(1, 366)), ""time"": time}).chunk(time=5)
      3 dsr = ds.resample(time=""SM"").mean()
----> 4 dsr.to_zarr('/tmp/foo', mode='w')
      5 #dsr.isel(time=slice(0, -1)).to_zarr('/tmp/foo', mode='w')

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/core/dataset.py:2490, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2358 """"""Write dataset contents to a zarr group.
   2359 
   2360 Zarr chunks are determined in the following way:
   (...)
   2486     The I/O user guide, with more details and examples.
   2487 """"""
   2488 from xarray.backends.api import to_zarr
-> 2490 return to_zarr(  # type: ignore[call-overload,misc]
   2491     self,
   2492     store=store,
   2493     chunk_store=chunk_store,
   2494     storage_options=storage_options,
   2495     mode=mode,
   2496     synchronizer=synchronizer,
   2497     group=group,
   2498     encoding=encoding,
   2499     compute=compute,
   2500     consolidated=consolidated,
   2501     append_dim=append_dim,
   2502     region=region,
   2503     safe_chunks=safe_chunks,
   2504     zarr_version=zarr_version,
   2505     write_empty_chunks=write_empty_chunks,
   2506     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2507 )

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1708, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1706 writer = ArrayWriter()
   1707 # TODO: figure out how to properly handle unlimited_dims
-> 1708 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1709 writes = writer.sync(
   1710     compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
   1711 )
   1713 if compute:

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/api.py:1308, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1305 if encoder:
   1306     variables, attrs = encoder(variables, attrs)
-> 1308 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:631, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    628     self.set_attributes(attributes)
    629     self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims)
--> 631 self.set_variables(
    632     variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    633 )
    634 if self._consolidate_on_close:
    635     zarr.consolidate_metadata(self.zarr_group.store)

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:687, in ZarrStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    684         zarr_array = self.zarr_group[name]
    685 else:
    686     # new variable
--> 687     encoding = extract_zarr_variable_encoding(
    688         v, raise_on_invalid=check, name=vn, safe_chunks=self._safe_chunks
    689     )
    690     encoded_attrs = {}
    691     # the magic for storing the hidden dimension data

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:281, in extract_zarr_variable_encoding(variable, raise_on_invalid, name, safe_chunks)
    278         if k not in valid_encodings:
    279             del encoding[k]
--> 281 chunks = _determine_zarr_chunks(
    282     encoding.get(""chunks""), variable.chunks, variable.ndim, name, safe_chunks
    283 )
    284 encoding[""chunks""] = chunks
    285 return encoding

File ~/mambaforge/envs/icec/lib/python3.11/site-packages/xarray/backends/zarr.py:138, in _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name, safe_chunks)
    132     raise ValueError(
    133         ""Zarr requires uniform chunk sizes except for final chunk. ""
    134         f""Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. ""
    135         ""Consider rechunking using `chunk()`.""
    136     )
    137 if any((chunks[0] < chunks[-1]) for chunks in var_chunks):
--> 138     raise ValueError(
    139         ""Final chunk of Zarr array must be the same size or smaller ""
    140         f""than the first. Variable named {name!r} has incompatible Dask chunks {var_chunks!r}.""
    141         ""Consider either rechunking using `chunk()` or instead deleting ""
    142         ""or modifying `encoding['chunks']`.""
    143     )
    144 # return the first chunk for each dimension
    145 return tuple(chunk[0] for chunk in var_chunks)

ValueError: Final chunk of Zarr array must be the same size or smaller than the first. Variable named 'foo' has incompatible Dask chunks ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2),).Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
```


### Anything else we need to know?

I can also achieve what I want without having to rechunk with
```
dsr = ds.resample(time=""SM"", closed=""right"", label=""right"").mean().isel(time=slice(0, -1))
```


### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8358/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1940650207,I_kwDOAMm_X85zq_jf,8300,Inconsistent behaviour of xarray.concat,40218891,closed,0,,,2,2023-10-12T19:23:32Z,2023-10-12T19:48:01Z,2023-10-12T19:48:00Z,NONE,,,,"### What is your issue?

I am not sure if it is a bug or a feature:
```
import numpy as np
import pandas as pd
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 2)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset(
    {""temperature"": ([""x"", ""y"", ""time""], temp), ""latitude_longitude"": 0},
    coords={
        ""lon"": ([""x"", ""y""], lon),
        ""lat"": ([""x"", ""y""], lat),
        ""time"": (""time"", pd.date_range(""2014-09-05"", periods=2)),
    },
)
print(
    xr.concat(
        [ds.isel(time=0), ds.isel(time=1)], ""time"", data_vars=""minimal""
    ).latitude_longitude
)
print(
    xr.concat(
        [ds.isel(time=slice(0, 1)), ds.isel(time=slice(1, 2))], ""time"", data_vars=""minimal""
    ).latitude_longitude
)
```
I expected the output to be the same. It appears that ``data_vars=""minimal""`` has no effect when the concatenation dimension does not exist.
```
<xarray.DataArray 'latitude_longitude' (time: 2)>
array([0, 0])
Coordinates:
  * time     (time) datetime64[ns] 2014-09-05 2014-09-06
<xarray.DataArray 'latitude_longitude' ()>
array(0)
```
The documentation states:
```
These data variables will be concatenated together:

        “minimal”: Only data variables in which the dimension already appears are included.
```
BTW, this is xarray 2023.9.0.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8300/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1806386948,I_kwDOAMm_X85rq0cE,7990,Random crashes in netcdf when dask client has multiple threads,40218891,closed,0,,,1,2023-07-16T01:00:55Z,2023-08-23T00:18:18Z,2023-08-23T00:18:17Z,NONE,,,,"### What happened?

The data files can be found here: https://noaadata.apps.nsidc.org/NOAA/G02202_V4/north/monthly/.  The example code below crashes randomly: the file processed when the crash occurs differs between runs. This happens only when ``threads_per_worker`` is > 1 in the ``client()`` call . ``n_workers`` does not matter, at least I could not make it to crash.  The traceback points to hdf5.


### What did you expect to happen?

_No response_

### Minimal Complete Verifiable Example

```Python
from pathlib import Path

import pandas as pd
from dask.distributed import Client

import xarray as xr

client = Client(n_workers=1, threads_per_worker=4)

DATADIR = Path(""/mnt/sdc1/icec/NSIDC"")
year = 2020

times = pd.date_range(f""{year}-01-01"", f""{year}-12-01"", freq=""MS"", name=""time"")
paths = [
    DATADIR / ""monthly"" / f""seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc""
    for t in times
]
for n in range(10):
    ds = xr.open_mfdataset(
        paths,
        combine=""nested"",
        concat_dim=""tdim"",
        parallel=True,
        engine=""netcdf4"",
    )
    del ds

HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0:
  #000: H5G.c line 442 in H5Gopen2(): unable to synchronously open group
    major: Symbol table
    minor: Unable to create file
  #001: H5G.c line 399 in H5G__open_api_common(): can't set object access arguments
    major: Symbol table
    minor: Can't set value
  #002: H5VLint.c line 2669 in H5VL_setup_acc_args(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #003: H5VLint.c line 1787 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.0) thread 0:
  #000: H5G.c line 887 in H5Gclose(): not a group ID
    major: Invalid arguments to routine
    minor: Inappropriate type
2023-07-16 00:35:47,833 - distributed.worker - WARNING - Compute Failed
Key:       open_dataset-09a155bb-5079-406a-83c4-737933c409c7
Function:  execute_task
args:      ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202001_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]])))
kwargs:    {}
Exception: ""OSError(-101, 'NetCDF: HDF error')""

2023-07-16 00:35:47,834 - distributed.worker - WARNING - Compute Failed
Key:       open_dataset-14e239f4-7e16-4891-a350-b55979d4a754
Function:  execute_task
args:      ((<function apply at 0x7f0001edf520>, <function open_dataset at 0x7effe3e35c60>, ['/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc'], (<class 'dict'>, [['engine', 'netcdf4'], ['chunks', (<class 'dict'>, [])]])))
kwargs:    {}
Exception: ""OSError(-101, 'NetCDF: HDF error')""

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[1], line 19
     14 paths = [
     15     DATADIR / ""monthly"" / f""seaice_conc_monthly_nh_{t.strftime('%Y%m')}_f17_v04r00.nc""
     16     for t in times
     17 ]
     18 for n in range(10):
---> 19     ds = xr.open_mfdataset(
     20         paths,
     21         combine=""nested"",
     22         concat_dim=""tdim"",
     23         parallel=True,
     24         engine=""netcdf4"",
     25     )
     26     del ds

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:1050, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
   1045     datasets = [preprocess(ds) for ds in datasets]
   1047 if parallel:
   1048     # calling compute here will return the datasets/file_objs lists,
   1049     # the underlying datasets will still be stored as dask arrays
-> 1050     datasets, closers = dask.compute(datasets, closers)
   1052 # Combine all datasets, closing them in case of a ValueError
   1053 try:

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/api.py:570, in open_dataset()
    558 decoders = _resolve_decoders_kwargs(
    559     decode_cf,
    560     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    566     decode_coords=decode_coords,
    567 )
    569 overwrite_encoded_chunks = kwargs.pop(""overwrite_encoded_chunks"", None)
--> 570 backend_ds = backend.open_dataset(
    571     filename_or_obj,
    572     drop_variables=drop_variables,
    573     **decoders,
    574     **kwargs,
    575 )
    576 ds = _dataset_from_backend_dataset(
    577     backend_ds,
    578     filename_or_obj,
   (...)
    588     **kwargs,
    589 )
    590 return ds

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:590, in open_dataset()
    569 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
    570     self,
    571     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
    587     autoclose=False,
    588 ) -> Dataset:
    589     filename_or_obj = _normalize_path(filename_or_obj)
--> 590     store = NetCDF4DataStore.open(
    591         filename_or_obj,
    592         mode=mode,
    593         format=format,
    594         group=group,
    595         clobber=clobber,
    596         diskless=diskless,
    597         persist=persist,
    598         lock=lock,
    599         autoclose=autoclose,
    600     )
    602     store_entrypoint = StoreBackendEntrypoint()
    603     with close_on_error(store):

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:391, in open()
    385 kwargs = dict(
    386     clobber=clobber, diskless=diskless, persist=persist, format=format
    387 )
    388 manager = CachingFileManager(
    389     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    390 )
--> 391 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:338, in __init__()
    336 self._group = group
    337 self._mode = mode
--> 338 self.format = self.ds.data_model
    339 self._filename = self.ds.filepath()
    340 self.is_remote = is_remote_uri(self._filename)

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:400, in ds()
    398 @property
    399 def ds(self):
--> 400     return self._acquire()

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:394, in _acquire()
    393 def _acquire(self, needs_lock=True):
--> 394     with self._manager.acquire_context(needs_lock) as root:
    395         ds = _nc4_require_group(root, self._group, self._mode)
    396     return ds

File ~/mambaforge/envs/icec/lib/python3.10/contextlib.py:135, in __enter__()
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError(""generator didn't yield"") from None

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:199, in acquire_context()
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """"""Context manager for acquiring a file.""""""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/mambaforge/envs/icec/lib/python3.10/site-packages/xarray/backends/file_manager.py:217, in _acquire_with_cache_info()
    215     kwargs = kwargs.copy()
    216     kwargs[""mode""] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == ""w"":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = ""a""

File src/netCDF4/_netCDF4.pyx:2464, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -101] NetCDF: HDF error: '/mnt/sdc1/icec/NSIDC/monthly/seaice_conc_monthly_nh_202011_f17_v04r00.nc'
```


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.38-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.7.0
distributed: 2023.7.0
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7990/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
787947436,MDU6SXNzdWU3ODc5NDc0MzY=,4822,h5netcdf fails to decode attribute coordinates.,40218891,closed,0,,,10,2021-01-18T06:01:40Z,2022-03-29T13:39:46Z,2022-03-29T13:39:45Z,NONE,,,,"<!-- Please include a self-contained copy-pastable example that generates the issue if possible.

Please be concise with code posted. See guidelines below on how to provide a good bug report:

- Craft Minimal Bug Reports: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
- Minimal Complete Verifiable Examples: https://stackoverflow.com/help/mcve

Bug reports that follow these guidelines are easier to diagnose, and so are often handled much more quickly.
-->

**What happened**:
The engine ``h5netcdf`` fail to decode attribute *coordinates*.

**What you expected to happen**:
It should work.


**Minimal Complete Verifiable Example**:

```python
# Put your MCVE code here
import xarray as xr
ds = xr.open_dataset('/tmp/x.nc', engine='h5netcdf')

========H5 coordinates ['x y']

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-481117dce7ff> in <module>
      1 import xarray as xr
      2 
----> 3 ds = xr.open_dataset('/tmp/x.nc', engine='h5netcdf')

~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    572 
    573     with close_on_error(store):
--> 574         ds = maybe_decode_store(store, chunks)
    575 
    576     # Ensure source filename always stored in dataset object (GH issue #2550)

~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/backends/api.py in maybe_decode_store(store, chunks)
    476             drop_variables=drop_variables,
    477             use_cftime=use_cftime,
--> 478             decode_timedelta=decode_timedelta,
    479         )
    480 

~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    596         drop_variables=drop_variables,
    597         use_cftime=use_cftime,
--> 598         decode_timedelta=decode_timedelta,
    599     )
    600     ds = Dataset(vars, attrs=attrs)

~/miniconda3/envs/aws/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    504             if ""coordinates"" in var_attrs:
    505                 coord_str = var_attrs[""coordinates""]
--> 506                 var_coord_names = coord_str.split()
    507                 if all(k in variables for k in var_coord_names):
    508                     new_vars[k].encoding[""coordinates""] = coord_str

AttributeError: 'numpy.ndarray' object has no attribute 'split'
```

**Anything else we need to know?**:
The test file was created from CDL:
```
netcdf x {
dimensions:
	x = 1 ;
	y = 1 ;
variables:
	int foo(y, x) ;
		string foo:coordinates = ""x y"" ;
data:

 foo =
  0 ;
}
```
The line ``========H5 coordinates ['x y']`` comes from me adding print statement on line 56 in function *_read_attributes*, file *api/h5netcdf.py*. Obviously the problem is caused by the attribute being a list instead of a string, as it is when *netcdf4* is used.

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

<!-- Paste the output here xr.show_versions() here -->
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, Mar 23 2020, 22:36:06) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.9.12-200.fc33.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.16.2
pandas: 1.2.0
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.0
distributed: 2021.01.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: None
setuptools: 51.1.2.post20210112
pip: 20.3.3
conda: None
pytest: 5.4.3
IPython: 7.19.0
sphinx: None

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4822/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
863477424,MDU6SXNzdWU4NjM0Nzc0MjQ=,5199,"Better error message for setting encoding[""units""] ",40218891,closed,0,,,0,2021-04-21T05:57:14Z,2021-05-13T18:27:13Z,2021-05-13T18:27:13Z,NONE,,,,"<!-- Please include a self-contained copy-pastable example that generates the issue if possible

-->

**What happened**:

Setting invalid units for time axis encoding results in an exception ``AttributeError: 'NoneType' object has no attribute 'groups'``

**What you expected to happen**:

It should say ""invalid time units"", like this (see commented out line below) ``ValueError: invalid time units: days after 1/3/2000``

**Minimal Complete Verifiable Example**:

```python
# Put your MCVE code here
import pandas as pd
import xarray as xr
ds = xr.Dataset(data_vars={'v': (('t',), [0,])},
                coords={'t': [pd.Timestamp(2000, 1, 1)]})
ds.t.encoding['units'] = 'days since Big Bang'
#ds.t.encoding['units'] = 'days after 1/3/2000'
ds.to_netcdf('/tmp/x.nc', mode='w')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-0d97a797be64> in <module>
      5 ds.t.encoding['units'] = 'days since Big Bang'
      6 #ds.t.encoding['units'] = 'days after 1/3/2000'
----> 7 ds.to_netcdf('/tmp/x.nc', mode='w')

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1752         from ..backends.api import to_netcdf
   1753 
-> 1754         return to_netcdf(
   1755             self,
   1756             path,

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1066         # TODO: allow this work (setting up the file for writing array data)
   1067         # to be parallelized with dask
-> 1068         dump_to_store(
   1069             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1070         )

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1113         variables, attrs = encoder(variables, attrs)
   1114 
-> 1115     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1116 
   1117 

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    261             writer = ArrayWriter()
    262 
--> 263         variables, attributes = self.encode(variables, attributes)
    264 
    265         self.set_attributes(attributes)

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/backends/common.py in encode(self, variables, attributes)
    350         # All NetCDF files get CF encoded by default, without this attempting
    351         # to write times, for example, would fail.
--> 352         variables, attributes = cf_encoder(variables, attributes)
    353         variables = {k: self.encode_variable(v) for k, v in variables.items()}
    354         attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in cf_encoder(variables, attributes)
    841     _update_bounds_encoding(variables)
    842 
--> 843     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    844 
    845     # Remove attrs from bounds variables (issue #2921)

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in <dictcomp>(.0)
    841     _update_bounds_encoding(variables)
    842 
--> 843     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    844 
    845     # Remove attrs from bounds variables (issue #2921)

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
    267         variables.UnsignedIntegerCoder(),
    268     ]:
--> 269         var = coder.encode(var, name=name)
    270 
    271     # TODO(shoyer): convert all of these to use coders, too:

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in encode(self, variable, name)
    510             variable
    511         ):
--> 512             (data, units, calendar) = encode_cf_datetime(
    513                 data, encoding.pop(""units"", None), encoding.pop(""calendar"", None)
    514             )

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in encode_cf_datetime(dates, units, calendar)
    448         units = infer_datetime_units(dates)
    449     else:
--> 450         units = _cleanup_netcdf_time_units(units)
    451 
    452     if calendar is None:

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _cleanup_netcdf_time_units(units)
    399 
    400 def _cleanup_netcdf_time_units(units):
--> 401     delta, ref_date = _unpack_netcdf_time_units(units)
    402     try:
    403         units = ""{} since {}"".format(delta, format_timestamp(ref_date))

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _unpack_netcdf_time_units(units)
    130 
    131     delta_units, ref_date = [s.strip() for s in matches.groups()]
--> 132     ref_date = _ensure_padded_year(ref_date)
    133 
    134     return delta_units, ref_date

~/miniconda3/envs/xarray/lib/python3.9/site-packages/xarray-0.0.0-py3.9.egg/xarray/coding/times.py in _ensure_padded_year(ref_date)
    105     # appropriately
    106     matches_start_digits = re.match(r""(\d+)(.*)"", ref_date)
--> 107     ref_year, everything_else = [s for s in matches_start_digits.groups()]
    108     ref_date_padded = ""{:04d}{}"".format(int(ref_year), everything_else)
    109 

AttributeError: 'NoneType' object has no attribute 'groups'
```
**Anything else we need to know?**:
Are there  detail specifications for the valid units string? Setting it to 1/3/2000 surprised me (my locale: LANG=en_CA.UTF-8).
**Environment**:
Latest code from github: xarray version 0.0.0
<details><summary>Output of <tt>xr.show_versions()</tt></summary>
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46) 
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.11-200.fc33.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.0.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.7.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
<!-- Paste the output here xr.show_versions() here -->


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5199/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
849751721,MDU6SXNzdWU4NDk3NTE3MjE=,5106,to_zarr() fails on time coordinate in append mode,40218891,closed,0,,,4,2021-04-03T22:26:11Z,2021-04-20T12:04:06Z,2021-04-20T04:41:07Z,NONE,,,,"<!-- Please include a self-contained copy-pastable example that generates the issue if possible.

Please be concise with code posted. See guidelines below on how to provide a good bug report:

- Craft Minimal Bug Reports: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
- Minimal Complete Verifiable Examples: https://stackoverflow.com/help/mcve

Bug reports that follow these guidelines are easier to diagnose, and so are often handled much more quickly.
-->

**What happened**:
When the append dimension coordinates are times and the dimension of the first dataset written is 1, consecutive appends forget the hour part of the coordinate.
**What you expected to happen**:
The time coordinate should be set correctly.

**Minimal Complete Verifiable Example**:

```
import pandas as pd
import xarray as xr
reftime = [pd.Timestamp(2021, 2, 21, 0)]
x = [0]
dims = ('reftime', 'x')
d = np.array([['A']])
ds1 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds1.to_zarr('foo', mode='w')
reftime = [pd.Timestamp(2021, 2, 21, 6)]
d = np.array([['C']])
ds2 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds2.to_zarr('foo', append_dim='reftime')
ds = xr.open_dataset('foo', engine='zarr')
ds.coords['reftime'].values

array(['2021-02-21T00:00:00.000000000', '2021-02-21T00:00:00.000000000'],  # should be 2021-02-21T06:00:00.000000000
      dtype='datetime64[ns]')
```

**Anything else we need to know?**:
When the `reftime` coordinate in the first dataset has dimension 2, the output is correct:
```
import pandas as pd
import xarray as xr
reftime = [pd.Timestamp(2021, 2, 21, 0), pd.Timestamp(2021, 2, 21, 3)]
x = [0]
dims = ('reftime', 'x')
d = np.array([['A'], ['B']])
ds1 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds1.to_zarr('foo', mode='w')
reftime = [pd.Timestamp(2021, 2, 21, 6)]
d = np.array([['C']])
ds2 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds2.to_zarr('foo', append_dim='reftime')
ds = xr.open_dataset('foo', engine='zarr')
ds.coords['reftime'].values

array(['2021-02-21T00:00:00.000000000', '2021-02-21T03:00:00.000000000',
       '2021-02-21T06:00:00.000000000'], dtype='datetime64[ns]')
```
Increment of a full day works fine:
```
import pandas as pd
import xarray as xr
reftime = [pd.Timestamp(2021, 2, 21, 0)]
x = [0]
dims = ('reftime', 'x')
d = np.array([['A']])
ds1 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds1.to_zarr('foo', mode='w')
reftime = [pd.Timestamp(2021, 2, 22, 0)]
d = np.array([['C']])
ds2 = xr.Dataset(data_vars={'v': (dims, d)}, coords={'reftime': reftime, 'x': x})
_ = ds2.to_zarr('foo', append_dim='reftime')
ds = xr.open_dataset('foo', engine='zarr')
ds.coords['reftime'].values

array(['2021-02-21T00:00:00.000000000', '2021-02-22T00:00:00.000000000'],
      dtype='datetime64[ns]')
```
**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

<!-- Paste the output here xr.show_versions() here -->

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) 
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.11-200.fc33.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.10.0
h5py: 3.1.0
Nio: None
zarr: 2.7.0
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5106/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
789653499,MDU6SXNzdWU3ODk2NTM0OTk=,4830,GH2550 revisited,40218891,open,0,,,2,2021-01-20T05:40:16Z,2021-01-25T23:06:01Z,,NONE,,,,"<!-- Please do a quick search of existing issues to make sure that this has not been asked before. -->

**Is your feature request related to a problem? Please describe.**
I am retrieving files from AWS: https://registry.opendata.aws/wrf-se-alaska-snap/. An example:
```
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-0[12].nc'
remote_files = s3.glob(s3path)
fileset = [s3.open(file) for file in remote_files]

ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False)
ds
```
Data files for 1980 are missing time coordinate, so the above code fails. The time could be obtained by parsing file name, however in the current implementation the *source* attribute is available only when the fileset consists of strings or *Path*s. 

**Describe the solution you'd like**
I would suggest to return to the original suggestion in #2550 - pass *filename_or_object* as an argument to *preprocess* function, but with necessary inspection. Here is my attempt (code in *open_mfdataset*):
```
open_kwargs = dict(
        engine=engine, chunks=chunks or {}, lock=lock, autoclose=autoclose, **kwargs
    )

    if preprocess is not None:
        # Get number of free arguments
        from inspect import signature
        parms = signature(preprocess).parameters
        num_preprocess_args = len([p for p in parms.values() if p.default == p.empty])
        if num_preprocess_args not in (1, 2):
            raise ValueError('preprocess accepts only 1 or 2 arguments')

    if parallel:
        import dask

        # wrap the open_dataset, getattr, and preprocess with delayed
        open_ = dask.delayed(open_dataset)
        getattr_ = dask.delayed(getattr)
        if preprocess is not None:
            preprocess = dask.delayed(preprocess)
    else:
        open_ = open_dataset
        getattr_ = getattr

    datasets = [open_(p, **open_kwargs) for p in paths]
    file_objs = [getattr_(ds, ""_file_obj"") for ds in datasets]
    if preprocess is not None:
        if num_preprocess_args == 1:
            datasets = [preprocess(ds) for ds in datasets]
        else:
            datasets = [preprocess(ds, p) for (ds, p) in zip(datasets, paths)]
```
With this, I can define function *fix* as follows:
```
def fix(ds, source):
    vtime = datetime.strptime(os.path.basename(source.path), 'WRFDS_%Y-%m-%d.nc')
    return ds.assign_coords(Time=[vtime])

ds = xr.open_mfdataset(fileset, preprocess=fix, concat_dim='Time', decode_cf=False)
```
This is backward compatible, *preprocess* can accept any number of arguments:
```
from functools import partial
import xarray as xr

def fix1(ds):
    print('fix1')
    return ds

def fix2(ds, file):
    print('fix2:', file.as_uri())
    return ds

def fix3(ds, file, arg):
    print('fix3:', file.as_uri(), arg)
    return ds

fileset = [Path('/home/george/Downloads/WRFDS_1988-04-23.nc'),
           Path('/home/george/Downloads/WRFDS_1988-04-24.nc')
          ]
ds = xr.open_mfdataset(fileset, preprocess=fix1, concat_dim='Time', parallel=True)
ds = xr.open_mfdataset(fileset, preprocess=fix2, concat_dim='Time')
ds = xr.open_mfdataset(fileset, preprocess=partial(fix3, arg='additional argument'),
                       concat_dim='Time')
```
```
fix1
fix1
fix2: file:///home/george/Downloads/WRFDS_1988-04-23.nc
fix2: file:///home/george/Downloads/WRFDS_1988-04-24.nc
fix3: file:///home/george/Downloads/WRFDS_1988-04-23.nc additional argument
fix3: file:///home/george/Downloads/WRFDS_1988-04-24.nc additional argument
```
**Describe alternatives you've considered**
The simple solution would be to make xarray s3fs aware. IMHO this is not particularly elegant. Either a check for an attribute, or an import within a *try/except* block would be needed.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4830/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
429914958,MDU6SXNzdWU0Mjk5MTQ5NTg=,2871,xr.open_dataset(f1).to_netcdf(file2) is not idempotent,40218891,closed,0,,,5,2019-04-05T20:06:35Z,2019-06-12T15:32:27Z,2019-06-12T15:32:27Z,NONE,,,,"Here is the original (much truncated) file.
```
> ncdump ak.nc
netcdf ak {
dimensions:
        npts = UNLIMITED ; // (2 currently)
        ntimes = 4 ;
variables:
        short tmpk(npts, ntimes) ;
                tmpk:description = ""2 m Temperature - closest to top of hour"" ;
                tmpk:units = ""K"" ;
                tmpk:level = ""2 m"" ;
                tmpk:period_variable = ""ntimes1"" ;
                tmpk:missing_value = -9999s ;
                tmpk:scale_factor = 0.01 ;

// global attributes:
                :source = ""ak-obs"" ;
data:

 tmpk =
  26915, 27755, -9999, 27705,
  25595, -9999, 28315, -9999 ;
}
```
Python code:
```
ds = xr.open_dataset('ak.nc')
ds.to_netcdf('akbad.nc')
ds
ds['tmpk']

<xarray.Dataset>
Dimensions:  (npts: 2, ntimes: 4)
Dimensions without coordinates: npts, ntimes
Data variables:
    tmpk     (npts, ntimes) float32 ...
Attributes:
    source:   ak-obs

<xarray.DataArray 'tmpk' (npts: 2, ntimes: 4)>
array([[269.15, 277.55,    nan, 277.05],
       [255.95,    nan, 283.15,    nan]], dtype=float32)
Dimensions without coordinates: npts, ntimes
Attributes:
    description:      2 m Temperature - closest to top of hour
    units:            K
    level:            2 m
    period_variable:  ntimes1
```
File written to disk:
```
> ncdump akbad.nc
netcdf akbad {
dimensions:
        npts = UNLIMITED ; // (2 currently)
        ntimes = 4 ;
variables:
        short tmpk(npts, ntimes) ;
                tmpk:description = ""2 m Temperature - closest to top of hour"" ;
                tmpk:units = ""K"" ;
                tmpk:level = ""2 m"" ;
                tmpk:period_variable = ""ntimes1"" ;
                tmpk:scale_factor = 0.01 ;

// global attributes:
                :source = ""ak-obs"" ;
data:

 tmpk =
  26915, 27755, 0, 27705,
  25595, 0, 28315, 0 ;
}
```
To confuse matter more, I am getting a warning:
```
SerializationWarning: saving variable tmpk with floating point data as an integer dtype without any _FillValue to use for NaNs
```
This might make sense, since `tmpk` after decoding has datatype `float32`, however somehow the original variable type and `scale_factor` are preserved, but `missing_value` attribute disappears and value written back is wrong: 0. 
I want to write back `tmpk` as float number and let zlib worry about disk space.

xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1 | packaged by conda-forge | (default, Feb 18 2019, 01:42:00) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.5.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.0
pandas: 0.24.2
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.5.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.4
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: 4.6.10
pytest: 4.3.1
IPython: 7.4.0
sphinx: 1.8.5

​","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2871/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue