id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1083621690,I_kwDOAMm_X85AlsE6,6084,Initialise zarr metadata without computing dask graph,42455466,open,0,,,6,2021-12-17T21:17:42Z,2024-04-03T19:08:26Z,,NONE,,,,"**Is your feature request related to a problem? Please describe.**

When writing large Zarr stores, the [xarray docs](https://xarray.pydata.org/en/stable/user-guide/io.html#appending-to-existing-zarr-stores) recommend first creating an initial Zarr store without writing any of its array data. The recommended approach is to first create a dummy dask-backed `Dataset`, and then call `to_zarr` with `compute=False` to write only metadata to Zarr. This works great.

It seems that in one common use case for this approach (including the example in the docs above), the entire dataset to be written to Zarr is already represented in a `Dataset` (let's call this `ds`). Thus, rather than creating a dummy `Dataset` with exactly the same metadata as `ds`, it is more convenient to initialise the Zarr store with `ds.to_zarr(..., compute=False)`. See for example:

https://discourse.pangeo.io/t/many-netcdf-to-single-zarr-store-using-concurrent-futures/2029
https://discourse.pangeo.io/t/map-blocks-and-to-zarr-region/2019
https://discourse.pangeo.io/t/netcdf-to-zarr-best-practices/1119/12
https://discourse.pangeo.io/t/best-practice-for-memory-management-to-iteratively-write-a-large-dataset-with-xarray/1989

However, calling `to_zarr` with `compute=False` still constructs the dask graph for writing the Zarr store. The graph is never used in this use case, but constructing it can take a very long time for large graphs.

**Describe the solution you'd like**

Is there scope to add an option to `to_zarr` to initialise the store _without_ constructing the dask graph? Or perhaps an `initialise_zarr` method would be cleaner?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6084/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1063046540,I_kwDOAMm_X84_XM2M,6026,Delaying open produces different type of `cftime` object,42455466,closed,0,,,3,2021-11-25T00:47:22Z,2022-01-13T13:49:27Z,2022-01-13T13:49:27Z,NONE,,,,"**What happened**:

The task is opening a dataset (e.g. a netCDF or Zarr file) with a time coordinate, using `use_cftime=True`. Delaying the task with dask results in the time coordinate being represented as base-class `cftime.datetime` objects, whereas when the task is not delayed, calendar-specific subclasses such as `cftime.DatetimeJulian` are used.

**What you expected to happen**:

Consistent `cftime` objects to be used, regardless of whether the opening task is delayed or not.

**Minimal Complete Verifiable Example**:

```python
import dask
import numpy as np
import xarray as xr
from dask.distributed import LocalCluster, Client

cluster = LocalCluster()
client = Client(cluster)

# Write some data
var = np.random.random(4)
time = xr.cftime_range('2000-01-01', periods=4, calendar='julian')
ds = xr.Dataset(data_vars={'var': ('time', var)}, coords={'time': time})
ds.to_netcdf('test.nc', mode='w')

# Open written data
ds1 = xr.open_dataset('test.nc', use_cftime=True)
print(f'ds1: {ds1.time} \n')

# Delayed open written data
ds2 = dask.delayed(xr.open_dataset)('test.nc', use_cftime=True)
ds2 = dask.compute(ds2)[0]
print(f'ds2: {ds2.time} \n')

# Operations like xr.open_mfdataset which use dask.delayed internally
# when parallel=True (I think) produce the same result as ds2
ds3 = xr.open_mfdataset('test.nc', use_cftime=True, parallel=True)
print(f'ds3: {ds3.time}')
```

returns

```
ds1: array([cftime.DatetimeJulian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 2, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 3, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeJulian(2000, 1, 4, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00

ds2: array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00

ds3: array([cftime.datetime(2000, 1, 1, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 2, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 3, 0, 0, 0, 0, calendar='julian', has_year_zero=False),
       cftime.datetime(2000, 1, 4, 0, 0, 0, 0, calendar='julian', has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00
```

**Anything else we need to know?**:

I noticed this because the DatetimeAccessor `ceil`, `floor` and `round` methods return errors for base-class `cftime.datetime` objects (but not for the calendar-specific subclasses such as `cftime.DatetimeJulian`) for all calendar types other than 'gregorian'.
For example,

```python
ds3.time.dt.floor('D')
```

returns the following traceback:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
----> 1 ds3.time.dt.floor('D')

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in floor(self, freq)
    220         """"""
    221 
--> 222         return self._tslib_round_accessor(""floor"", freq)
    223 
    224     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _tslib_round_accessor(self, name, freq)
    202     def _tslib_round_accessor(self, name, freq):
    203         obj_type = type(self._obj)
--> 204         result = _round_field(self._obj.data, name, freq)
    205         return obj_type(result, name=name, coords=self._obj.coords, dims=self._obj.dims)
    206 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_field(values, name, freq)
    142         )
    143     else:
--> 144         return _round_through_series_or_index(values, name, freq)
    145 
    146 

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/accessor_dt.py in _round_through_series_or_index(values, name, freq)
    110     method = getattr(values_as_cftimeindex, name)
    111 
--> 112     field_values = method(freq=freq).values
    113 
    114     return field_values.reshape(values.shape)

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in floor(self, freq)
    733         CFTimeIndex
    734         """"""
--> 735         return self._round_via_method(freq, _floor_int)
    736 
    737     def ceil(self, freq):

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _round_via_method(self, freq, method)
    714 
    715         unit = _total_microseconds(offset.as_timedelta())
--> 716         values = self.asi8
    717         rounded = method(values, unit)
    718         return _cftimeindex_from_i8(rounded, self.date_type, self.name)
/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in asi8(self)
    684         epoch = self.date_type(1970, 1, 1)
    685         return np.array(
--> 686             [
    687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in (.0)
    685         return np.array(
    686             [
--> 687                 _total_microseconds(exact_cftime_datetime_difference(epoch, date))
    688                 for date in self.values
    689             ],

/g/data/xv83/ds0092/software/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/resample_cftime.py in exact_cftime_datetime_difference(a, b)
    356     datetime.timedelta
    357     """"""
--> 358     seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    359     seconds = int(round(seconds.total_seconds()))
    360     microseconds = b.microsecond - a.microsecond

src/cftime/_cftime.pyx in cftime._cftime.datetime.__sub__()

TypeError: cannot compute the time difference between dates with different calendars
```

My apologies for conflating two issues here. I'm happy to open a separate issue for this if that's preferred.

**Environment**:
Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.19.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.9.5
cftime: 1.5.0
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.4
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: 2021.05.0
cupy: None
pint: 0.18
sparse: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: 4.10.1
pytest: None
IPython: 7.24.0
sphinx: None
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6026/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
789755611,MDU6SXNzdWU3ODk3NTU2MTE=,4833,Strange behaviour when overwriting files with to_netcdf and html repr,42455466,closed,0,,,2,2021-01-20T08:28:35Z,2021-01-20T20:00:23Z,2021-01-20T20:00:23Z,NONE,,,,"**What happened**:

I'm experiencing some strange behaviour when overwriting netCDF files using `to_netcdf` in a Jupyter notebook. The issue is a bit quirky and convoluted and only seems to come about when using xarray's html repr in Jupyter. I've tried to find a reproducible example that demonstrates the issue (it's still quite convoluted, sorry):

I can generate some data, save it to a netCDF file, reopen it and everything works as expected:

```python
import numpy as np
import xarray as xr

ones = xr.DataArray(np.ones(5), coords=[range(5)], dims=['x']).to_dataset(name='a')
ones.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc')['a'])
```
```
array([1., 1., 1., 1., 1.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

I can overwrite `a.nc` with a modified dataset and everything still works as expected:

```python
twos = 2 * ones
twos.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc', cache=False)['a'])
```
```
array([2., 2., 2., 2., 2.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

I can run the above cell as many times as I like and always get the expected behaviour.
However, if instead of `print`ing the `open_dataset` line, I allow it to be rendered by the xarray html repr, I find that the cell will run once and then will fail with a `Permission denied` error the second time it is run:

```python
twos.to_netcdf('./a.nc')
xr.open_dataset('./a.nc', cache=False)['a']
```
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
.../lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198         try:
--> 199             file = self._cache[self._key]
    200         except KeyError:

.../lib/python3.8/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [, ('.../a.nc',), 'a', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:
.
.
.
PermissionError: [Errno 13] Permission denied: b'.../a.nc'
```

If I manually remove the file in question, I can resave it, but from then on xarray seems to have its wires crossed somehow and will present `twos` from `a.nc` regardless of what it actually contains:

```python
!rm ./a.nc
ones.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc')['a'])
```
```
array([2., 2., 2., 2., 2.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

Note that in the last example, the data saved on disk is correct (i.e. contains ones) but xarray is still somehow linked to the `twos` data.

**Anything else we need to know?**:

I've come across this unexpected behaviour a few times. In the above example, I've had to add `cache=False` to consistently produce the behaviour, but in the past I've managed to produce these symptoms _without_ `cache=False` (I'm just not exactly sure how).
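
For what it's worth, the `lru_cache.py` frame in the traceback above suggests a cached file handle is being reused. A toy, pure-Python sketch of how a handle cache keyed on the file path can serve stale data after the file on disk has been replaced (illustrative only, not xarray's actual implementation):

```python
from collections import OrderedDict

class TinyFileCache:
    # Illustrative stand-in for an LRU cache of open file handles keyed by
    # (path, mode). Once a handle is cached, later acquisitions with the
    # same key reuse it, even if the file on disk has since been replaced.
    def __init__(self, maxsize=16):
        self._cache = OrderedDict()
        self._maxsize = maxsize

    def acquire(self, key, opener):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        handle = self._cache[key] = opener()
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict least recently used
        return handle

cache = TinyFileCache()
# The first acquisition opens the file; the second returns the cached
# handle without touching the disk at all.
h1 = cache.acquire(('./a.nc', 'r'), lambda: {'contents': 'twos'})
h2 = cache.acquire(('./a.nc', 'r'), lambda: {'contents': 'ones'})
assert h2['contents'] == 'twos'  # stale: the second opener was never called
```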
Anecdotally, the behaviour always seems to occur after having rendered the xarray object in Jupyter using the html repr.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4833/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue