id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
289342234,MDU6SXNzdWUyODkzNDIyMzQ=,1836,HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler,102827,open,0,,,5,2018-01-17T17:05:56Z,2022-06-21T14:50:02Z,,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import numpy as np
import dask.multiprocessing

# Generate dummy data and build xarray dataset
mat = np.random.rand(10, 90, 90)
ds = xr.Dataset(data_vars={'foo': (('time', 'x', 'y'), mat)})

# Write dataset to netcdf without compression
ds.to_netcdf('dummy_data_3d.nc')
# Write with zlib compression
ds.to_netcdf('dummy_data_3d_with_compression.nc', 
             encoding={'foo': {'zlib': True}})
# Write data as int16 with scale factor applied
ds.to_netcdf('dummy_data_3d_with_scale_factor.nc', 
             encoding={'foo': {'dtype': 'int16',
                               'scale_factor': 0.01,
                               '_FillValue': -9999}})

# Load data from netCDF files
ds_vanilla = xr.open_dataset('dummy_data_3d.nc', chunks={'time': 1})
ds_scaled = xr.open_dataset('dummy_data_3d_with_scale_factor.nc', chunks={'time': 1})
ds_compressed = xr.open_dataset('dummy_data_3d_with_compression.nc', chunks={'time': 1})

# Do computation using dask's multiprocessing scheduler
foo = ds_vanilla.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
foo = ds_scaled.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
foo = ds_compressed.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
# The last line fails

```

#### Problem description

If NetCDF files are compressed (which is often the case) and opened with chunking enabled to use them with dask, computations using the multiprocessing scheduler fail. The above code shows this in a short example. The last line fails with a long HDF5 error log:
<details>

```
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 140736213758912:
  #000: H5Dio.c line 171 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 544 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 2022 in H5D__chunk_read(): error looking up chunk address
    major: Dataset
    minor: Can't get value
  #003: H5Dchunk.c line 2768 in H5D__chunk_lookup(): can't query chunk address
    major: Dataset
    minor: Can't get value
  #004: H5Dbtree.c line 1047 in H5D__btree_idx_get_addr(): can't get chunk info
    major: Dataset
    minor: Can't get value
  #005: H5B.c line 341 in H5B_find(): unable to load B-tree node
    major: B-Tree node
    minor: Unable to protect metadata
  #006: H5AC.c line 1763 in H5AC_protect(): H5C_protect() failed
    major: Object cache
    minor: Unable to protect metadata
  #007: H5C.c line 2561 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: H5C.c line 6877 in H5C_load_entry(): Can't deserialize image
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5Bcache.c line 181 in H5B__cache_deserialize(): wrong B-tree signature
    major: B-Tree node
    minor: Bad value
Traceback (most recent call last):
  File ""hdf5_bug_minimal_working_example.py"", line 27, in <module>
    foo = ds_compressed.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataarray.py"", line 658, in compute
    return new.load(**kwargs)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataarray.py"", line 632, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataset.py"", line 491, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/base.py"", line 333, in compute
    results = get(dsk, keys, **kwargs)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/multiprocessing.py"", line 177, in get
    raise_exception=reraise, **kwargs)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 521, in get_async
    raise_exception(exc, tb)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 290, in execute_task
    result = _execute_task(task, data)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 270, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 270, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 267, in _execute_task
    return [_execute_task(a, cache) for a in arg]
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py"", line 271, in _execute_task
    return func(*args2)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/array/core.py"", line 72, in getter
    c = np.asarray(c)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py"", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/indexing.py"", line 538, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py"", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/indexing.py"", line 505, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/backends/netCDF4_.py"", line 61, in __getitem__
    data = getitem(self.get_array(), key)
  File ""netCDF4/_netCDF4.pyx"", line 3961, in netCDF4._netCDF4.Variable.__getitem__
  File ""netCDF4/_netCDF4.pyx"", line 4798, in netCDF4._netCDF4.Variable._get
  File ""netCDF4/_netCDF4.pyx"", line 1638, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error
```

</details>

A possible workaround, if the dataset fits into memory, is to use
```python
ds = ds.persist()
```
I could split up my dataset to accomplish this, but the beauty of xarray and dask gets lost a little when doing this...

#### Output of ``xr.show_versions()``

<details>

```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.16.0
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.1
setuptools: 36.7.2
pip: 9.0.1
conda: 4.3.29
pytest: 3.2.5
IPython: 5.5.0
sphinx: None
```

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1836/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
374279704,MDU6SXNzdWUzNzQyNzk3MDQ=,2514,interpolate_na with limit argument changes size of chunks,102827,closed,0,,,8,2018-10-26T08:31:35Z,2021-03-26T19:50:50Z,2021-03-26T19:50:50Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import xarray as xr
import numpy as np

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
foo = np.sin(np.arange(len(t)))
bar = np.cos(np.arange(len(t)))

foo[1] = np.nan
bar[2] = np.nan

ds_test = xr.Dataset(data_vars={'foo': ('time', foo),
                           'bar': ('time', bar)},
                    coords={'time': t}).chunk()

print(ds_test)
print(""\n\n### After `.interpolate_na(dim='time')`\n"")
print(ds_test.interpolate_na(dim='time'))
print(""\n\n### After `.interpolate_na(dim='time', limit=5)`\n"")
print(ds_test.interpolate_na(dim='time', limit=5))
print(""\n\n### After `.interpolate_na(dim='time', limit=20)`\n"")
print(ds_test.interpolate_na(dim='time', limit=20))
```

Output of the above code. Note the different chunk sizes, depending on the value of `limit`:
```
<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After `.interpolate_na(dim='time')`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After `.interpolate_na(dim='time', limit=5)`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(3,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(3,)>


### After `.interpolate_na(dim='time', limit=20)`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
```

#### Problem description

Using `xarray.DataArray.interpolate_na()` with the `limit` kwarg changes the chunksize of the resulting `dask.array` objects.

#### Expected Output

The chunksize should not change. Very small chunks, which result from typical small values of `limit`, are not optimal for the performance of `dask`. Also, things like `.rolling()` will fail if the chunksize is smaller than the window length of the rolling window.
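
The size constraint mentioned above can be checked before calling `.rolling()`. The following is a minimal sketch in plain Python. It assumes the chunk lengths along one dimension are available as a tuple of integers (one entry of `.chunks`) and it mirrors the condition as worded here; the exact rule that `dask` applies internally may differ. `window_fits_chunksize` is a hypothetical helper, not part of xarray or dask, and the layout of small chunks is made up for illustration.

```python
def window_fits_chunksize(chunks, window):
    # Hypothetical helper: compare the largest chunk along one dimension
    # (what dask reports as `chunksize`) with the rolling window length.
    return max(chunks) >= window


# One chunk of 745 elements, as in the dataset before interpolation
print(window_fits_chunksize((745,), window=60))  # True

# An assumed layout of many 3-element chunks, similar to the limit=5 case
small_chunks = (3,) * 248 + (1,)
print(window_fits_chunksize(small_chunks, window=60))  # False
```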

#### Output of ``xr.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2514/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
376162232,MDExOlB1bGxSZXF1ZXN0MjI3NDQzNTI3,2532,[WIP] Fix problem with wrong chunksizes when using rolling_window on dask.array,102827,closed,0,,,2,2018-10-31T21:12:03Z,2021-03-26T19:50:50Z,2021-03-26T19:50:50Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2532," - [ ] Closes #2514
 - [ ] Closes #2531
 - [ ] Tests added (for all bug fixes or enhancements)
 - [ ] Fully documented, including `whats-new.rst` for all changes

## Short summary
The two rolling-window functions for `dask.array`
 * [dask_rolling_wrapper](https://github.com/pydata/xarray/blob/b622c5e7da928524ef949d9e389f6c7f38644494/xarray/core/dask_array_ops.py#L23)
 * [rolling_window](https://github.com/pydata/xarray/blob/b622c5e7da928524ef949d9e389f6c7f38644494/xarray/core/dask_array_ops.py#L43)

will be fixed to preserve `dask.array` chunksizes.

## Long summary

The specific initial problem with chunksizes and `interpolate_na()` in #2514 is caused by the padding done in

https://github.com/pydata/xarray/blob/5940100761478604080523ebb1291ecff90e779e/xarray/core/dask_array_ops.py#L74-L85

which adds a small array with a small chunk to the initial array.
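
As a rough illustration of why padding can disturb the chunk layout, the sketch below models chunks along one dimension as a plain tuple of lengths. It assumes that joining a separately chunked pad block to the data keeps the pad as its own chunk; `chunks_after_padding` and the pad size of 3 are hypothetical and do not reproduce the linked code.

```python
def chunks_after_padding(chunks, pad_size):
    # Hypothetical model: the pad block arrives as one extra, small chunk
    # in front of the existing chunks.
    return (pad_size,) + tuple(chunks)


original = (745,)
padded = chunks_after_padding(original, pad_size=3)
print(padded)  # (3, 745)
print(padded == original)  # False
```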

There is another related problem where `DataArray.rolling()` changes the size and distribution of `dask.array` chunks, which stems from this code

https://github.com/pydata/xarray/blob/b622c5e7da928524ef949d9e389f6c7f38644494/xarray/core/dask_array_ops.py#L23

For some (historic) reason there are these two rolling-window functions for `dask`. Both need to be fixed to preserve chunksize of a `dask.array` in all cases.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2532/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
376154741,MDU6SXNzdWUzNzYxNTQ3NDE=,2531,DataArray.rolling() does not preserve chunksizes in some cases,102827,closed,0,,,2,2018-10-31T20:50:33Z,2021-03-26T19:50:49Z,2021-03-26T19:50:49Z,CONTRIBUTOR,,,,"This issue was found and discussed in the related issue #2514 

I am opening a separate issue for clarity.

#### Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t,
                               'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)

print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
```
```
Output for `mean`: ((2,), (745,))
Output for `count`: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
Desired Output: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
```
#### Problem description

DataArray.rolling() does not preserve the chunksizes, apparently depending on the applied method.
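
For reference, the desired chunk layout above follows from splitting 745 time steps into blocks of 100. A minimal sketch in plain Python, where `expected_chunks` is a hypothetical helper and not an xarray or dask function:

```python
def expected_chunks(length, chunk_size):
    # Full blocks of `chunk_size`, plus one shorter block for the remainder
    full, rest = divmod(length, chunk_size)
    return (chunk_size,) * full + ((rest,) if rest else ())


desired = ((2,), expected_chunks(745, 100))
print(desired)
# ((2,), (100, 100, 100, 100, 100, 100, 100, 45))

# Chunks reported above for `mean`
mean_chunks = ((2,), (745,))
print(mean_chunks == desired)  # False
```

A comparison like the last line could serve as a regression check once the chunk layout is preserved.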

#### Output of ``xr.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2531/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
229807027,MDExOlB1bGxSZXF1ZXN0MTIxMzc5NjAw,1414,Speed up `decode_cf_datetime`,102827,closed,0,,,12,2017-05-18T21:15:40Z,2017-07-26T07:40:24Z,2017-07-25T17:42:52Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1414," - [x] Closes #1399
 - [x] Tests added / passed
 - [x] Passes ``git diff upstream/master | flake8 --diff``
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

Instead of casting the input numeric dates to float, they are now
cast to nanoseconds as int64, which makes `pd.to_timedelta()`
work much faster (x100 speedup on my machine).
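
The idea can be sketched without pandas. This is a minimal illustration in plain Python, assuming only units of fixed length; `NS_PER_UNIT` and `to_int_nanoseconds` are hypothetical names and not the code in this PR.

```python
# Nanoseconds per unit, for units with a fixed duration only
NS_PER_UNIT = {
    'seconds': 10**9,
    'minutes': 60 * 10**9,
    'hours': 3600 * 10**9,
    'days': 86400 * 10**9,
}


def to_int_nanoseconds(values, unit):
    # Scale each numeric date to nanoseconds and round to an integer
    factor = NS_PER_UNIT[unit]
    return [int(round(v * factor)) for v in values]


print(to_int_nanoseconds([0.5, 1.0, 24.0], 'hours'))
# [1800000000000, 3600000000000, 86400000000000]
```

Integers like these could then be passed to `pd.to_timedelta` with `unit='ns'`, which is the fast path described above.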

On my machine all existing tests for `conventions.py` pass. Overflows should be handled by [these two already existing lines](https://github.com/cchwala/xarray/commit/d7d7c01f3e2f14c38c44e62f648b30474469b078#diff-d94eba38daa73be812c57c756f01f0daR158) since everything in the valid range of `pd.to_datetime` should be safe.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1414/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
226549366,MDU6SXNzdWUyMjY1NDkzNjY=,1399,`decode_cf_datetime()` slow because `pd.to_timedelta()` is slow if floats are passed,102827,closed,0,,,6,2017-05-05T11:48:00Z,2017-07-25T17:42:52Z,2017-07-25T17:42:52Z,CONTRIBUTOR,,,,"Hi,
 `decode_cf_datetime` is slowed down because it [always passes floats](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L129) to [`pd.to_timedelta`](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L154), while `pd.to_timedelta` is much faster when working on integers.

[Here](https://gist.github.com/cchwala/157b87d4e413b560f8ad8555a330b937#file-timing_for_timedelta64_and_pandas_timedelta-ipynb) is a notebook that shows the differences. Working with integers is approx. one order of magnitude faster.

Hence, it would be great to automatically convert the raw float time values to integer nanoseconds where possible (likely limited to resolutions below days or hours, to avoid dealing with units such as months, which span a varying number of nanoseconds).

As an alternative, maybe avoid forcing the cast to floats and indicate in the docstring that the raw values should be integers to speed up the conversion.

This could possibly also be resolved in `pd.to_timedelta` but I assume it will be more complicated to deal with all the edge cases there.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1399/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue