id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
762323609,MDU6SXNzdWU3NjIzMjM2MDk=,4681,Uncompressed Zarr arrays can no longer be written to Zarr,206773,open,0,,,2,2020-12-11T13:02:28Z,2023-10-24T23:08:35Z,,NONE,,,,"**What happened**:

We create `xarray.Dataset` instances using `xr.open_zarr(store)` with custom chunk `store` instances. These will lazily fetch data chunks for data variables from the [Sentinel Hub](https://www.sentinel-hub.com/) API. For coordinate variables `lon`, `lat`, `time` we use ""static"" store entries: uncompressed, bytified numpy arrays.

Since xarray 0.16.2 and Zarr 2.6.1, this approach doesn't work anymore. When we _write_ datasets opened from such a store using `dataset.to_zarr(dst_store)`, e.g. with `dst_store=s3fs.S3Map()`, we get encoding errors. For example, for a coordinate array `lon` we get the following from botocore:

    Invalid type for parameter Body, value: [55.0475 55.0465 55.0455 ... 53.0025 53.0015 53.0005], type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
 
(Full traceback is below.) It seems that our static numpy arrays are not encoded at all because they are uncompressed. If we use a compressor, it works again. (That's our current workaround.)

**What you expected to happen**:

Before data is written into a Zarr chunk store, it must be encoded from numpy arrays to bytes. 
This does not seem to happen when uncompressed data is written, that is, when the Zarr encoding's `compressor` and `filters` are both `None`.
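
As a minimal sketch of this expectation (plain NumPy, not the actual xarray or Zarr code path), an uncompressed chunk would still be converted to bytes before it reaches the store. The `chunk_store` dict and the key `lon/0` are hypothetical stand-ins for a real store:

```python
import numpy as np

# A small stand-in for the static coordinate array (values are illustrative).
lon = np.linspace(55.0475, 53.0005, 8)

# Even without compressor and filters, a chunk should reach the store as bytes.
chunk = lon.tobytes()

# Hypothetical chunk store: a plain dict holding bytes values.
chunk_store = {}
chunk_store['lon/0'] = chunk

# Reading the chunk back restores the original values.
restored = np.frombuffer(chunk_store['lon/0'], dtype=lon.dtype)
```

A store such as `s3fs.S3Map` can only accept the `bytes` form, which is why passing the raw `numpy.ndarray` ends in the botocore validation error.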

**Minimal Complete Verifiable Example**:

A minimal, self-contained example is the entire test module [test_reprod_27.py](https://github.com/dcs4cop/xcube-sh/blob/master/test/test_reprod_27.py) of the xcube Sentinel Hub plugin `xcube-sh`.

Original issue in the Sentinel Hub xcube plugin is [xcube-sh #27](https://github.com/dcs4cop/xcube-sh/issues/27).

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 18:58:29) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.cp1252
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.3
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.1
conda: None
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.1

</details>

**Traceback**:

traceback:

```
File ""D:\Projects\xcube\xcube\cli\_gen2\write.py"", line 47, in write_cube
data_id = writer.write_data(cube,
File ""D:\Projects\xcube\xcube\core\store\stores\s3.py"", line 213, in write_data
self._new_s3_writer(writer_id).write_data(data, data_id=path, replace=replace, **write_params)
File ""D:\Projects\xcube\xcube\core\store\accessors\dataset.py"", line 313, in write_data
data.to_zarr(s3fs.S3Map(root=f'{bucket_name}/{data_id}' if bucket_name else data_id,
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\dataset.py"", line 1745, in to_zarr
return to_zarr(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\api.py"", line 1481, in to_zarr
dump_to_store(dataset, zstore, writer, encoding=encoding)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\api.py"", line 1158, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\zarr.py"", line 473, in store
self.set_variables(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\zarr.py"", line 549, in set_variables
writer.add(v.data, zarr_array, region)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\common.py"", line 143, in add
target[region] = source
File ""D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py"", line 1122, in __setitem__
self.set_basic_selection(selection, value, fields=fields)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py"", line 1217, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py"", line 1508, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py"", line 1580, in _set_selection
self._chunk_setitems(lchunk_coords, lchunk_selection, chunk_values,
File ""D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py"", line 1709, in _chunk_setitems
self.chunk_store.setitems({k: v for k, v in zip(ckeys, cdatas)})
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\mapping.py"", line 110, in setitems
self.fs.pipe(values)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py"", line 121, in wrapper
return maybe_sync(func, self, *args, **kwargs)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py"", line 100, in maybe_sync
return sync(loop, func, *args, **kwargs)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py"", line 71, in sync
raise exc.with_traceback(tb)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py"", line 55, in f
result[0] = await future
File ""D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py"", line 211, in _pipe
await asyncio.gather(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py"", line 608, in _pipe_file
return await self._call_s3(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py"", line 225, in _call_s3
raise translate_boto_error(err) from err
File ""D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py"", line 207, in _call_s3
return await method(**additional_kwargs)
File ""D:\Miniconda3\envs\xcube\lib\site-packages\aiobotocore\client.py"", line 123, in _make_api_call
request_dict = await self._convert_to_request_dict(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\aiobotocore\client.py"", line 171, in _convert_to_request_dict
request_dict = self._serializer.serialize_to_request(
File ""D:\Miniconda3\envs\xcube\lib\site-packages\botocore\validate.py"", line 297, in serialize_to_request
raise ParamValidationError(report=report.generate_report())

Invalid type for parameter Body, value: [55.0475 55.0465 55.0455 ... 53.0025 53.0015 53.0005], type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4681/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1226272301,I_kwDOAMm_X85JF24t,6573,32- vs 64-bit coordinates in where(),206773,open,0,,,6,2022-05-05T06:57:36Z,2022-09-28T08:17:09Z,,NONE,,,,"### What happened?

I'm unsure whether this is a bug or not. At the very least, I encountered very unexpected behaviour.

Given two data arrays `a` and `b` with the same dimensions and equal coordinates, the result `c = a.where(b)` should have the same dimensions and coordinates.

However, if the coordinates of `a` have dtype float32 and those of `b` have dtype float64, then each dimension of `c` will always have size two. Of course, this way the coordinates of `a` and `b` are no longer exactly equal, but from a user perspective they represent the same labels.

The behaviour is likely caused by the fact that the indexes generated for the coordinates are no longer strictly equal, therefore `where()` picks only the two outer cells of each dimension. Allowing indexes to be passed explicitly may help here, see #6392.

### What did you expect to happen?

In the case described above, the dimensions and coordinates of `c` should be equal to those of `a` (and `b`).

### Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

c32 = xr.DataArray(np.linspace(0, 1, 10, dtype=np.float32), dims='x')
c64 = xr.DataArray(np.linspace(0, 1, 10, dtype=np.float64), dims='x')

c3 = c32.where(c64 > 0.5)
assert len(c32) == len(c3)

v32 = xr.DataArray(np.random.random(10), dims='x', coords=dict(x=c32))
v64 = xr.DataArray(np.random.random(10), dims='x', coords=dict(x=c64))

v3 = v32.where(v64 > 0.5)
assert len(v32) == len(v3)
# --> Assertion error, Expected :10, Actual :2
```
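
To look at the suspected cause in isolation, here is a NumPy-only sketch that compares the two coordinate arrays directly. `np.intersect1d` merely stands in for an inner join of the two indexes; it is an assumption for illustration, not xarray's actual alignment code:

```python
import numpy as np

c32 = np.linspace(0, 1, 10, dtype=np.float32)
c64 = np.linspace(0, 1, 10, dtype=np.float64)

# The comparison promotes float32 to float64; most labels differ slightly.
exact_matches = int((c32 == c64).sum())

# Labels present in both arrays, mimicking an inner join of the indexes.
common = np.intersect1d(c32, c64)

print(exact_matches, common)
```

Only labels that are exactly representable in both precisions survive, which would be consistent with the size of 2 reported by the failing assertion.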


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:17:03) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('de_DE', 'cp1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.2
IPython: 8.2.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6573/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
906748201,MDU6SXNzdWU5MDY3NDgyMDE=,5405,Control CF-encoding in to_zarr(),206773,open,0,,,2,2021-05-30T12:57:40Z,2021-06-23T15:47:32Z,,NONE,,,,"**Is your feature request related to a problem? Please describe.**

I believe xarray's `dataset.to_zarr()` is somewhat inconsistent between creating variables and appending data to existing variables: when creating variables, it can deal with writing already encoded data. When appending, it expects decoded data.

When appending data, xarray will always CF-encode variable data according to encoding information of existing variables before it appends new data. This is fine if data to be appended is decoded, but if the data to be appended is already encoded (e.g. because it was previously read by `dataset = xr.open_dataset(..., decode_cf=False)`) then this leads to entirely corrupt data. 

See also xarray issue #5263 and my actual problem described in https://github.com/bcdev/nc2zarr/issues/35.

**Describe the solution you'd like**

A possible hack is to redundantly call `dataset = decode_cf(dataset)` before appending, so that encoding it again merely restores the already encoded values, as described in #5263. Of course, this also costs extra CPU time for a useless computation.

I'd like to control whether encoding of data shall take place when appending. If I already have encoded data, I'd like to call `encoded_dataset.to_zarr(..., append_dim='time', encode_cf=False)`.

For example, when I uncomment line 469 in `xarray/backends/zarr.py`, then this fixes this issue too:

https://github.com/pydata/xarray/blob/1b4412eeb7011f53932779e1d7c3534163aedd63/xarray/backends/zarr.py#L460-L471


**Minimal Complete Verifiable Example**:

Here is a test that explains the observed inconsistency.

```python
import shutil
import unittest

import numpy as np
import xarray as xr
import zarr

SRC_DS_1_PATH = 'src_ds_1.zarr'
SRC_DS_2_PATH = 'src_ds_2.zarr'
DST_DS_PATH = 'dst_ds.zarr'


class XarrayToZarrAppendInconsistencyTest(unittest.TestCase):
    @classmethod
    def del_paths(cls):
        for path in (SRC_DS_1_PATH, SRC_DS_2_PATH, DST_DS_PATH):
            shutil.rmtree(path, ignore_errors=True)

    def setUp(self):
        self.del_paths()

        scale_factor = 0.0001
        self.v_values_encoded = np.array([[0, 10000, 15000, 20000]], dtype=np.uint16)
        self.v_values_decoded = np.array([[np.nan, 1., 1.5, 2.]], dtype=np.float32)

        # The variable for the two source datasets
        v = xr.DataArray(self.v_values_encoded,
                         dims=('t', 'x'),
                         attrs=dict(scale_factor=scale_factor, _FillValue=0))

        # Create two source datasets
        src_ds = xr.Dataset(data_vars=dict(v=v))
        src_ds.to_zarr(SRC_DS_1_PATH)
        src_ds.to_zarr(SRC_DS_2_PATH)

        # Assert we have written encoded data
        a1 = zarr.convenience.open_array(SRC_DS_1_PATH + '/v')
        a2 = zarr.convenience.open_array(SRC_DS_2_PATH + '/v')
        np.testing.assert_equal(a1, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(a2, self.v_values_encoded)  # succeeds

        # Assert we correctly decode data
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=True)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=True)
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_decoded)  # succeeds

    def tearDown(self):
        self.del_paths()

    def test_decode_cf_true(self):
        """"""
        This test succeeds.
        """"""
        # Open the two source datasets
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=True)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=True)
        # Expect data is decoded
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_decoded)  # succeeds

        # Write the 1st source dataset to a new dataset, then append the 2nd source
        src_ds_1.to_zarr(DST_DS_PATH, mode='w-')
        src_ds_2.to_zarr(DST_DS_PATH, append_dim='t')

        # Open the new dataset
        dst_ds = xr.open_zarr(DST_DS_PATH, decode_cf=True)
        dst_ds_1 = dst_ds.isel(t=slice(0, 1))
        dst_ds_2 = dst_ds.isel(t=slice(1, 2))
        # Expect data is decoded
        np.testing.assert_equal(dst_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(dst_ds_2.v.data, self.v_values_decoded)  # succeeds

    def test_decode_cf_false(self):
        """"""
        This test fails at the last assertion with

        AssertionError:
        Arrays are not equal

        Mismatched elements: 3 / 4 (75%)
        Max absolute difference: 47600
        Max relative difference: 4.76
         x: array([[    0, 57600, 53632, 49664]], dtype=uint16)
         y: array([[    0, 10000, 15000, 20000]], dtype=uint16)
        """"""
        # Open the two source datasets
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=False)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=False)
        # Expect data is NOT decoded (still encoded)
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_encoded)  # succeeds

        # Write the 1st source dataset to a new dataset, then append the 2nd source
        src_ds_1.to_zarr(DST_DS_PATH, mode='w-')
        # Avoid ValueError: failed to prevent overwriting existing key scale_factor in attrs. ...
        del src_ds_2.v.attrs['scale_factor']
        del src_ds_2.v.attrs['_FillValue']
        src_ds_2.to_zarr(DST_DS_PATH, append_dim='t')

        # Open the new dataset
        dst_ds = xr.open_zarr(DST_DS_PATH, decode_cf=False)
        dst_ds_1 = dst_ds.isel(t=slice(0, 1))
        dst_ds_2 = dst_ds.isel(t=slice(1, 2))
        # Expect data is NOT decoded (still encoded)
        np.testing.assert_equal(dst_ds_1.v.data, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(dst_ds_2.v.data, self.v_values_encoded)  # fails
```
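
The corrupted values in the failing assertion appear consistent with the scale factor being applied a second time to already encoded data. The following NumPy-only sketch reproduces the numbers under that assumption. It does not call xarray, and the wrap-around cast via `int64` is an illustrative choice, not necessarily what xarray does internally:

```python
import numpy as np

scale_factor = 0.0001
encoded = np.array([0, 10000, 15000, 20000], dtype=np.uint16)

# CF encoding divides by scale_factor. Applied to already encoded values,
# this yields numbers far outside the uint16 range.
encoded_twice = np.round(encoded / scale_factor).astype(np.int64)

# Casting the integers to uint16 wraps them modulo 65536.
wrapped = encoded_twice.astype(np.uint16)

print(wrapped.tolist())  # values: 0, 57600, 53632, 49664
```

For example, 10000 / 0.0001 = 100000000, and 100000000 modulo 65536 is 57600, which is the second value of `x` in the assertion message.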

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 15:50:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.cp1252
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.17.0
pandas: 1.2.2
numpy: 1.20.1
scipy: 1.6.0
netCDF4: 1.5.6
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: 2021.02.0
matplotlib: 3.3.4
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: 3.5.1

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5405/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
258500654,MDU6SXNzdWUyNTg1MDA2NTQ=,1576,Variable of dtype int8 casted to float64,206773,closed,0,,,11,2017-09-18T14:28:32Z,2020-11-09T07:06:31Z,2020-11-09T07:06:30Z,NONE,,,,"I'm using a CF-compliant dataset from the ESA Land Cover CCI Project that contains a variable `lccs_class` with `dtype=int8` and attribute `_Unsigned='true'`. Its values are class numbers in the range 1 to 220.  When I open the dataset with default options, the resulting dtype of that variable will be `float64`. As the Land Cover maps are quite large (global, 300m grid cells, 129600 x 64800) this produces a considerable memory overhead. 

    >>> ds = xr.open_dataset(path)
    >>> ds['lccs_class'].dtype
    dtype('float64')

If I switch off CF decoding I get the original data type.

    >>> ds = xr.open_dataset(path, decode_cf=False)
    >>> ds['lccs_class'].dtype
    dtype('int8')

I'd actually expect it to be converted to `uint8` or `int16` so that values above 127 are represented correctly.
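
To put the memory overhead into numbers, here is a small back-of-the-envelope calculation for a single in-memory variable on the 129600 x 64800 grid. It is plain arithmetic and assumes the whole array is loaded at once:

```python
import numpy as np

n_cells = 129600 * 64800  # global grid with 300 m cells

for dtype in (np.int8, np.uint8, np.int16, np.float64):
    n_bytes = n_cells * np.dtype(dtype).itemsize
    print(f'{np.dtype(dtype).name:>8}: {n_bytes / 2**30:6.1f} GiB')
```

A one-byte dtype needs about 7.8 GiB for this grid, whereas `float64` needs about 62.6 GiB.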

The dataset is available here: ftp://anon-ftp.ceda.ac.uk/neodc/esacci/land_cover/data/land_cover_maps/v1.6.1/ESACCI-LC-L4-LCCS-Map-300m-P5Y-2010-v1.6.1.nc. Note the file is ~3 GB. 

Btw, the attributes of the variable are

    >>> ds['lccs_class'].attrs
    OrderedDict([('long_name', 'Land cover class defined in LCCS'),
                 ('standard_name', 'land_cover_lccs'),
                 ('flag_values',
                  array([   0,   10,   11,   12,   20,   30,   40,   50,   60,   61,   62,
                           70,   71,   72,   80,   81,   82,   90,  100,  110,  120,  121,
                          122, -126, -116, -106, -104, -103,  -96,  -86,  -76,  -66,  -56,
                          -55,  -54,  -46,  -36], dtype=int8)),
                 ('flag_meanings',
                  'no_data cropland_rainfed cropland_rainfed_herbaceous_cover cropland_rainfed_tree_or_shrub_cover ...'),
                 ('valid_min', array([1])),
                 ('valid_max', array([220])),
                 ('_Unsigned', 'true'),
                 ('_FillValue', array([0], dtype=int8)),
                 ('ancillary_variables',
                  'processed_flag current_pixel_state observation_count algorithmic_confidence_level')])
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1576/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
146287030,MDU6SXNzdWUxNDYyODcwMzA=,819,N-D rolling,206773,closed,0,,,5,2016-04-06T11:42:42Z,2019-02-27T17:48:20Z,2019-02-27T17:48:20Z,NONE,,,,"Dear xarray Team,

We just discovered xarray and it seems to be a fantastic candidate to serve as a core library for the climate data toolbox we are about to implement. While investigating the API, we noticed that the `windows` kwargs in

```
DataArray.rolling(min_periods=None, center=False, **windows)
```

is limited to a single `dim=window_size` entry. Are there any plans to support rolling in N-D?
This could be very useful for efficient gap filling, filtering or other methodologies that use grid cell neighbourhoods in multiple dimensions. 
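
To illustrate what is meant by N-D rolling, here is a NumPy sketch of a 3 x 3 moving-window mean over a 2-D grid. It uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) and only illustrates the requested behaviour; it is not an xarray API:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

grid = np.arange(16, dtype=float).reshape(4, 4)

# Every 3 x 3 neighbourhood that fits entirely inside the grid.
windows = sliding_window_view(grid, (3, 3))  # shape (2, 2, 3, 3)

# Reduce over the two window axes to obtain the rolling mean.
rolling_mean = windows.mean(axis=(-2, -1))
```

The same pattern extends to more dimensions by passing a longer window shape and reducing over the corresponding trailing axes.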

Actually, I also asked myself why the `groupby` and `resample` methods don't take an N-D `dim` argument. This would allow for performing not only a temporal resampling but also a spatial resampling in the lat/lon plane or even a spatio-temporal resampling (including up- and downsampling in either dim).

Anyway, thanks for xarray!

Regards
    Norman
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/819/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
165540933,MDU6SXNzdWUxNjU1NDA5MzM=,899,Let open_mfdataset() respect cell boundary variables,206773,closed,0,,,5,2016-07-14T11:36:49Z,2019-02-25T19:28:23Z,2019-02-25T19:28:23Z,NONE,,,,"I recently faced a problem with `open_mfdataset()` as it concatenates variables that are actually used as auxiliary coordinate variables, namely the cell boundary variables 'time_bnds', 'lat_bnds' and 'lon_bnds' (see CF Conventions [7.1. Cell Boundaries](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#cell-boundaries)).  `open_mfdataset()` will attach an extra 'time' dimension to 'lat_bnds' and 'lon_bnds' because they are seen as data variables rather than coordinate variables.

We could solve the problem by using the _preprocess_ argument and turning these data variables into coordinate variables with `ds.set_coords('lat_bnds', inplace=True)`.
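
A minimal sketch of that workaround follows. `promote_bounds` is a hypothetical helper name, the in-memory dataset only stands in for one input file, and the file pattern in the final comment is a placeholder. Recent xarray versions no longer accept `inplace=True`, so the sketch uses the dataset returned by `set_coords()`:

```python
import numpy as np
import xarray as xr


def promote_bounds(ds):
    # Turn every data variable whose name ends with '_bnds' into a coordinate.
    bounds = [name for name in ds.data_vars if str(name).endswith('_bnds')]
    return ds.set_coords(bounds)


# Tiny in-memory dataset standing in for one input file.
ds = xr.Dataset(
    data_vars=dict(
        sst=(('time', 'lat'), np.zeros((1, 2))),
        lat_bnds=(('lat', 'bnds'), np.array([[0.0, 1.0], [1.0, 2.0]])),
    ),
    coords=dict(lat=[0.5, 1.5]),
)
ds = promote_bounds(ds)

# Hypothetical usage with real files:
# xr.open_mfdataset('*.nc', preprocess=promote_bounds)
```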

However, it would be nice to prevent concatenation of variables that don't have the _concat_dim_, e.g. by a keyword argument _selective_concat_ or _respect_cell_bnds_vars_ or so.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/899/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
146975644,MDU6SXNzdWUxNDY5NzU2NDQ=,822,value scaling wrong in special cases,206773,closed,0,,,13,2016-04-08T16:29:33Z,2019-02-19T02:11:31Z,2019-02-19T02:11:31Z,NONE,,,,"For the same netCDF file used in #821, the value scaling seems to be wrongly applied to compute float64 surface temperature values from a (signed) `short` variable `analysed_sst`:

```
short analysed_sst(time=1, lat=3600, lon=7200);
  :_FillValue = -32768S; // short
  :units = ""kelvin"";
  :scale_factor = 0.01f; // float
  :add_offset = 273.15f; // float
  :long_name = ""analysed sea surface temperature"";
  :valid_min = -300S; // short
  :valid_max = 4500S; // short
  :standard_name = ""sea_water_temperature"";
  :depth = ""20 cm"";
  :source = ""ATSR<1,2>-ESACCI-L3U-v1.0, AATSR-ESACCI-L3U-v1.0, AVHRR<12,14,15,16,17,18>_G-ESACCI-L2P-v1.0, AVHRRMTA-ESACCI-L2P-v1.0"";
  :comment = ""SST analysis produced for ESA SST CCI project using the OSTIA system in reanalysis mode."";
  :_ChunkSizes = 1, 1196, 2393; // int
```

Values are roughly -50 to 600 Kelvin instead of 270 to 310 Kelvin. It seems like the problem arises from misinterpreting the signed short raw values in the netCDF file.
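
For reference, the CF unpacking rule is `decoded = raw * scale_factor + add_offset`. Applied to the `valid_min` and `valid_max` limits from the header above, it gives the expected temperature range. A quick NumPy check (plain arithmetic, independent of xarray):

```python
import numpy as np

scale_factor = 0.01
add_offset = 273.15

# Raw signed short values at the valid_min / valid_max limits.
raw = np.array([-300, 4500], dtype=np.int16)

# CF unpacking: decoded = raw * scale_factor + add_offset
decoded = raw.astype(np.float64) * scale_factor + add_offset

print(decoded)  # approximately 270.15 and 318.15 Kelvin
```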

Here is a notebook that better explains the issue: https://github.com/CCI-Tools/sandbox/blob/4c7a98a4efd1ba55152d2799b499cb27027c2b45/notebooks/norman/xarray-sst-issues.ipynb
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/822/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
321553778,MDU6SXNzdWUzMjE1NTM3Nzg=,2109,Dataset.expand_dims() not lazy,206773,closed,0,,,2,2018-05-09T12:39:44Z,2018-05-09T15:45:31Z,2018-05-09T15:45:31Z,NONE,,,,"The following won't come back for a very long time or will fail with an out-of-memory error:

```python
>>> ds = xr.open_dataset(""D:\\EOData\\LC-CCI\\ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.8.nc"")
>>> ds
<xarray.Dataset>
Dimensions:              (lat: 64800, lon: 129600)
Coordinates:
  * lat                  (lat) float32 89.9986 89.9958 89.9931 89.9903 ...
  * lon                  (lon) float32 -179.999 -179.996 -179.993 -179.99 ...
Data variables:
    change_count         (lat, lon) int8 ...
    crs                  int32 ...
    current_pixel_state  (lat, lon) int8 ...
    observation_count    (lat, lon) int16 ...
    processed_flag       (lat, lon) int8 ...
    lccs_class           (lat, lon) uint8 ...
Attributes:
    title:                      ESA CCI Land Cover Map
    summary:                    This dataset contains the global ESA CCI land...
    type:                       ESACCI-LC-L4-LCCS-Map-300m-P1Y
    id:                         ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.7
    project:                    Climate Change Initiative - European Space Ag...
    references:                 http://www.esa-landcover-cci.org/
    ...
>>> ds_with_time = ds.expand_dims('time')
Zzzzzzz...
```
#### Problem description

When I call `Dataset.expand_dims('time')` on one of my ~2GB datasets (compressed), it seems to load all data into memory; at least, memory consumption goes beyond 12GB, eventually ending in an out-of-memory exception.

![image](https://user-images.githubusercontent.com/206773/39814687-9516b576-5395-11e8-85e2-5b5cdd3a0875.png)

(Sorry for the German UI.)

#### Expected Output

`Dataset.expand_dims` should execute lazily and quickly and should not require considerable memory, as adding a scalar time dimension should only affect indexing but not an array's memory layout. Array data should not be loaded into memory (through Dask, Zarr, etc).

#### Output of ``xr.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
xarray: 0.10.2
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.2
distributed: 1.19.1
matplotlib: 2.1.1
cartopy: 0.16.0
seaborn: None
setuptools: 36.3.0
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: None
sphinx: None

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2109/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
258744901,MDU6SXNzdWUyNTg3NDQ5MDE=,1579,Support for unsigned data,206773,closed,0,,,3,2017-09-19T08:57:15Z,2017-09-21T15:46:30Z,2017-09-20T13:15:36Z,NONE,,,,"The ""old"" NetCDF 3 format doesn't have explicit support for unsigned integer types and therefore a recommendation/convention exists to set the variable attribute `_Unsigned='true'`, see NetCDF docs, section [Unsigned Data](http://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html#bp_Unsigned-Data).

Are there any plans to interpret the `_Unsigned` attribute?

I'd really like to help out, but I fear I still don't know enough about dask to provide an efficient PR for that. 

My workaround is to manually convert the variables in question which are of type `int8`, same data as mentioned in #1576:

    unsigned_var = signed_int8_var & 0xff

which results in an `int16`, which is ok but still 1 byte more than the desired `uint8`.
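
For plain NumPy arrays, one way to stay at one byte per value is to reinterpret the bytes instead of masking them. This is a sketch for in-memory arrays only; whether it can be applied lazily to dask-backed variables is not covered here:

```python
import numpy as np

# Some of the flag values shown in #1576, stored as signed bytes.
signed = np.array([10, 122, -126, -36], dtype=np.int8)

# Reinterpret the same bytes as unsigned: no copy and no wider dtype needed.
unsigned = signed.view(np.uint8)

print(unsigned.tolist())  # values: 10, 122, 130, 220
```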
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1579/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
146908323,MDU6SXNzdWUxNDY5MDgzMjM=,821,datetime units interpretation wrong in special cases,206773,closed,0,,,3,2016-04-08T11:55:44Z,2016-04-09T16:55:10Z,2016-04-09T16:54:10Z,NONE,,,,"Hi there,

I have a datetime issue with a certain type of (CF-compliant!) netCDF files originating from the ESA CCI Sea Surface Temperature project. With other climate data, everything seems fine.

When I open such a netCDF file, the datetime value(s) of the time dimension seem to be wrong. If I do

```
ds = xr.open_dataset(nc_path)
ds.analysed_sst
```

I get

```
<xarray.DataArray 'analysed_sst' (time: 1, lat: 3600, lon: 7200)>
[25920000 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 1947-05-12T09:58:14
  * lat      (lat) float32 -89.975 -89.925 -89.875 -89.825 -89.775 -89.725 ...
  * lon      (lon) float32 -179.975 -179.925 -179.875 -179.825 -179.775 ...
Attributes:
    units: kelvin
    ...
```

The time dimension is

```
    int time(time=1);
      :units = ""seconds since 1981-01-01 00:00:00"";
      :standard_name = ""time"";
      :axis = ""T"";
      :calendar = ""gregorian"";
      :bounds = ""time_bnds"";
      :comment = """";
      :long_name = ""reference time of sst file"";
      :_ChunkSizes = 1; // int
```

and the time value is `915192000`. Therefore, the correctly interpreted time value must be `2010-01-01T12:00:00`, which is 1981-01-01 00:00:00 plus 915192000 seconds.
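
The expected value can be double-checked with NumPy's datetime arithmetic, independently of xarray's decoding:

```python
import numpy as np

# Reference epoch taken from the units attribute of the time variable.
epoch = np.datetime64('1981-01-01T00:00:00')

# Raw time value stored in the file, in seconds.
offset = np.timedelta64(915192000, 's')

decoded = epoch + offset
print(decoded)  # 2010-01-01T12:00:00
```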

Here is the link to the data: ftp://anon-ftp.ceda.ac.uk/neodc/esacci/sst/data/lt/Analysis/L4/v01.1/2010/01/01/20100101120000-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.1.nc

I'm not sure whether this is actually a CF-specific issue that xarray doesn't want to deal with. If so, could you please give some advice on how to get around this? I'm sure other xarray lovers will face this issue sooner or later.

Thanks!
-- Norman
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/821/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue