id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1128485610,PR_kwDOAMm_X84yTE49,6258,removed check for last dask chunk size in to_zarr,6574622,closed,0,,,4,2022-02-09T12:34:43Z,2022-02-09T15:13:21Z,2022-02-09T15:12:32Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6258,"When storing a dask-chunked dataset to zarr, the size of the last chunk
in each dimension does not matter, as this single last chunk will be
written to any number of zarr chunks, but none of the zarr chunks which
are being written to will be accessed by any other dask chunk.
- [x] Closes #6255
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
cc'ing @rabernat who seems to have worked on this lately.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6258/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1128282637,I_kwDOAMm_X85DQDoN,6255,Writing large (aligned) dask-chunks to small zarr chunks fails.,6574622,closed,0,,,0,2022-02-09T09:35:24Z,2022-02-09T15:12:31Z,2022-02-09T15:12:31Z,CONTRIBUTOR,,,,"### What happened?
I'm trying to write a dataset with large dask chunks to a zarr store that should use smaller chunks.
The dask chunks are intentionally chosen to be integer multiples of the zarr chunks, so that no two dask chunks are ever written into the same zarr chunk.
When trying to write such a dataset using `to_zarr`, the following exception appears:
```
NotImplementedError: Final chunk of Zarr array must be the same size or smaller than the first. Specified Zarr chunk encoding['chunks']=(1,), for variable named 'a' but (2, 2) in the variable's Dask chunks ((2, 2),) are incompatible with this encoding. Consider either rechunking using `chunk()`, deleting or modifying `encoding['chunks']`, or specify `safe_chunks=False`.
```
### What did you expect to happen?
I'd expect the write to ""just work"".
### Minimal Complete Verifiable Example
```python
import xarray as xr
ds = xr.Dataset({""a"": (""x"", [1, 2, 3, 4])}).chunk({""x"": 2})
m = {}
ds.to_zarr(m, encoding={""a"": {""chunks"": (1,)}})
```
### Relevant log output
_No response_
### Anything else we need to know?
I believe that the expected behaviour is consistent with [this design choice](https://github.com/pydata/xarray/blob/d47cf0c850cb70429373782b3c1e0329d14fd05a/xarray/backends/zarr.py#L153):
```
# DESIGN CHOICE: do not allow multiple dask chunks on a single zarr chunk
# this avoids the need to get involved in zarr synchronization / locking
# From zarr docs:
# ""If each worker in a parallel computation is writing to a separate
# region of the array, and if region boundaries are perfectly aligned
# with chunk boundaries, then no synchronization is required.""
```
But I believe that [this if-statement](https://github.com/pydata/xarray/blob/d47cf0c850cb70429373782b3c1e0329d14fd05a/xarray/backends/zarr.py#L178) is not needed and should be removed.
The if-statement compares the size of the last dask-chunk within each dimension to the zarr-chunk size. There are three possible cases, which (as far as I understand) should all be just fine:
* the dask-chunk is smaller than the zarr chunk: one dask chunk will write into one (smaller, last) zarr chunk
* the dask-chunk is equal to the zarr chunk: one dask chunk will write into one zarr chunk
* the dask-chunk is larger than the zarr chunk: one dask chunk will write into multiple zarr chunks. None of these zarr chunks will be touched by any other dask-chunk as [all previous dask chunks are aligned to zarr-chunk boundaries](https://github.com/pydata/xarray/blob/d47cf0c850cb70429373782b3c1e0329d14fd05a/xarray/backends/zarr.py#L165).
**Note:** If that if-statement goes away, [this one](https://github.com/pydata/xarray/blob/d47cf0c850cb70429373782b3c1e0329d14fd05a/xarray/backends/zarr.py#L163) may go away as well (was introduced in #4312).
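The three cases listed above can be checked with a small, self-contained sketch along one dimension. This is not xarray's implementation: the function names `zarr_chunks_touched` and `has_shared_zarr_chunk` are hypothetical, and the sketch assumes every dask chunk has a positive size.

```python
def zarr_chunks_touched(dask_chunks, zarr_chunk):
    # For each dask chunk (sizes along one dimension), collect the
    # indices of the zarr chunks it would write to.
    touched = []
    start = 0
    for size in dask_chunks:
        stop = start + size
        first = start // zarr_chunk
        last = (stop - 1) // zarr_chunk
        touched.append(set(range(first, last + 1)))
        start = stop
    return touched


def has_shared_zarr_chunk(dask_chunks, zarr_chunk):
    # True if any zarr chunk would be written by more than one dask chunk.
    seen = set()
    for indices in zarr_chunks_touched(dask_chunks, zarr_chunk):
        if seen & indices:
            return True
        seen |= indices
    return False


# last dask chunk smaller than, equal to, and larger than the zarr chunk
for chunks in [(4, 4, 2), (4, 4, 4), (4, 4, 8)]:
    print(chunks, has_shared_zarr_chunk(chunks, 4))

# the example from this issue: dask chunks (2, 2), zarr chunk size 1
print(zarr_chunks_touched((2, 2), 1))
```

In all three cases no zarr chunk is shared, as long as every dask chunk before the last one is a multiple of the zarr chunk size. A misaligned layout such as `has_shared_zarr_chunk((3, 3), 2)` is reported as shared.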
### Environment
INSTALLED VERSIONS
```
commit: None
python: 3.9.10 (main, Jan 15 2022, 11:48:00)
[Clang 13.0.0 (clang-1300.0.29.3)]
python-bits: 64
OS: Darwin
OS-release: 20.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.20.1
pandas: 1.2.0
numpy: 1.21.2
scipy: 1.6.2
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.10.2
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.1
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 60.5.0
pip: 21.3.1
conda: None
pytest: 6.2.2
IPython: 8.0.0.dev
sphinx: 3.5.0
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6255/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
673513695,MDExOlB1bGxSZXF1ZXN0NDYzMzYyMTIw,4312,allow manual zarr encoding on unchunked dask dimensions,6574622,closed,0,,,3,2020-08-05T12:49:04Z,2022-02-09T09:31:51Z,2020-08-19T14:58:09Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4312,"If a dask array is chunked along one dimension but not chunked along
another, any manually specified zarr chunk size should be valid, but
before this patch, this resulted in an error.
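A minimal sketch of why this should be safe, assuming the dimension is covered by exactly one dask chunk: whatever zarr chunk size is chosen, each zarr chunk has a single writer. The helper `writers_per_zarr_chunk` is hypothetical and is not part of xarray.

```python
import math


def writers_per_zarr_chunk(dask_chunks, zarr_chunk):
    # Count, along one dimension, how many dask chunks write into
    # each zarr chunk. Assumes all dask chunk sizes are positive.
    n_zarr = math.ceil(sum(dask_chunks) / zarr_chunk)
    counts = [0] * n_zarr
    start = 0
    for size in dask_chunks:
        stop = start + size
        for index in range(start // zarr_chunk, (stop - 1) // zarr_chunk + 1):
            counts[index] += 1
        start = stop
    return counts


# A dimension of length 6 held in one dask chunk: every zarr chunk
# size gives exactly one writer per zarr chunk.
for zarr_chunk in (1, 4, 5, 6):
    print(zarr_chunk, writers_per_zarr_chunk((6,), zarr_chunk))
```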
- [x] Tests added
- [x] Passes `isort . && black . && mypy . && flake8` (only for modified sections)
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4312/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
817302678,MDExOlB1bGxSZXF1ZXN0NTgwODE3NDQ5,4966,conventions: decode unsigned integers to signed if _Unsigned=false,6574622,closed,0,,,5,2021-02-26T12:05:51Z,2021-03-12T14:21:12Z,2021-03-12T14:20:20Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4966,"netCDF3 doesn't know unsigned while OPeNDAP doesn't know signed (bytes).
Depending on which backend source is used, the original data is stored
with the wrong signedness and needs to be decoded based on the _Unsigned
attribute. While the netCDF3 variant is already implemented, this commit
adds the symmetric case covering OPeNDAP.
- [x] Closes #4954
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4966/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
815858485,MDU6SXNzdWU4MTU4NTg0ODU=,4954,Handling of signed bytes from OPeNDAP via pydap,6574622,closed,0,,,2,2021-02-24T21:21:38Z,2021-03-12T14:20:19Z,2021-03-12T14:20:19Z,CONTRIBUTOR,,,,"netCDF3 only knows signed bytes, but there's [a convention](https://www.unidata.ucar.edu/software/netcdf/documentation/NUG/_best_practices.html) of adding an attribute `_Unsigned=True` to the variable to be able to store unsigned bytes nonetheless. This convention is handled [at this place](https://github.com/pydata/xarray/blob/df052e7431540fb435ac8742aabc32754a00a7f5/xarray/coding/variables.py#L311) by xarray.
OPeNDAP only knows unsigned bytes, but there's [a hack](https://github.com/Unidata/netcdf-c/pull/1317), used by the thredds server and the netCDF-c library, of adding an attribute `_Unsigned=False` to the variable to be able to store signed bytes nonetheless. This hack is **not** handled by xarray, but maybe should be handled symmetrically at the same place (i.e. `if .kind == ""u"" and unsigned == False`).
As described in the ""hack"", netCDF-c handles this internally, but pydap doesn't. This is why the `engine=""netcdf4""` variant returns (correctly according to the hack) negative values and the `engine=""pydap""` variant doesn't. However, as `xarray` emits a warning at exactly the location referenced above, I think that this is the place where it should be fixed.
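The symmetric handling suggested above could look roughly like the following sketch. It is only an illustration on plain Python lists of 8-bit values, not the code in `xarray/coding/variables.py`; the function name `reinterpret_bytes` and its arguments are hypothetical.

```python
def reinterpret_bytes(values, kind, unsigned):
    # values:   8-bit integers as delivered by the backend
    # kind:     'i' if the backend stores signed bytes, 'u' if unsigned
    # unsigned: the value of the _Unsigned attribute as a bool
    if kind == 'i' and unsigned:
        # existing netCDF3 convention: signed storage, unsigned meaning
        return [v % 256 for v in values]
    if kind == 'u' and not unsigned:
        # proposed OPeNDAP case: unsigned storage, signed meaning
        return [v - 256 if v >= 128 else v for v in values]
    return list(values)


# values as seen through pydap, decoded with _Unsigned=False
print(reinterpret_bytes([128, 255, 0, 1, 2, 127], 'u', False))
```

With `_Unsigned=False`, the values 128 and 255 become -128 and -1, which matches what the `netcdf4` engine reports for the test file.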
If you agree, I could prepare a PR to implement the fix.
```python
In [1]: import xarray as xr
In [2]: xr.open_dataset(""https://observations.ipsl.fr/thredds/dodsC/EUREC4A/PRODUCTS/testdata/netcdf_testfiles/test_NC_BYTE_neg.nc"", engine=""netcdf4"")
Out[2]:
Dimensions: (test: 7)
Coordinates:
* test (test) float32 -128.0 -1.0 0.0 1.0 2.0 nan 127.0
Data variables:
*empty*
In [3]: xr.open_dataset(""https://observations.ipsl.fr/thredds/dodsC/EUREC4A/PRODUCTS/testdata/netcdf_testfiles/test_NC_BYTE_neg.nc"", engine=""pydap"")
/usr/local/lib/python3.9/site-packages/xarray/conventions.py:492: SerializationWarning: variable 'test' has _Unsigned attribute but is not of integer type. Ignoring attribute.
new_vars[k] = decode_cf_variable(
Out[3]:
Dimensions: (test: 7)
Coordinates:
* test (test) float32 128.0 255.0 0.0 1.0 2.0 nan 127.0
Data variables:
*empty*
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4954/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue