id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
406812274,MDU6SXNzdWU0MDY4MTIyNzQ=,2745,reindex doesn't preserve chunks,4711805,open,0,,,1,2019-02-05T14:37:24Z,2023-12-04T20:46:36Z,,CONTRIBUTOR,,,,"The following code creates a small (100x100) chunked `DataArray`, and then re-indexes it into a huge one (100000x100000):

```python
import xarray as xr
import numpy as np

n = 100
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros(n*n).reshape(n, n), coords=[x, y],
                  dims=['x', 'y']).chunk({'x': n, 'y': n})

n2 = 100000
x2 = np.arange(n2)
y2 = np.arange(n2)
da2 = da.reindex({'x': x2, 'y': y2})
da2
```

But the re-indexed `DataArray` has `chunksize=(100000, 100000)` instead of `chunksize=(100, 100)`:

```
<xarray.DataArray (x: 100000, y: 100000)>
dask.array<shape=(100000, 100000), dtype=float64, chunksize=(100000, 100000)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
  * y        (y) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
```
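As a quick check, the chunk layout can be inspected without triggering any computation (a minimal sketch reusing the arrays above):

```python
# .chunks reports the dask chunk layout lazily, without computing anything
print(da.chunks)   # ((100,), (100,))
print(da2.chunks)  # ((100000,), (100000,)) -- a single huge chunk
```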
Which immediately leads to a memory error when trying to e.g. store it to a `zarr` archive:

```python
ds2 = da2.to_dataset(name='foo')
ds2.to_zarr(store='foo', mode='w')
```

Trying to re-chunk to 100x100 before storing doesn't help either; it just takes a lot more time before crashing with a memory error:

```python
da3 = da2.chunk({'x': n, 'y': n})
ds3 = da3.to_dataset(name='foo')
ds3.to_zarr(store='foo', mode='w')
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2745/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
414641120,MDU6SXNzdWU0MTQ2NDExMjA=,2789,Appending to zarr with string dtype,4711805,open,0,,,2,2019-02-26T14:31:42Z,2022-04-09T02:18:05Z,,CONTRIBUTOR,,,,"```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified
ds = xr.open_zarr('ds')
print(ds.da.values)
```

This code prints `['foo']` (string type). The encoding chosen by zarr is `""dtype"": ""|S3""`, which corresponds to bytes, but it seems to be decoded to a string, which is what we want.

```
$ cat ds/da/.zarray
{
    ""chunks"": [
        1
    ],
    ""compressor"": {
        ""blocksize"": 0,
        ""clevel"": 5,
        ""cname"": ""lz4"",
        ""id"": ""blosc"",
        ""shuffle"": 1
    },
    ""dtype"": ""|S3"",
    ""fill_value"": null,
    ""filters"": null,
    ""order"": ""C"",
    ""shape"": [
        1
    ],
    ""zarr_format"": 2
}
```

The problem appears when I append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)
ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints `['foo' 'bar']`. Indeed, the encoding was kept as `""dtype"": ""|S3""`, which is fine for a string of 3 characters but not for one of 6. If I specify the encoding with the maximum length, e.g.:

```python
ds.to_zarr('ds', mode='w', encoding={'da': {'dtype': '|S6'}})
```

it solves the length problem, but now my strings are kept as bytes: `[b'foo' b'barbar']`. If I specify a Unicode encoding:

```python
ds.to_zarr('ds', mode='w', encoding={'da': {'dtype': 'U6'}})
```

it is not taken into account: the zarr encoding is still `""dtype"": ""|S3""` and I am back to my length problem: `['foo' 'bar']`.

The solution with `'dtype': '|S6'` is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
777670351,MDU6SXNzdWU3Nzc2NzAzNTE=,4756,feat: reindex multiple DataArrays,4711805,open,0,,,1,2021-01-03T16:23:01Z,2021-01-03T19:05:03Z,,CONTRIBUTOR,,,,"When creating a `Dataset` from multiple `DataArray`s that are supposed to share the same grid but are not exactly aligned (as is often the case with floating point coordinates), we usually end up with undesirable `NaN`s inserted in the data set. For instance, consider the following data arrays, which are not exactly aligned:

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                   coords=[[0, 1, 2], [0, 1, 2]],
                   dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                   coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]],
                   dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482830-542bbe80-4de3-11eb-814b-bb1f705967c4.png)
![image](https://user-images.githubusercontent.com/4711805/103482836-61e14400-4de3-11eb-804b-f549c2551562.png)

They show gaps when combined in a data set:

```python
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482959-3f9bf600-4de4-11eb-9513-900319cb485a.png)
![image](https://user-images.githubusercontent.com/4711805/103482966-47f43100-4de4-11eb-853b-2b44f7bc8d7f.png)

I think this is a frequent enough situation that we would like a function to re-align all the data arrays together. There is a `reindex_like` method, which accepts a tolerance, but calling it successively on every data array, like so:

```python
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
```

would result in the intersection of the coordinates rather than their union. What I would like is a function like the following:

```python
import numpy as np
from functools import reduce

def reindex_all(arrays, dims, tolerance):
    coords = {}
    for dim in dims:
        # union of the coordinates of all the arrays along this dimension
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        # merge coordinate values that are closer to each other than the tolerance
        diff = coord[:-1] - coord[1:]
        keep = np.abs(diff) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance)
                 for array in arrays]
    return reindexed

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103483065-00ba7000-4de5-11eb-8581-fb156970a7e8.png)
![image](https://user-images.githubusercontent.com/4711805/103483072-0748e780-4de5-11eb-8b42-6bd9b248ab1e.png)

I have not found anything equivalent.
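The closest existing operation I am aware of is a plain outer alignment, which gives the union of the coordinates but has no tolerance, so the gaps remain (a quick sketch with the arrays above):

```python
# an outer join keeps every distinct coordinate value and pads with NaN,
# so nearby-but-unequal coordinates are not merged together
da1a, da2a = xr.align(da1, da2, join='outer')
print(da1a.shape)  # (6, 6): the NaN-padded union, instead of the 4x4 grid reindex_all builds
```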
If you think this is worth it, I could try and send a PR to implement such a feature.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4756/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
415614806,MDU6SXNzdWU0MTU2MTQ4MDY=,2793,Fit bounding box to coarser resolution,4711805,open,0,,,2,2019-02-28T13:07:09Z,2019-04-11T14:37:47Z,,CONTRIBUTOR,,,,"When using [coarsen](http://xarray.pydata.org/en/latest/generated/xarray.DataArray.coarsen.html), we often need to align the original DataArray with the coarser coordinates. For instance:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(np.arange(4*4).reshape(4, 4),
                  coords=[np.arange(4, 0, -1) + 0.5, np.arange(4) + 0.5],
                  dims=['lat', 'lon'])
# <xarray.DataArray (lat: 4, lon: 4)>
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15]])
# Coordinates:
#   * lat      (lat) float64 4.5 3.5 2.5 1.5
#   * lon      (lon) float64 0.5 1.5 2.5 3.5

da.coarsen(lat=2, lon=2).mean()
# <xarray.DataArray (lat: 2, lon: 2)>
# array([[ 2.5,  4.5],
#        [10.5, 12.5]])
# Coordinates:
#   * lat      (lat) float64 4.0 2.0
#   * lon      (lon) float64 1.0 3.0
```

But if the coarser coordinates are aligned like:

```
lat: ... 5 3 1 ...
lon: ... 1 3 5 ...
```

then directly applying `coarsen` will not work (here on the `lat` dimension). The following function extends the original DataArray so that it is aligned with the coarser coordinates:

```python
def adjust_bbox(da, dims):
    """"""Adjust the bounding box of a DataArray to a coarser resolution.

    Args:
        da: the DataArray to adjust.
        dims: a dictionary where keys are the names of the dimensions on which
            to adjust, and values are of the form
            [unsigned_coarse_resolution, signed_original_resolution].

    Returns:
        The DataArray with its bounding box adjusted to the coarser resolution.
    """"""
    coords = {}
    for k, v in dims.items():
        every, step = v
        offset = step / 2
        dim0 = da[k].values[0] - offset
        dim1 = da[k].values[-1] + offset
        if step < 0:
            # decreasing coordinate
            dim0 = dim0 + (every - dim0 % every) % every
            dim1 = dim1 - dim1 % every
        else:
            # increasing coordinate
            dim0 = dim0 - dim0 % every
            dim1 = dim1 + (every - dim1 % every) % every
        coord0 = np.arange(dim0 + offset, da[k].values[0] - offset, step)
        coord1 = da[k].values
        coord2 = np.arange(da[k].values[-1] + step, dim1, step)
        coord = np.hstack((coord0, coord1, coord2))
        coords[k] = coord
    return da.reindex(**coords).fillna(0)

da = adjust_bbox(da, {'lat': (2, -1), 'lon': (2, 1)})
# <xarray.DataArray (lat: 6, lon: 4)>
# array([[ 0.,  0.,  0.,  0.],
#        [ 0.,  1.,  2.,  3.],
#        [ 4.,  5.,  6.,  7.],
#        [ 8.,  9., 10., 11.],
#        [12., 13., 14., 15.],
#        [ 0.,  0.,  0.,  0.]])
# Coordinates:
#   * lat      (lat) float64 5.5 4.5 3.5 2.5 1.5 0.5
#   * lon      (lon) float64 0.5 1.5 2.5 3.5

da.coarsen(lat=2, lon=2).mean()
# <xarray.DataArray (lat: 3, lon: 2)>
# array([[0.25, 1.25],
#        [6.5 , 8.5 ],
#        [6.25, 7.25]])
# Coordinates:
#   * lat      (lat) float64 5.0 3.0 1.0
#   * lon      (lon) float64 1.0 3.0
```

Now `coarsen` gives the right result. But `adjust_bbox` is rather complicated and specific to this use case (evenly spaced coordinate points...). Do you know of a better/more general way of doing it?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2793/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue