id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
406812274,MDU6SXNzdWU0MDY4MTIyNzQ=,2745,reindex doesn't preserve chunks,4711805,open,0,,,1,2019-02-05T14:37:24Z,2023-12-04T20:46:36Z,,CONTRIBUTOR,,,,"The following code creates a small (100x100) chunked `DataArray`, and then re-indexes it into a huge one (100000x100000):

```python
import xarray as xr
import numpy as np

n = 100
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros(n*n).reshape(n, n), coords=[x, y],
                  dims=['x', 'y']).chunk({'x': n, 'y': n})

n2 = 100000
x2 = np.arange(n2)
y2 = np.arange(n2)
da2 = da.reindex({'x': x2, 'y': y2})
da2
```

But the re-indexed `DataArray` has `chunksize=(100000, 100000)` instead of `chunksize=(100, 100)`:

```
<xarray.DataArray (x: 100000, y: 100000)>
dask.array<shape=(100000, 100000), dtype=float64, chunksize=(100000, 100000)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
  * y        (y) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
```
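As a quick check, the chunk layout can be inspected without triggering any computation (a minimal sketch reusing the arrays above):

```python
# .chunks reports the dask chunk layout lazily, without computing anything
print(da.chunks)   # ((100,), (100,))
print(da2.chunks)  # ((100000,), (100000,)) -- a single huge chunk
```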
Which immediately leads to a memory error when trying to e.g. store it to a `zarr` archive:

```python
ds2 = da2.to_dataset(name='foo')
ds2.to_zarr(store='foo', mode='w')
```

Trying to re-chunk to 100x100 before storing doesn't help either; it just takes a lot more time before crashing with a memory error:

```python
da3 = da2.chunk({'x': n, 'y': n})
ds3 = da3.to_dataset(name='foo')
ds3.to_zarr(store='foo', mode='w')
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2745/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
414641120,MDU6SXNzdWU0MTQ2NDExMjA=,2789,Appending to zarr with string dtype,4711805,open,0,,,2,2019-02-26T14:31:42Z,2022-04-09T02:18:05Z,,CONTRIBUTOR,,,,"```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified
ds = xr.open_zarr('ds')
print(ds.da.values)
```

This code prints `['foo']` (string type). The encoding chosen by zarr is `""dtype"": ""|S3""`, which corresponds to bytes, but it seems to be decoded to a string, which is what we want.

```
$ cat ds/da/.zarray
{
    ""chunks"": [
        1
    ],
    ""compressor"": {
        ""blocksize"": 0,
        ""clevel"": 5,
        ""cname"": ""lz4"",
        ""id"": ""blosc"",
        ""shuffle"": 1
    },
    ""dtype"": ""|S3"",
    ""fill_value"": null,
    ""filters"": null,
    ""order"": ""C"",
    ""shape"": [
        1
    ],
    ""zarr_format"": 2
}
```

The problem appears when I append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)
ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints `['foo' 'bar']`. Indeed, the encoding was kept as `""dtype"": ""|S3""`, which is fine for a string of 3 characters but not for one of 6. If I specify the encoding with the maximum length, e.g.:

```python
ds.to_zarr('ds', mode='w', encoding={'da': {'dtype': '|S6'}})
```

it solves the length problem, but now my strings are kept as bytes: `[b'foo' b'barbar']`. If I specify a Unicode encoding:

```python
ds.to_zarr('ds', mode='w', encoding={'da': {'dtype': 'U6'}})
```

it is not taken into account: the zarr encoding is still `""dtype"": ""|S3""` and I am back to my length problem: `['foo' 'bar']`.

The solution with `'dtype': '|S6'` is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
777670351,MDU6SXNzdWU3Nzc2NzAzNTE=,4756,feat: reindex multiple DataArrays,4711805,open,0,,,1,2021-01-03T16:23:01Z,2021-01-03T19:05:03Z,,CONTRIBUTOR,,,,"When creating a `Dataset` from multiple `DataArray`s that are supposed to share the same grid but are not exactly aligned (as is often the case with floating point coordinates), we usually end up with undesirable `NaN`s inserted in the data set. For instance, consider the following data arrays, which are not exactly aligned:

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                   coords=[[0, 1, 2], [0, 1, 2]],
                   dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                   coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]],
                   dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482830-542bbe80-4de3-11eb-814b-bb1f705967c4.png)
![image](https://user-images.githubusercontent.com/4711805/103482836-61e14400-4de3-11eb-804b-f549c2551562.png)

They show gaps when combined in a data set:

```python
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482959-3f9bf600-4de4-11eb-9513-900319cb485a.png)
![image](https://user-images.githubusercontent.com/4711805/103482966-47f43100-4de4-11eb-853b-2b44f7bc8d7f.png)

I think this is a frequent enough situation that we would like a function to re-align all the data arrays together. There is a `reindex_like` method, which accepts a tolerance, but calling it successively on every data array, like so:

```python
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
```

would result in the intersection of the coordinates rather than their union. What I would like is a function like the following:

```python
import numpy as np
from functools import reduce

def reindex_all(arrays, dims, tolerance):
    coords = {}
    for dim in dims:
        # union of the coordinates of all the arrays along this dimension
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        # merge coordinate values that are closer to each other than the tolerance
        diff = coord[:-1] - coord[1:]
        keep = np.abs(diff) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance)
                 for array in arrays]
    return reindexed

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103483065-00ba7000-4de5-11eb-8581-fb156970a7e8.png)
![image](https://user-images.githubusercontent.com/4711805/103483072-0748e780-4de5-11eb-8b42-6bd9b248ab1e.png)

I have not found anything equivalent.
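The closest existing operation I am aware of is a plain outer alignment, which gives the union of the coordinates but has no tolerance, so the gaps remain (a quick sketch with the arrays above):

```python
# an outer join keeps every distinct coordinate value and pads with NaN,
# so nearby-but-unequal coordinates are not merged together
da1a, da2a = xr.align(da1, da2, join='outer')
print(da1a.shape)  # (6, 6): the NaN-padded union, instead of the 4x4 grid reindex_all builds
```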
If you think this is worth it, I could try and send a PR to implement such a feature.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4756/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
415614806,MDU6SXNzdWU0MTU2MTQ4MDY=,2793,Fit bounding box to coarser resolution,4711805,open,0,,,2,2019-02-28T13:07:09Z,2019-04-11T14:37:47Z,,CONTRIBUTOR,,,,"When using [coarsen](http://xarray.pydata.org/en/latest/generated/xarray.DataArray.coarsen.html), we often need to align the original DataArray with the coarser coordinates. For instance:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(np.arange(4*4).reshape(4, 4),
                  coords=[np.arange(4, 0, -1) + 0.5, np.arange(4) + 0.5],
                  dims=['lat', 'lon'])
# <xarray.DataArray (lat: 4, lon: 4)>
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15]])
# Coordinates:
#   * lat      (lat) float64 4.5 3.5 2.5 1.5
#   * lon      (lon) float64 0.5 1.5 2.5 3.5

da.coarsen(lat=2, lon=2).mean()
# <xarray.DataArray (lat: 2, lon: 2)>
# array([[ 2.5,  4.5],
#        [10.5, 12.5]])
# Coordinates:
#   * lat      (lat) float64 4.0 2.0
#   * lon      (lon) float64 1.0 3.0
```

But if the coarser coordinates are aligned like:

```
lat: ... 5 3 1 ...
lon: ... 1 3 5 ...
```

then directly applying `coarsen` will not work (here on the `lat` dimension). The following function extends the original DataArray so that it is aligned with the coarser coordinates:

```python
def adjust_bbox(da, dims):
    """"""Adjust the bounding box of a DataArray to a coarser resolution.

    Args:
        da: the DataArray to adjust.
        dims: a dictionary where keys are the names of the dimensions on which
            to adjust, and values are of the form
            [unsigned_coarse_resolution, signed_original_resolution].

    Returns:
        The DataArray with its bounding box adjusted to the coarser resolution.
    """"""
    coords = {}
    for k, v in dims.items():
        every, step = v
        offset = step / 2
        dim0 = da[k].values[0] - offset
        dim1 = da[k].values[-1] + offset
        if step < 0:
            # decreasing coordinate
            dim0 = dim0 + (every - dim0 % every) % every
            dim1 = dim1 - dim1 % every
        else:
            # increasing coordinate
            dim0 = dim0 - dim0 % every
            dim1 = dim1 + (every - dim1 % every) % every
        coord0 = np.arange(dim0 + offset, da[k].values[0] - offset, step)
        coord1 = da[k].values
        coord2 = np.arange(da[k].values[-1] + step, dim1, step)
        coord = np.hstack((coord0, coord1, coord2))
        coords[k] = coord
    return da.reindex(**coords).fillna(0)

da = adjust_bbox(da, {'lat': (2, -1), 'lon': (2, 1)})
# <xarray.DataArray (lat: 6, lon: 4)>
# array([[ 0.,  0.,  0.,  0.],
#        [ 0.,  1.,  2.,  3.],
#        [ 4.,  5.,  6.,  7.],
#        [ 8.,  9., 10., 11.],
#        [12., 13., 14., 15.],
#        [ 0.,  0.,  0.,  0.]])
# Coordinates:
#   * lat      (lat) float64 5.5 4.5 3.5 2.5 1.5 0.5
#   * lon      (lon) float64 0.5 1.5 2.5 3.5

da.coarsen(lat=2, lon=2).mean()
# <xarray.DataArray (lat: 3, lon: 2)>
# array([[0.25, 1.25],
#        [6.5 , 8.5 ],
#        [6.25, 7.25]])
# Coordinates:
#   * lat      (lat) float64 5.0 3.0 1.0
#   * lon      (lon) float64 1.0 3.0
```

Now `coarsen` gives the right result. But `adjust_bbox` is rather complicated and specific to this use case (evenly spaced coordinate points...). Do you know of a better/more general way of doing it?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2793/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue