id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1987770706,I_kwDOAMm_X852evlS,8440,nD integer indexing on dask data is very slow,13662783,closed,0,,,2,2023-11-10T14:47:08Z,2023-11-12T04:56:23Z,2023-11-12T04:56:22Z,CONTRIBUTOR,,,,"### What happened?
I ran into a situation where I was indexing with a 2D integer array into some chunked netCDF data. This indexing operation is extremely slow. Using a flat 1D index instead is as fast as expected.
### What did you expect to happen?
I would expect indexing on dask data to be very quick since the work is delayed, and indeed it is so in the 1D case.
However, the 2D case is very slow -- slower than actually _doing all the work_ with numpy arrays!
### Minimal Complete Verifiable Example
```Python
import dask.array
import numpy as np
import xarray as xr
# %%
da = xr.DataArray(
data=np.random.rand(100, 1_000_000),
dims=(""time"", ""x""),
)
dask_da = xr.DataArray(
data=dask.array.from_array(da.to_numpy(), chunks=(1, 1_000_000)),
dims=(""time"", ""x""),
)
indexer = np.random.randint(0, 1_000_000, size=100_000)
indexer2d = xr.DataArray(
data=indexer.reshape((4, -1)),
dims=(""a"", ""b""),
)
# %%
%timeit da.isel(x=indexer) # 162 ms
%timeit da.isel(x=indexer2d) # 164 ms
%timeit dask_da.isel(x=indexer) # 5.3 ms
%timeit dask_da.isel(x=indexer2d) # 860 ms according to timeit, but 6 to 14 (!) seconds in interactive use
```
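As a workaround I'm flattening the indexer and reshaping the result afterwards; a rough sketch (assuming the indexer itself fits comfortably in memory, which it does here):
```python
# Index with the flat 1D array (fast and lazy), then restore the 2D shape.
flat_result = dask_da.isel(x=indexer2d.values.ravel())
workaround = xr.DataArray(
    flat_result.data.reshape((dask_da.sizes[""time""],) + indexer2d.shape),
    dims=(""time"",) + indexer2d.dims,
)
```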
### MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:34:57) [MSC v.1936 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Netherlands', '1252')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.2.dev31+ge5d163a8.d20231110
pandas: 2.1.2
numpy: 1.26.0
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.3.1
pytest: None
mypy: None
IPython: 8.17.2
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8440/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1173497454,I_kwDOAMm_X85F8iZu,6377,[FEATURE]: Add a replace method,13662783,open,0,,,8,2022-03-18T11:46:37Z,2023-06-25T07:52:46Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem?
If I have a DataArray of values:
```python
da = xr.DataArray([0, 1, 2, 3, 4, 5])
```
And I'd like to replace `to_replace=[1, 3, 5]` by `value=[10, 30, 50]`, there's no method `da.replace(to_replace, value)` to do this, nor an easy pandas-style way ([`DataFrame.replace`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html)).
(Apologies if I've missed related issues, searching for ""replace"" gives many hits as the word is obviously used quite often.)
### Describe the solution you'd like
```python
da = xr.DataArray([0, 1, 2, 3, 4, 5])
replaced = da.replace([1, 3, 5], [10, 30, 50])
print(replaced)
```
```
array([ 0, 10, 2, 30, 4, 50])
Dimensions without coordinates: dim_0
```
I've had a try at a relatively efficient implementation below. I'm wondering whether it's a worthwhile addition to xarray?
### Describe alternatives you've considered
Ignoring issues such as dealing with NaNs, chunks, etc., a simple dict lookup:
```python
def dict_replace(da, to_replace, value):
d = {k: v for k, v in zip(to_replace, value)}
out = np.vectorize(lambda x: d.get(x, x))(da.values)
return da.copy(data=out)
```
Alternatively, leveraging pandas:
```python
def pandas_replace(da, to_replace, value):
df = pd.DataFrame()
df[""values""] = da.values.ravel()
df[""values""].replace(to_replace, value, inplace=True)
return da.copy(data=df[""values""].values.reshape(da.shape))
```
But I also tried my hand at a custom implementation, letting `np.unique` do the heavy lifting:
```python
def custom_replace(da, to_replace, value):
# Use np.unique to create an inverse index
flat = da.values.ravel()
uniques, index = np.unique(flat, return_inverse=True)
replaceable = np.isin(flat, to_replace)
# Create a replacement array in which there is a 1:1 relation between
# uniques and the replacement values, so that we can use the inverse index
# to select replacement values.
valid = np.isin(to_replace, uniques, assume_unique=True)
# Remove to_replace values that are not present in da. If no overlap
# exists between to_replace and the values in da, just return a copy.
if not valid.any():
return da.copy()
to_replace = to_replace[valid]
value = value[valid]
replacement = np.zeros_like(uniques)
replacement[np.searchsorted(uniques, to_replace)] = value
out = flat.copy()
out[replaceable] = replacement[index[replaceable]]
return da.copy(data=out.reshape(da.shape))
```
Such an approach seems like it's consistently the fastest:
```python
da = xr.DataArray(np.random.randint(0, 100, 100_000))
to_replace = np.random.choice(np.arange(100), 10, replace=False)
value = to_replace * 200
test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)
# 6.93 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit custom_replace(da, to_replace, value)
# 9.37 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pandas_replace(da, to_replace, value)
# 26.8 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dict_replace(da, to_replace, value)
```
With the advantage growing with the number of values involved:
```python
da = xr.DataArray(np.random.randint(0, 10_000, 100_000))
to_replace = np.random.choice(np.arange(10_000), 10_000, replace=False)
value = to_replace * 200
test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)
# 21.6 ms ± 990 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit custom_replace(da, to_replace, value)
# 3.12 s ± 574 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit pandas_replace(da, to_replace, value)
# 42.7 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dict_replace(da, to_replace, value)
```
In my real-life example, with a DataArray of approx 110 000 elements, with 60 000 values to replace, the custom one takes 33 ms, the dict one takes 135 ms, while pandas takes 26 s (!).
### Additional context
In all cases, we need to deal with NaNs, check the input, etc.:
```python
def replace(da: xr.DataArray, to_replace: Any, value: Any):
from xarray.core.utils import is_scalar
if is_scalar(to_replace):
if not is_scalar(value):
raise TypeError(""if to_replace is scalar, then value must be a scalar"")
if np.isnan(to_replace):
return da.fillna(value)
else:
return da.where(da != to_replace, other=value)
else:
to_replace = np.asarray(to_replace)
if to_replace.ndim != 1:
raise ValueError(""to_replace must be 1D or scalar"")
if is_scalar(value):
value = np.full_like(to_replace, value)
else:
value = np.asarray(value)
if to_replace.shape != value.shape:
raise ValueError(
f""Replacement arrays must match in shape. ""
f""Expecting {to_replace.shape} got {value.shape} ""
)
_, counts = np.unique(to_replace, return_counts=True)
if (counts > 1).any():
raise ValueError(""to_replace contains duplicates"")
# Replace NaN values separately, as they will show up as separate values
# from numpy.unique.
isnan = np.isnan(to_replace)
if isnan.any():
i = np.nonzero(isnan)[0]
da = da.fillna(value[i])
# Use np.unique to create an inverse index
flat = da.values.ravel()
uniques, index = np.unique(flat, return_inverse=True)
replaceable = np.isin(flat, to_replace)
# Create a replacement array in which there is a 1:1 relation between
# uniques and the replacement values, so that we can use the inverse index
# to select replacement values.
valid = np.isin(to_replace, uniques, assume_unique=True)
# Remove to_replace values that are not present in da. If no overlap
# exists between to_replace and the values in da, just return a copy.
if not valid.any():
return da.copy()
to_replace = to_replace[valid]
value = value[valid]
replacement = np.zeros_like(uniques)
replacement[np.searchsorted(uniques, to_replace)] = value
out = flat.copy()
out[replaceable] = replacement[index[replaceable]]
return da.copy(data=out.reshape(da.shape))
```
I think it should be easy to adapt, e.g. let it operate on numpy arrays so that `apply_ufunc` will work (a rough sketch below).
The primary issue is whether the values can be sorted; when they can't, the dict lookup might be an okay fallback?
I've had a peek at the pandas implementation, but didn't become much wiser.
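For the dask case, I imagine the hookup could go through `apply_ufunc`. A rough sketch, assuming a hypothetical numpy-only core `_replace_numpy` with the same logic as `custom_replace` above but operating on a bare ndarray:
```python
def replace_dask(da, to_replace, value):
    # _replace_numpy is a hypothetical flat-ndarray version of custom_replace;
    # it is elementwise and shape-preserving, so per-chunk application is fine.
    return xr.apply_ufunc(
        _replace_numpy,
        da,
        kwargs={""to_replace"": np.asarray(to_replace), ""value"": np.asarray(value)},
        dask=""parallelized"",
        output_dtypes=[da.dtype],
    )
```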
Anyway, for your consideration! I'd be happy to submit a PR.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6377/reactions"", ""total_count"": 9, ""+1"": 9, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
793245791,MDU6SXNzdWU3OTMyNDU3OTE=,4842,nbytes does not return the true size for sparse variables,13662783,closed,0,,,2,2021-01-25T10:17:56Z,2022-07-22T17:25:33Z,2022-07-22T17:25:33Z,CONTRIBUTOR,,,,"This wasn't entirely surprising to me, but `nbytes` currently doesn't return the right value for sparse data -- at least, I think nbytes should show the actual size in memory?
Since it uses `size` here:
https://github.com/pydata/xarray/blob/a0c71c1508f34345ad7eef244cdbbe224e031c1b/xarray/core/variable.py#L349
Rather than something like `data.nnz`, which of course only exists for sparse arrays...
I'm not sure if there's a sparse flag or something, or whether you'd have to do a typecheck?
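Something duck-typed might be enough? A sketch of what I mean for the `Variable.nbytes` property (assuming the wrapped array reports its own memory use, as `sparse.COO.nbytes` does):
```python
@property
def nbytes(self):
    # prefer the wrapped array's own accounting (sparse.COO, dask, etc.)
    if hasattr(self._data, ""nbytes""):
        return self._data.nbytes
    return self.size * self.dtype.itemsize
```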
**Minimal Complete Verifiable Example**:
```python
import pandas as pd
import numpy as np
import xarray as xr
df = pd.DataFrame()
df[""x""] = np.repeat(np.random.rand(10_000), 10)
df[""y""] = np.repeat(np.random.rand(10_000), 10)
df[""time""] = np.tile(pd.date_range(""2000-01-01"", ""2000-03-10"", freq=""W""), 10_000)
df[""rate""] = 10.0
df = df.set_index([""time"", ""y"", ""x""])
sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
print(sparse_ds[""rate""].nbytes)
```
```python
8000000000
```
**Anything else we need to know?**:
**Environment**:
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.30.1
matplotlib: 3.3.1
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.1.0
IPython: 7.19.0
sphinx: 3.2.1
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4842/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
445745470,MDExOlB1bGxSZXF1ZXN0MjgwMTIwNzIz,2972,ENH: Preserve monotonic descending index order when merging,13662783,open,0,,,4,2019-05-18T19:12:11Z,2022-06-09T14:50:17Z,,CONTRIBUTOR,,0,pydata/xarray/pulls/2972,"* Addresses GH2947
* When indexes were joined in a dataset merge, they would always get sorted in ascending order. This is awkward for geospatial grids, which are nearly always descending in the ""y"" coordinate.
* This also caused an inconsistency: when a merge is called on datasets with identical descending indexes, the resulting index is descending. When a merge is called with non-identical descending indexes, the resulting index is ascending.
* When indexes are mixed ascending and descending, or non-monotonic, the resulting index is still sorted in ascending order.
- [x] Closes #2947
- [x] Tests added
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
## Comments
I was doing some work and I kept running into the issue described at #2947, so I had a try at a fix. It was somewhat of a hassle to understand the issue because I kept running into seeming inconsistencies. This is caused by the fact that the joiner doesn't sort with a single index:
```python
def _get_joiner(join):
if join == 'outer':
return functools.partial(functools.reduce, operator.or_)
```
That makes sense, since I'm guessing `pandas.Index.union` isn't called at all. (I still find the workings of `functools` a little hard to infer.)
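A toy illustration of that mechanics, with plain sets standing in for the indexes:
```python
import functools
import operator

joiner = functools.partial(functools.reduce, operator.or_)
print(joiner([{3, 2, 1}]))       # a single element: reduce returns it untouched
print(joiner([{3, 2}, {2, 1}]))  # {1, 2, 3}: or_ actually combines the operands
```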
I also noticed that an outer join gets called with e.g. an `.isel` operation, even though there's only one index (so there's not really anything to join). However, skipping the join completely in that case makes several tests fail since dimension labels end up missing (I guess the `joiner` call takes care of it).
It's just checking for the specific case now, but it feels like a very specific issue anyway...
The merge behavior is slightly different now, which is reflected in the updated test outcomes in `test_dataset.py`. These tests were turning monotonic decreasing indexes into an increasing index; now the decreasing order is maintained.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2972/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
620468256,MDU6SXNzdWU2MjA0NjgyNTY=,4076,Zarr ZipStore versus DirectoryStore: ZipStore requires .close(),13662783,open,0,,,4,2020-05-18T19:58:21Z,2022-04-28T22:37:48Z,,CONTRIBUTOR,,,,"
I was saving my dataset into a ZipStore -- apparently successfully -- but then I couldn't reopen it.
The issue appears to be that a regular DirectoryStore behaves a little differently: it doesn't need to be closed, while a ZipStore does.
(I'm not sure how this relates to #2586; the remarks there don't appear to be applicable anymore.)
#### MCVE Code Sample
This errors:
```python
import xarray as xr
import zarr
# works as expected
ds = xr.Dataset({'foo': [2,3,4], 'bar': ('x', [1, 2]), 'baz': 3.14})
ds.to_zarr(zarr.DirectoryStore(""test.zarr""))
print(xr.open_zarr(zarr.DirectoryStore(""test.zarr"")))
# errors with ValueError: group not found at path ''
ds.to_zarr(zarr.ZipStore(""test.zip""))
print(xr.open_zarr(zarr.ZipStore(""test.zip"")))
```
Calling close, or using `with` does the trick:
```python
store = zarr.ZipStore(""test2.zip"")
ds.to_zarr(store)
store.close()
print(xr.open_zarr(zarr.ZipStore(""test2.zip"")))
with zarr.ZipStore(""test3.zip"") as store:
ds.to_zarr(store)
print(xr.open_zarr(zarr.ZipStore(""test3.zip"")))
```
#### Expected Output
I think it would be preferable to close the ZipStore in this case. But I might be missing something?
#### Problem Description
Because `to_zarr` works in this situation with a DirectoryStore, it's easy to assume a ZipStore will work similarly. However, I couldn't get it to read my data back in this case.
#### Versions
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.2.dev41+g8415eefa.d20200419
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.14.0+23.gbea4c9a2
distributed: 2.14.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4076/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
656165548,MDU6SXNzdWU2NTYxNjU1NDg=,4222,Extension proposal: extension for UGRID and unstructured mesh utils,13662783,closed,0,,,5,2020-07-13T21:44:12Z,2022-04-22T18:19:53Z,2022-04-18T15:37:35Z,CONTRIBUTOR,,,,"Disclaimer: I'm not a 100% sure this is necessarily the right place to discuss this. I think some of the things I propose could be broadly useful. At any rate, I appreciate feedback!
EDIT: to clarify, I don't think my original wording was entirely clear: I'm not suggesting adding this to xarray, but I'm thinking about a dataset accessor or a wrapping object in a separate package.
# Introduction
Xarray is a great tool for structured data, but there isn't anything quite like it for unstructured meshes of convex cells. I'm coming from a hydrological background, and I believe many of my colleagues working with unstructured meshes could greatly benefit from xarray.
In terms of data models and formats, there are many unstructured mesh formats, I think UGRID is the most widely shared (outside of specific model formats) and it supports many aspects (e.g. data on faces, nodes, edges). Although [Gridded](https://github.com/NOAA-ORR-ERD/gridded) exists and provides a number of features, I really need something that consumes and produces xarray Datasets (I don't want to go without e.g. dask, or deal with the netcdf4 library myself).
My most immediate need is some utilities for 2D unstructured meshes (y, x), and layered meshes (layer, y, x). This is also what UGRID is mostly used for now, as there don't seem to be many examples of the 3D unstructured mesh definitions in the wild. Sticking to 2D simplifies things somewhat, so I'll ignore the third dimension for now -- but I don't see any reason why one couldn't generalize many ideas to 3D.
I currently use xarray a great deal with structured data, in dealing with ""regular"" netCDFs. In my thinking, it's possible to ""mirror"" a few methods which by and large behave similarly to xarray, using the UGRID data model for the mesh topology.
# A simple UGRID example
To clarify what I'm talking about, this is what UGRID looks like in xarray: a Dataset with a dummy variable which describes in which variables the mesh topology is stored, in this case for a triangle and a quad.
```python
ds = xr.Dataset()
ds[""mesh2d""] = xr.DataArray(
data=0,
attrs={
""cf_role"": ""mesh_topology"",
""long_name"": ""Topology data of 2D mesh"",
""topology_dimension"": 2,
""node_coordinates"": ""node_x node_y"",
""face_node_connectivity"": ""face_nodes"",
}
)
ds = ds.assign_coords(
node_x=xr.DataArray(
data=np.array([0.0, 10.0, 10.0, 0.0, 20.0, 20.0]),
dims=[""node""],
)
)
ds = ds.assign_coords(
node_y=xr.DataArray(
data=np.array([0.0, 0.0, 10.0, 10.0, 0.0, 10.0]),
dims=[""node""],
)
)
ds[""face_nodes""] = xr.DataArray(
data=np.array([
[0, 1, 2, 3],
[1, 4, 5, -1]
]),
coords={
""face_x"": (""face"", np.array([5.0, 15.0])),
""face_y"": (""face"", np.array([5.0, 5.0])),
},
dims=[""face"", ""nmax_face""],
attrs={
""cf_role"": ""face_node_connectivity"",
""long_name"": ""Vertex nodes of mesh faces (counterclockwise)"",
""start_index"": 0,
""_FillValue"": -1,
}
)
```
It's only the topology that is special cased. A time dependent variable looks the same as it would otherwise, except it would have dimensions `(""time"", ""face"")` rather than `(""time"", ""y"", ""x"")`.
UGRID netCDFs can in principle be read and written without issue. I would prefer to work with standard xarray Datasets like this if possible, because it means there's no need to touch any IO methods, as well as maintaining the full flexibility of xarray.
# Some methods
I'm thinking of a few methods which behave similarly to the xarray methods, except that they know about the mesh topology (in x and y) and the dependencies between the different variables. So, you could do:
```Python
import xugrid # name of extension
selection_ds = ds.xugrid.sel(node_x=slice(-30.0, 30.0))
```
And it would slice based on the x coordinate of the nodes, but update all the other variables to produce a consistent mesh (and renumber the `face_nodes` appropriately). `xugrid.isel` would behave in the same way.
`xugrid.plot` would e.g. perform triangulation automatically (if the mesh isn't triangular already), and create a `matplotlib.tri.Triangulation` object, and so forth.
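For completeness, a minimal sketch of the accessor mechanism I have in mind (assuming the package is called xugrid and the topology variable is located via its `cf_role` attribute); the actual slicing logic is left out:
```python
import xarray as xr

@xr.register_dataset_accessor(""xugrid"")
class UgridAccessor:
    def __init__(self, ds):
        self._ds = ds
        # locate the dummy topology variable by its cf_role attribute
        self._topology = next(
            da for da in ds.data_vars.values()
            if da.attrs.get(""cf_role"") == ""mesh_topology""
        )

    def sel(self, **indexers):
        # slice the node coordinates, then renumber face_nodes so the
        # remaining faces still reference valid nodes
        raise NotImplementedError
```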
# Regridding
I think one of the major needs for dealing with unstructured grids is being able to map data from one mesh to another, go from (GIS) raster data to unstructured, or go from unstructured model output to GIS raster data. For my use case, I often require area weighted methods, which require computing the overlap between source and destination meshes. I haven't really found a suitable package; e.g. [xESMF](https://xesmf.readthedocs.io/en/latest/why.html) looks very useful, but doesn't have quite the methods I'd want. Also troubling is that I can't really get it installed on Windows.
Existing GIS routines for vector data are far too slow (they have to take too many special cases into account, I think), see some benchmarks [here](https://github.com/Huite/clipping-benchmarks).
I think [numba](https://numba.pydata.org/) provides an interesting opportunity here, not just for ease of development and distribution, but also because, with JIT-compilation, the user can provide their own regridding methods. E.g.:
```python
import numba as nb
import numpy as np
def median(values, weights):
return np.nanpercentile(values, 50)
class Regridder:
def __init__(self, source, destination):
# compute and store weights
# ...
# Use a closure to avoid numba's function overhead:
# https://numba.pydata.org/numba-doc/latest/user/faq.html#can-i-pass-a-function-as-an-argument-to-a-jitted-function
def _make_regrid(self, method):
jitted_func = nb.njit(method)
def regrid(src_indices, dst_indices, data, weights):
out = np.full(dst_indices.size, np.nan)
for src_index, dst_index in zip(src_indices, dst_indices): # or something
# accumulate values
out[dst_index] = jitted_func(values)
return out
return regrid
    def regrid(self, source_data, method):
self._regrid = self._make_regrid(method)
# Cache the method in the object or something to avoid repeat compilation
# provide default methods (e.g. min or max) as string or enumerators
return self._regrid(self.src_indices, self.dst_indices, source_data, self.weights)
```
The best way to find overlapping cells seems to be via a cell tree, as implemented [here](https://github.com/NOAA-ORR-ERD/cell_tree2d), which requires searching for bounding boxes rather than points. I've ported the [CellTree2D to numba](https://github.com/Huite/xugrid/blob/master/xugrid/geometry/cell_tree.py), and added a bounding box search.
The serial version is a little bit slower than C++ for point searches, but with numba's `prange` (and enough processors) searches are a few times faster. (In retrospect, I could've just re-used the C++ tree rather than porting it, but numba does have some benefits...)
Interpolation on unstructured grids can probably be handled by [pyinterp](https://github.com/CNES/pangeo-pyinterp), and can be made available via the regridder. (Scipy interpolate takes too much memory in my experience for large meshes.)
# API proposal
```python
class Regridder:
# Basically same API as xemsf
def __init__(self, source: xr.Dataset, destination: xr.Dataset) -> None:
pass
def regrid(self, source_data: xr.DataArray, method: Union[str, Callable]) -> xr.Dataset:
pass
xugrid.triangulate(self) -> xr.Dataset:
xugrid.rasterize(self, resolution: Mapping) -> xr.Dataset:
xugrid.reproject(self, dst_crs, src_crs) -> xr.Dataset: # just offload all coordinates to pyproj?
xugrid.sel(self, indexers: Mapping) -> xr.Dataset:
xugrid.isel(self, indexers: Mapping) -> xr.Dataset:
xugrid.plot(self) -> None: # calls triangulate if needed
xugrid.plot.imshow(self) -> None: # maybe calls rasterize first, provides a quicker peek
xugrid.slice_vertical(self, linestring) -> xr.Dataset: # traces one or more rays through the mesh
xugrid.check_mesh(self) -> None: # check for convexity, all indices present, etc.
xugrid.connectivity_matrix(self) -> scipy.sparse?
xugrid.reverse_cuthill_mckee(self) -> xr.Dataset: # use scipy.sparse.csgraph
xugrid.binary_erosion(self, iterations: int) -> xr.DataArray:
xugrid.binary_dilation(self, iterations: int) -> xr.DataArray:
xugrid.meshfix(self) # call pymeshfix
xugrid.to_pyvista(self) -> pyvista.UnstructuredGrid: # create a pyvista (vtk) mesh
```
Most of these methods require dealing with numpy arrays (or numpy arrays representing sparse arrays), so I think numba is very suitable for the algorithms (if they don't already exist).
For most methods, 3D meshes would be possible too. It requires some changes in e.g. the CellTree, and plotting would first require a `slice_horizontal` or something to get a plan view image.
Some methods might not be much needed; I'm basing this on my impression of what I currently use xarray for, in combination with some other tools that assume structured data.
# Resources
* Cell tree https://github.com/NOAA-ORR-ERD/cell_tree2d
* Numba KD tree: https://github.com/jackd/numba-neighbors
* Interpolation: https://github.com/CNES/pangeo-pyinterp
* UGRID conventions: https://ugrid-conventions.github.io/ugrid-conventions/
* Gridded: https://github.com/NOAA-ORR-ERD/gridded
* Triangulating polygons (could be useful for vector data -> mesh): https://pypi.org/project/mapbox-earcut/
* Pymeshfix: https://github.com/pyvista/pymeshfix
(Have to check for non-permissive / copyleft licenses...)
# Thoughts?
Does this sound useful? Does it belong somewhere else / does it already exist?
(I'm aware that a number of these, e.g. the tree structures, are available in vtk as well, but the Python API unfortunately doesn't expose them; having a numba version also makes changes a lot easier.)
Something that popped up:
Some of the algorithms benefit from some persistent data structures (e.g. a cell tree or connectivity matrix), can these be maintained via an xarray-extension?
I'm assuming mesh topologies that fit within memory, that seems challenging enough for now...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4222/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
386596872,MDU6SXNzdWUzODY1OTY4NzI=,2587,"DataArray constructor still coerces to np.datetime64[ns], not cftime in 0.11.0",13662783,open,0,,,3,2018-12-02T20:34:36Z,2022-04-18T16:06:12Z,,CONTRIBUTOR,,,,"#### Code Sample
```python
import xarray as xr
import numpy as np
from datetime import datetime
time = [np.datetime64(datetime.strptime(""10000101"", ""%Y%m%d""))]
print(time[0])
print(np.dtype(time[0]))
da = xr.DataArray(time, (""time"",), {""time"":time})
print(da)
```
Results in:
```
1000-01-01T00:00:00.000000
datetime64[us]
array(['2169-02-08T23:09:07.419103232'], dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 2169-02-08T23:09:07.419103232
```
#### Problem description
I was happy to see `cftime` as default in the release notes for 0.11.0:
> Xarray will now always use `cftime.datetime` objects, rather than by default trying to coerce them into `np.datetime64[ns]` objects. A `CFTimeIndex` will be used for indexing along time coordinates in these cases.
However, it seems that the DataArray constructor does not use `cftime` (yet?), and coerces to `np.datetime64[ns]` here:
https://github.com/pydata/xarray/blob/0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e/xarray/core/variable.py#L183-L189
#### Expected Output
I think I'd expect `cftime.datetime` in this case as well. Some coercion happens anyway as pandas timestamps are turned into `np.datetime64[ns]`.
(But perhaps this was already on your radar, and am I just a little too eager!)
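For the record, constructing the coordinate from `cftime` objects directly seems to sidestep the coercion (a sketch, assuming such objects are passed through untouched):
```python
import cftime
import xarray as xr

time = [cftime.DatetimeProlepticGregorian(1000, 1, 1)]
da = xr.DataArray(time, coords={""time"": time}, dims=(""time"",))
```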
#### Output of ``xr.show_versions()``
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
xarray: 0.11.0
pandas: 0.23.3
numpy: 1.15.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0
PseudonetCDF: None
rasterio: 1.0.0
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: 3.6.3
IPython: 6.4.0
sphinx: 1.7.5
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2587/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
601364609,MDExOlB1bGxSZXF1ZXN0NDA0NjM0Mjk0,3979,full_like: error on non-scalar fill_value,13662783,closed,0,,,2,2020-04-16T19:18:50Z,2020-04-24T07:15:49Z,2020-04-24T07:15:44Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3979,"
- [x] Closes #3977
- [x] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
@dcherian's suggestion in #3977 seemed straightforward enough for me to have a try.
Two thoughts:
* does the `np.isscalar` check belong in the outer function, or the inner? The inner function is only called by the outer one.
* Bugfix or arguably a breaking change?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3979/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
601224688,MDU6SXNzdWU2MDEyMjQ2ODg=,3977,xr.full_like (often) fails when other is chunked and fill_value is non-scalar,13662783,closed,0,,,1,2020-04-16T16:23:59Z,2020-04-24T07:15:44Z,2020-04-24T07:15:44Z,CONTRIBUTOR,,,,"I've been running into some issues when using `xr.full_like`, when my `other.data` is a chunked dask array, and the `fill_value` is a numpy array.
Now, I just checked: ``full_like`` mentions only a scalar in the signature. However, this is a very convenient way to get all the coordinates and dimensions attached to an array like this, so it feels like desirable functionality. And as I mention below, both numpy and dask function similarly, taking much more than just scalars.
https://xarray.pydata.org/en/stable/generated/xarray.full_like.html
#### MCVE Code Sample
```python
x = [1, 2, 3, 4]
y = [1, 2, 3]
da1 = xr.DataArray(dask.array.ones((3, 4), chunks=(1, 4)), {""y"": y, ""x"": x}, (""y"", ""x""))
da2 = xr.full_like(da1, np.ones((3, 4)))
print(da2.values)
```
This results in an error:
`ValueError: could not broadcast input array from shape (1,3) into shape (1,4)`
#### Expected Output
Expected is a DataArray with the dimensions and coords of `other`, and the numpy array of `fill_value` as its data.
#### Problem Description
The issue lies here: https://github.com/pydata/xarray/blob/2c77eb531b6689f9f1d2adbde0d8bf852f1f7362/xarray/core/common.py#L1420-L1436
Calling `dask.array.full` with the given number of chunks results in it trying to apply the `fill_value` for every individual chunk.
As one would expect, if I set `fill_value` to the size of a single chunk it doesn't error:
```python
da2 = xr.full_like(da1, np.ones((1, 4)))
print(da2.values)
```
It does fail on a similarly chunked dask array (since it's applying it for every chunk):
```python
da2 = xr.full_like(da1, dask.array.ones((3, 4)))
print(da2.values)
```
The most obvious solution would be to force it down the `np.full_like` route, since all the values already exist in memory anyway. So maybe another type check does the trick. However, `full()` accepts quite a variety of arguments for the fill value (scalars, numpy arrays, lists, tuples, ranges). The dask docs mention only a scalar in the signature for ``dask.array.full``:
https://docs.dask.org/en/latest/array-api.html#dask.array.full
As does numpy.full:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html
However, in all cases, they still broadcast automatically...
```python
a = np.full((2, 2), [1, 2])
>>> array([[1, 2],
[1, 2]])
```
So it's kind of undefined behavior for a blocked `full`?
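To make the suggestion concrete, a hypothetical helper (not xarray API) illustrating the behaviour I would expect: scalars keep going through `dask.array.full`, array-likes are broadcast with numpy (the values are in memory anyway) and rechunked to match `other`:
```python
import dask.array
import numpy as np

def full_like_chunked(other, fill_value):
    # other: a dask-backed DataArray; fill_value: scalar or array-like
    if np.isscalar(fill_value):
        data = dask.array.full(other.shape, fill_value, chunks=other.data.chunks)
    else:
        values = np.broadcast_to(np.asarray(fill_value), other.shape)
        data = dask.array.from_array(values, chunks=other.data.chunks)
    return other.copy(data=data)
```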
#### Versions
Output of `xr.show_versions()`
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.9.2
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
sphinx: 2.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3977/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
545762025,MDU6SXNzdWU1NDU3NjIwMjU=,3664,Regression 0.14.0 -> 0.14.1: __setattr__ no longer updates matching index,13662783,closed,0,,,2,2020-01-06T14:42:50Z,2020-01-06T16:03:16Z,2020-01-06T15:37:17Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
This came up in the following (simplified) case:
```python
import numpy as np
import xarray as xr
x = [1, 2, 3, 4]
data = [1., 2., 3. , 4.]
dims = [""x""]
coords = {""x"": x}
da = xr.DataArray(data, coords, dims)
da1 = da.sel(x=[1, 3])
da2 = da.sel(x=[2, 4])
da2[""x""].values = da1[""x""].values
da3 = da2 - da1
print(da3)
```
In 0.14.1, this results in an empty DataArray `da3`:
```python
array([], dtype=float64)
Coordinates:
* x (x) int64
```
0.14.0 (and before), no issues:
```python
array([1., 1.])
Coordinates:
* x (x) int32 1 3
```
Interestingly, the values of `da2[""x""]` are okay, `print(da2[""x""])` gives:
`array([1, 3])`
However, the index has not been updated, `print(da2.indexes[""x""])`:
`Int64Index([2, 4], dtype='int64', name='x')`
#### Expected Output
I'm expecting output as in 0.14.0 and before. I've briefly stepped through the code for `da2[""x""].values = da1[""x""].values`; I don't see any changes there between 0.14.0 and 0.14.1, so I'm guessing it's due to some change in the indexes. Apparently the index no longer uses the same array?
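As a workaround, replacing the coordinate instead of mutating it in place restores the 0.14.0 result for me (a sketch):
```python
da2 = da2.assign_coords(x=da1[""x""].values)
da3 = da2 - da1  # array([1., 1.]) again
```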
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.1
cfgrib: None
iris: None
bottleneck: None
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.6.0.post20191030
pip: 19.3.1
conda: None
pytest: None
IPython: 7.9.0
sphinx: 2.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3664/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
354298235,MDU6SXNzdWUzNTQyOTgyMzU=,2383,groupby().apply() on variable with NaNs raises IndexError,13662783,closed,0,,,1,2018-08-27T12:27:06Z,2019-10-28T23:46:41Z,2019-10-28T23:46:41Z,CONTRIBUTOR,,,,"#### Code Sample
```python
import xarray as xr
import numpy as np
def standardize(x):
return (x - x.mean()) / x.std()
ds = xr.Dataset()
ds[""variable""] = xr.DataArray(np.random.rand(4,3,5),
{""lat"":np.arange(4), ""lon"":np.arange(3), ""time"":np.arange(5)},
(""lat"", ""lon"", ""time""),
)
ds[""id""] = xr.DataArray(np.arange(12.0).reshape((4,3)),
{""lat"": np.arange(4), ""lon"":np.arange(3)},
(""lat"", ""lon""),
)
ds[""id""].values[0,0] = np.nan
ds.groupby(""id"").apply(standardize)
```
#### Problem description
This results in an IndexError. This is mildly confusing; it took me a little while to figure out that the NaNs were to blame. I'm guessing the NaN doesn't get filtered out everywhere.
The traceback:
```
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in ()
15 ds[""id""].values[0,0] = np.nan
16
---> 17 ds.groupby(""id"").apply(standardize)
C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in apply(self, func, **kwargs)
607 kwargs.pop('shortcut', None) # ignore shortcut if set (for now)
608 applied = (func(ds, **kwargs) for ds in self._iter_grouped())
--> 609 return self._combine(applied)
610
611 def _combine(self, applied):
C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _combine(self, applied)
614 coord, dim, positions = self._infer_concat_args(applied_example)
615 combined = concat(applied, dim)
--> 616 combined = _maybe_reorder(combined, dim, positions)
617 if coord is not None:
618 combined[coord.name] = coord
C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _maybe_reorder(xarray_obj, dim, positions)
428
429 def _maybe_reorder(xarray_obj, dim, positions):
--> 430 order = _inverse_permutation_indices(positions)
431
432 if order is None:
C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _inverse_permutation_indices(positions)
109 positions = [np.arange(sl.start, sl.stop, sl.step) for sl in positions]
110
--> 111 indices = nputils.inverse_permutation(np.concatenate(positions))
112 return indices
113
C:\Miniconda3\envs\main\lib\site-packages\xarray\core\nputils.py in inverse_permutation(indices)
52 # use intp instead of int64 because of windows :(
53 inverse_permutation = np.empty(len(indices), dtype=np.intp)
---> 54 inverse_permutation[indices] = np.arange(len(indices), dtype=np.intp)
55 return inverse_permutation
56
IndexError: index 11 is out of bounds for axis 0 with size 11
```
#### Expected Output
My assumption was that it would throw out the values that fall within the NaN group, like `pandas`:
```python
import pandas as pd
import numpy as np
df = pd.DataFrame()
df[""var""] = np.random.rand(10)
df[""id""] = np.arange(10)
df[""id""].iloc[0:2] = np.nan
df.groupby(""id"").mean()
```
Out:
```python
var
id
2.0 0.565366
3.0 0.744443
4.0 0.190983
5.0 0.196922
6.0 0.377282
7.0 0.141419
8.0 0.957526
9.0 0.207360
```
#### Output of ``xr.show_versions()``
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
xarray: 0.10.8
pandas: 0.23.3
numpy: 1.15.0
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.7.1
IPython: 6.4.0
sphinx: 1.7.5
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2383/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
510505541,MDExOlB1bGxSZXF1ZXN0MzMwODY3ODI0,3428,Address multiplication DeprecationWarning in rasterio backend,13662783,closed,0,,,2,2019-10-22T08:32:25Z,2019-10-22T18:45:32Z,2019-10-22T18:45:24Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3428,"Very minor change to address this DeprecationWarning:
```
xarray\backends\rasterio_.py:260: DeprecationWarning: Right multiplication will be prohibited in version 3.0
x, _ = (np.arange(nx) + 0.5, np.zeros(nx) + 0.5) * riods.transform
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3428/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
441341340,MDU6SXNzdWU0NDEzNDEzNDA=,2947,xr.merge always sorts indexes ascending,13662783,open,0,,,2,2019-05-07T17:06:06Z,2019-05-07T21:07:26Z,,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible
```python
import xarray as xr
import numpy as np
nrow, ncol = (4, 5)
dx, dy = (1.0, -1.0)
xmins = (0.0, 3.0, 3.0, 0.0)
xmaxs = (5.0, 8.0, 8.0, 5.0)
ymins = (0.0, 2.0, 0.0, 2.0)
ymaxs = (4.0, 6.0, 4.0, 6.0)
data = np.ones((nrow, ncol), dtype=np.float64)
das = []
for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
kwargs = dict(
name=""example"",
dims=(""y"", ""x""),
coords={""y"": np.arange(ymax, ymin, dy),
""x"": np.arange(xmin, xmax, dx)},
)
das.append(xr.DataArray(data, **kwargs))
xr.merge(das)
# This won't flip the coordinate:
xr.merge([das[0]])
```
#### Problem description
Let's say I have a number of geospatial grids that I'd like to merge (for example, loaded with `xr.open_rasterio`). To quote https://www.perrygeo.com/python-affine-transforms.html:
> The typical geospatial coordinate reference system is defined on a cartesian plane with the 0,0 origin in the bottom left and X and Y increasing as you go up and to the right. But raster data, coming from its image processing origins, uses a different referencing system to access pixels. We refer to rows and columns with the 0,0 origin in the upper left and rows increase as you move down while the columns increase as you go right. Still a cartesian plane but not the same one.
`xr.merge` will always return the result with ascending coordinates, which creates some issues / confusion later on if you try to write it back to a GDAL format, for example (I've been scratching my head for some time looking at upside-down .tifs).
#### Expected Output
I think the expected output for these geospatial grids is that, if you provide only DataArrays with positive dx and negative dy, the merged result comes out with a positive dx and a negative dy as well.
When the DataArrays to merge are mixed in coordinate direction (some with ascending, some with descending coordinate values), defaulting to an ascending sort seems sensible.
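In the meantime I'm flipping the result back after merging (a sketch, assuming all inputs are descending in ""y"" to begin with):
```python
merged = xr.merge(das).sortby(""y"", ascending=False)
```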
#### A suggestion
I saw that the sort is occurring [here, in pandas](https://github.com/pandas-dev/pandas/blob/2bbc0c2c198374546408cb15fff447c1e306f99f/pandas/core/indexes/base.py#L2260-L2265), and that there's an `is_monotonic_decreasing` property in [pandas.core.indexes.base.Index](https://github.com/pandas-dev/pandas/blob/2bbc0c2c198374546408cb15fff447c1e306f99f/pandas/core/indexes/base.py#L1601)
I think this could work (it solves my issue at least), in [xarray.core.alignment](https://github.com/pydata/xarray/blob/5aaa6547cd14a713f89dfc7c22643d86fce87916/xarray/core/alignment.py#L125)
```python
index = joiner(matching_indexes)
if all(
(matching_index.is_monotonic_decreasing
for matching_index in matching_indexes)
):
index = index[::-1]
joined_indexes[dim] = index
```
But I lack the knowledge to say whether this plays nice in all cases. And does `index[::-1]` return a view or a copy? (And does it matter?)
For reference this is what it looks like now:
```python
if (any(not matching_indexes[0].equals(other)
for other in matching_indexes[1:]) or
dim in unlabeled_dim_sizes):
if join == 'exact':
raise ValueError(
'indexes along dimension {!r} are not equal'
.format(dim))
index = joiner(matching_indexes)
joined_indexes[dim] = index
else:
index = matching_indexes[0]
```
It's also worth highlighting that the `else` branch causes, arguably, some inconsistency: if the indexes are equal, no sorting occurs at all.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2947/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
365854029,MDU6SXNzdWUzNjU4NTQwMjk=,2454,"Regression from 0.10.8 to 0.10.9, ""shape mismatch"" IndexError in reindex_like() after open_rasterio().squeeze()",13662783,closed,0,,,1,2018-10-02T11:28:16Z,2018-10-08T18:17:02Z,2018-10-08T18:17:02Z,CONTRIBUTOR,,,,"#### Code Sample
```python
import xarray as xr
import numpy as np
import rasterio
from affine import Affine
def write_tif(path, epsg):
nrow = 5
ncol = 8
values = 10.0 * np.random.rand(nrow, ncol)
profile = dict()
profile[""crs""] = rasterio.crs.CRS.from_epsg(epsg)
profile[""transform""] = Affine.translation(0.0, 0.0)
profile[""driver""] = ""GTiff""
profile[""height""] = nrow
profile[""width""] = ncol
profile[""count""] = 1
profile[""dtype""] = np.float64
with rasterio.Env():
with rasterio.open(path, ""w"", **profile) as dst:
dst.write(values, 1)
write_tif(""basic.tif"", epsg=28992)
da = xr.open_rasterio(""basic.tif"").squeeze(""band"", drop=True)
likevalues = np.empty((10, 16))
likecoords = {""y"": np.arange(0.25, 5.0, 0.5), ""x"": np.arange(0.25, 8.0, 0.5)}
likedims = (""y"", ""x"")
like = xr.DataArray(likevalues, likecoords, likedims)
newda = da.reindex_like(like)
```
#### Problem description
The code above executes without issues in xarray version 0.10.8. However, it results in a ""shape-mismatch"" error in 0.10.9:
```
Traceback (most recent call last):
File """", line 8, in
newda = da.reindex_like(like)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataarray.py"", line 919, in reindex_like
**indexers)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataarray.py"", line 969, in reindex
indexers=indexers, method=method, tolerance=tolerance, copy=copy)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataset.py"", line 1924, in reindex
tolerance, copy=copy)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\alignment.py"", line 377, in reindex_variables
new_var = var._getitem_with_mask(key)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\variable.py"", line 655, in _getitem_with_mask
data = duck_array_ops.where(mask, fill_value, data)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py"", line 183, in where
return _where(condition, *as_shared_dtype([x, y]))
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py"", line 115, in as_shared_dtype
arrays = [asarray(x) for x in scalars_or_arrays]
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py"", line 115, in
arrays = [asarray(x) for x in scalars_or_arrays]
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py"", line 110, in asarray
return data if isinstance(data, dask_array_type) else np.asarray(data)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray
return array(a, dtype, copy=False, order=order)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py"", line 624, in __array__
self._ensure_cached()
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py"", line 621, in _ensure_cached
self.array = NumpyIndexingAdapter(np.asarray(self.array))
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray
return array(a, dtype, copy=False, order=order)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py"", line 602, in __array__
return np.asarray(self.array, dtype=dtype)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py"", line 492, in asarray
return array(a, dtype, copy=False, order=order)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py"", line 508, in __array__
return np.asarray(array[self.key], dtype=None)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\backends\rasterio_.py"", line 119, in __getitem__
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py"", line 776, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File ""d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\backends\rasterio_.py"", line 115, in _getitem
return out[np_inds]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (10,) (16,)
```
I only get it to occur when loading a raster using `open_rasterio()`. Defining `da` directly does not lead to issues, using either numpy or dask.array.
```Python
values = 10.0 * np.random.rand(1, 5, 8)
coords = {""band"": [1.0], ""y"": np.arange(0.5, 5.0, 1.0), ""x"": np.arange(0.5, 8.0, 1.0)}
dims = (""band"", ""y"", ""x"")
da = xr.DataArray(values, coords, dims)
da = da.squeeze(""band"", drop=True)
```
```Python
import dask.array
values = 10.0 * dask.array.random.random((1, 5, 8), (1, 5, 8))
coords = {""band"": [1.0], ""y"": np.arange(0.5, 5.0, 1.0), ""x"": np.arange(0.5, 8.0, 1.0)}
dims = (""band"", ""y"", ""x"")
da = xr.DataArray(values, coords, dims)
da = da.squeeze(""band"", drop=True)
```
The ""shape mismatch"" makes me think that maybe the `squeeze()` isn't executed properly or in time. This works:
```Python
write_tif(""basic.tif"", epsg=28992)
da = xr.open_rasterio(""basic.tif"")
likevalues = np.empty((1, 10, 16))
likecoords = {""band"":[1], ""y"": np.arange(0.25, 5.0, 0.5), ""x"": np.arange(0.25, 8.0, 0.5)}
likedims = (""band"", ""y"", ""x"")
like = xr.DataArray(likevalues, likecoords, likedims)
newda = da.reindex_like(like)
```
So it looks to me like some lazy evaluation is going awry when `open_rasterio` is involved, in this case.
Anything which forces a `compute()` before `reindex_like` in this case seems to make the problem go away, like calling `da` in the console, `print(da)` after opening the .tif, etc.
```Python
write_tif(""basic.tif"", epsg=28992)
da = xr.open_rasterio(""basic.tif"").compute().squeeze(""band"", drop=True)
likevalues = np.empty((10, 16))
likecoords = {""y"": np.arange(0.25, 5.0, 0.5), ""x"": np.arange(0.25, 8.0, 0.5)}
likedims = (""y"", ""x"")
like = xr.DataArray(likevalues, likecoords, likedims)
newda = da.reindex_like(like)
```
Interestingly, I can't easily inspect it, because inspecting makes the problem go away!
#### Expected Output
A successfully reindexed DataArray.
#### Output of ``xr.show_versions()``
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
xarray: 0.10.9
pandas: 0.23.3
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0
PseudonetCDF: None
rasterio: 1.0.0
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.6.3
IPython: 6.4.0
sphinx: 1.7.5
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2454/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue