id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1330149534,I_kwDOAMm_X85PSHie,6881,Alignment of dataset with MultiIndex fails after applying xr.concat ,19226431,closed,0,,,0,2022-08-05T16:42:05Z,2022-08-25T11:15:55Z,2022-08-25T11:15:55Z,CONTRIBUTOR,,,,"### What happened?

After applying the `concat` function to a dataset with a MultiIndex, many functions related to indexing are broken. For example, it is no longer possible to apply `reindex_like` to the dataset itself. The error is raised in the alignment module; it seems that the function `find_matching_indexes` does not find indexes that belong to the same dimension.

### What did you expect to happen?

I expected the alignment to be functional and these basic functions to work.

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import pandas as pd

index = pd.MultiIndex.from_product([[1,2], ['a', 'b']], names=('level1', 'level2'))
index.name = 'dim'

var = xr.DataArray(1, coords=[index])
ds = xr.Dataset({""var"":var})
new = xr.concat([ds], dim='newdim')

xr.Dataset(new)  # breaks
new.reindex_like(new)  # breaks
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output

```Python
Traceback (most recent call last):
  File ""/tmp/ipykernel_407170/4030736219.py"", line 11, in <module>
    xr.Dataset(new) # breaks
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/dataset.py"", line 599, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/merge.py"", line 575, in merge_data_and_coords
    return merge_core(
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/merge.py"", line 752, in merge_core
    aligned = deep_align(
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/alignment.py"", line 827, in deep_align
    aligned = align(
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/alignment.py"", line 764, in align
    aligner.align()
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/alignment.py"", line 550, in align
    self.assert_no_index_conflict()
  File ""/home/fabian/.miniconda3/lib/python3.10/site-packages/xarray/core/alignment.py"", line 319, in assert_no_index_conflict
    raise ValueError(
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'dim' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions
```

### Anything else we need to know?

_No response_

### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.8.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.6.1
distributed: 2022.6.1
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 61.2.0
pip: 22.1.2
conda: 4.13.0
pytest: 7.1.2
IPython: 7.33.0
sphinx: 5.0.2

/home/fabian/.miniconda3/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn(""Setuptools is replacing distutils."")
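A possible workaround until this is fixed upstream (a sketch, not part of the original report: it drops and rebuilds the MultiIndex after `concat`, so a single fresh index covers `dim`; the dataset is built via `set_index` instead of passing a pandas MultiIndex directly):

```python
import xarray as xr

# Equivalent dataset built via set_index rather than a raw pd.MultiIndex.
ds = xr.Dataset(
    {'var': ('dim', [1, 1, 1, 1])},
    coords={'level1': ('dim', [1, 1, 2, 2]),
            'level2': ('dim', ['a', 'b', 'a', 'b'])},
).set_index(dim=['level1', 'level2'])

new = xr.concat([ds], dim='newdim')

# Drop the (now conflicting) MultiIndex and rebuild it from its levels,
# leaving one consistent index along 'dim'.
fixed = new.reset_index('dim').set_index(dim=['level1', 'level2'])
roundtrip = fixed.reindex_like(fixed)  # should no longer raise
```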
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6881/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
784042442,MDU6SXNzdWU3ODQwNDI0NDI=,4796,Use apply_ufunc for unary funcs,19226431,open,0,,,3,2021-01-12T08:56:03Z,2022-04-18T16:31:02Z,,CONTRIBUTOR,,,,"DataArray.clip() of a chunked array raises an assertion error as soon as the argument is itself a chunked array. With non-chunked arrays everything works as intended.

```python
x = xr.DataArray(np.random.uniform(size=[100, 100])).chunk(10)
x.clip(max=x)
```

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-60-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.16.2
pandas: 1.2.0
numpy: 1.19.5
scipy: 1.5.0
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.0
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.1.3
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.2.1.post20200807
pip: 20.2.1
conda: 4.8.3
pytest: 6.0.1
IPython: 7.11.1
sphinx: 3.1.2
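A possible workaround in the meantime (a sketch, not from the original report, assuming only the `max` bound is needed): `clip(max=y)` is an element-wise minimum with `y`, and routing that through `apply_ufunc` with `dask='parallelized'` accepts chunked operands on both sides:

```python
import numpy as np
import xarray as xr

x = xr.DataArray(np.random.uniform(size=[100, 100])).chunk(10)

# np.minimum(x, y) == x.clip(max=y) element-wise; apply_ufunc handles
# the chunked argument that DataArray.clip currently trips over.
clipped = xr.apply_ufunc(
    np.minimum, x, x,
    dask='parallelized',
    output_dtypes=[x.dtype],
)
```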
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4796/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
830040696,MDU6SXNzdWU4MzAwNDA2OTY=,5024,xr.DataArray.sum() converts string objects into unicode,19226431,open,0,,,0,2021-03-12T11:47:06Z,2022-04-09T01:40:09Z,,CONTRIBUTOR,,,,"**What happened**:

When summing over all axes of a DataArray with strings of dtype `object`, the result is a one-size `unicode` DataArray.

**What you expected to happen**:

I expected the summation to preserve the dtype, meaning the one-size DataArray would be of dtype `object`.

**Minimal Complete Verifiable Example**:

```
ds = xr.DataArray('a', [range(3), range(3)]).astype(object)
ds.sum()
```

Output

```
array('aaaaaaaaa', dtype='<U9')
```

whereas summing over a single dimension preserves the dtype:

```
array(['aaa', 'aaa', 'aaa'], dtype=object)
Coordinates:
  * dim_1    (dim_1) int64 0 1 2
```

**Anything else we need to know?**:

The problem becomes relevant as soon as dask is used in the workflow. Dask expects the aggregated DataArray to be of dtype `object`, which will likely lead to errors in the operations that follow. The behavior probably comes from creating a new DataArray after the reduction with `np.sum()` (which itself results in a pure Python string).

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.7.4
h5py: 3.1.0
Nio: None
zarr: 2.3.2
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.3
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 52.0.0.post20210125
pip: 21.0
conda: 4.9.2
pytest: 6.2.2
IPython: 7.19.0
sphinx: 3.4.3
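As a stopgap (a sketch, not from the original report: it casts after the fact rather than fixing the underlying dtype inference), the reduced result can be cast back to `object` before handing it to dask:

```python
import numpy as np
import xarray as xr

# Explicit object-dtype array of strings, avoiding scalar broadcasting.
da = xr.DataArray(np.full((3, 3), 'a', dtype=object))

total = da.sum()                   # dtype may collapse to unicode here
total_obj = total.astype(object)   # restore object dtype for downstream dask
```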
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5024/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1052736383,I_kwDOAMm_X84-v3t_,5983,preserve chunked data when creating DataArray from itself ,19226431,closed,0,,,4,2021-11-13T18:00:24Z,2022-01-13T17:02:47Z,2022-01-13T17:02:47Z,CONTRIBUTOR,,,,"**What happened**:

When creating a new DataArray from a DataArray with chunked data, the underlying dask array is converted to a numpy array.

**What you expected to happen**:

I expected the underlying dask array to be preserved when creating a new DataArray instance.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr
import numpy as np
from dask import array

d = np.ones((10, 10))
x = array.from_array(d, chunks=5)

da = xr.DataArray(x)  # this is chunked
xr.DataArray(da)      # this is not chunked anymore
```

**Anything else we need to know?**:

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.3
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.10.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.09.1
distributed: 2021.09.1
matplotlib: 3.4.3
cartopy: 0.19.0.post1
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: 4.10.3
pytest: 6.2.5
IPython: 7.27.0
sphinx: 4.2.0
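A possible workaround in the meantime (a sketch, not from the original report): re-wrap the underlying `.data` instead of the DataArray object itself, so the constructor receives the dask array directly and the chunking survives:

```python
import numpy as np
import xarray as xr
from dask import array

x = array.from_array(np.ones((10, 10)), chunks=5)
da = xr.DataArray(x)

# Passing .data plus the existing metadata sidesteps the numpy conversion.
da2 = xr.DataArray(da.data, coords=da.coords, dims=da.dims, attrs=da.attrs)
```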
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5983/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue