id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1379372915,I_kwDOAMm_X85SN49z,7059,pandas.errors.InvalidIndexError raised when running computation in parallel using dask,691772,open,0,,,8,2022-09-20T12:52:16Z,2024-03-02T16:43:15Z,,CONTRIBUTOR,,,,"### What happened? I'm doing a computation using chunks and `map_blocks()` to run things in parallel. At some point a `pandas.errors.InvalidIndexError` is raised. When using dask's synchronous scheduler, everything works fine. I think `pandas.core.indexes.base.Index` is not thread-safe. At least this seems to be the place of the race condition. See further tests below. (This issue was initially discussed in #6816, but the ticket was closed, because I couldn't reproduce the problem any longer. Now it seems to be reproducible in every run, so it is time for a proper bug report, which is this ticket here.) ### What did you expect to happen? Dask schedulers `single-threaded` and `threads` should have the same result. ### Minimal Complete Verifiable Example 1 *Edit:* I've managed to reduce the verifiable example, see example 2 below. ```Python # I wasn't able to reproduce the issue with a smaller code example, so I provide all my code and my test data. This should make it possible to reproduce the issue in less than a minute. 
# Requirements: # - git # - mamba, see https://github.com/mamba-org/mamba git clone https://github.com/lumbric/reproduce_invalidindexerror.git cd reproduce_invalidindexerror mamba env create -f env.yml # alternatively run the following, will install latest versions from conda-forge: # conda create -n reproduce_invalidindexerror # conda activate reproduce_invalidindexerror # mamba install -c conda-forge python=3.8 matplotlib pytest-cov dask openpyxl pytest pip xarray netcdf4 jupyter pandas scipy flake8 dvc pre-commit pyarrow statsmodels rasterio scikit-learn pytest-watch pdbpp black seaborn conda activate reproduce_invalidindexerror dvc repro checks_simulation ``` ### Minimal Complete Verifiable Example 2 ```Python import numpy as np import pandas as pd import xarray as xr from multiprocessing import Lock from dask.diagnostics import ProgressBar # Workaround for xarray#6816: Parallel execution causes often an InvalidIndexError # https://github.com/pydata/xarray/issues/6816#issuecomment-1243864752 # import dask # dask.config.set(scheduler=""single-threaded"") def generate_netcdf_files(): fnames = [f""{i:02d}.nc"" for i in range(21)] for i, fname in enumerate(fnames): xr.DataArray( np.ones((3879, 48)), dims=(""locations"", ""time""), coords={ ""time"": pd.date_range(f""{2000 + i}-01-01"", periods=48, freq=""D""), ""locations"": np.arange(3879), }, ).to_netcdf(fname) return fnames def compute(locations, data): def resample_annually(data): return data.sortby(""time"").resample(time=""1A"", label=""left"", loffset=""1D"").mean(dim=""time"") def worker(data): locations_chunk = locations.sel(locations=data.locations) out_raw = data * locations_chunk out = resample_annually(out_raw) return out template = resample_annually(data) out = xr.map_blocks( lambda data: worker(data).compute().chunk({""time"": None}), data, template=template, ) return out def main(): fnames = generate_netcdf_files() locations = xr.DataArray( np.ones(3879), dims=""locations"", coords={""locations"": 
np.arange(3879)}, ) data = xr.open_mfdataset( fnames, combine=""by_coords"", chunks={""locations"": 4000, ""time"": None}, # suggested as solution in # lock=Lock(), ).__xarray_dataarray_variable__ out = compute(locations, data) with ProgressBar(): out = out.compute() if __name__ == ""__main__"": main() ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [x] Complete example — the example is self-contained, including all data and the text of any traceback. - [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output This is the traceback of ""Minimal Complete Verifiable Example 1"". ```Python Traceback (most recent call last): File ""scripts/calc_p_out_model.py"", line 61, in main() File ""scripts/calc_p_out_model.py"", line 31, in main calc_power(name=""p_out_model"", compute_func=compute_func) File ""/tmp/reproduce_invalidindexerror/src/wind_power.py"", line 136, in calc_power power = power.compute() File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 993, in compute return new.load(**kwargs) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 967, in load ds = self._to_temp_dataset().load(**kwargs) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/dataset.py"", line 733, in load evaluated_data = da.compute(*lazy_data.values(), **kwargs) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/base.py"", line 600, in compute results = schedule(dsk, keys, **kwargs) File 
""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/threaded.py"", line 89, in get results = get_async( File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/local.py"", line 511, in get_async raise_exception(exc, tb) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/local.py"", line 319, in reraise raise exc File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/local.py"", line 224, in execute_task result = _execute_task(task, data) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/core.py"", line 119, in _execute_task return func(*(_execute_task(a, cache) for a in args)) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/core.py"", line 119, in return func(*(_execute_task(a, cache) for a in args)) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/dask/core.py"", line 119, in _execute_task return func(*(_execute_task(a, cache) for a in args)) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/parallel.py"", line 285, in _wrapper result = func(*converted_args, **kwargs) File ""/tmp/reproduce_invalidindexerror/src/wind_power.py"", line 100, in lambda wind_speeds: worker(wind_speeds).compute().chunk({""time"": None}), File ""/tmp/reproduce_invalidindexerror/src/wind_power.py"", line 50, in worker specific_power_chunk = specific_power.sel(turbines=wind_speeds.turbines) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 1420, in sel ds = self._to_temp_dataset().sel( File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/dataset.py"", line 2533, in sel query_results = map_index_queries( File 
""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/indexing.py"", line 183, in map_index_queries results.append(index.sel(labels, **options)) # type: ignore[call-arg] File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/indexes.py"", line 418, in sel indexer = get_indexer_nd(self.index, label_array, method, tolerance) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/xarray/core/indexes.py"", line 212, in get_indexer_nd flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance) File ""/opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/pandas/core/indexes/base.py"", line 3729, in get_indexer raise InvalidIndexError(self._requires_unique_msg) pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects ``` ### Anything else we need to know? ### Workaround: Use synchronous dask scheduler The issue does not occur if I use the synchronous dask scheduler by adding at the very beginning of my script: `dask.config.set(scheduler='single-threaded')` ### Additional debugging print If I add the following debugging print to the pandas code: ``` --- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200 +++ /opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200 @@ -3718,7 +3718,6 @@ self._check_indexing_method(method, limit, tolerance) if not self._index_as_unique: + print(""Original: "", len(self), "", length of set:"", len(set(self))) raise InvalidIndexError(self._requires_unique_msg) if len(target) == 0 ``` ...I get the following output: ``` Original: 3879 , length of set: 3879 ``` So the index seems to be unique, but `self.is_unique` is `False` for some reason (note that `not self._index_as_unique` and `self.is_unique` is the same in this case). 
### Proof of race condition: add a 1s sleep To confirm that the race condition is at this point, we wait for 1s and then check again for uniqueness: ``` --- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200 +++ /opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200 @@ -3718,7 +3718,11 @@ self._check_indexing_method(method, limit, tolerance) if not self._index_as_unique: + if not self.is_unique: + import time + time.sleep(1) + print(""now unique?"", self.is_unique) raise InvalidIndexError(self._requires_unique_msg) ``` This outputs: ``` now unique? True ``` ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.0-125-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.7.3 xarray: 0.15.0 pandas: 0.25.3 numpy: 1.17.4 scipy: 1.3.3 netCDF4: 1.5.3 pydap: None h5netcdf: 0.7.1 h5py: 2.10.0 Nio: None zarr: 2.4.0+ds cftime: 1.1.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.8.1+dfsg distributed: None matplotlib: 3.1.2 cartopy: None seaborn: 0.10.0 numbagg: None setuptools: 45.2.0 pip3: None conda: None pytest: 4.6.9 IPython: 7.13.0 sphinx: 1.8.5
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7059/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1315111684,I_kwDOAMm_X85OYwME,6816,pandas.errors.InvalidIndexError is raised in some runs when using chunks and map_blocks(),691772,closed,0,,,5,2022-07-22T14:56:41Z,2022-09-13T09:39:48Z,2022-08-19T14:06:09Z,CONTRIBUTOR,,,,"### What is your issue? I'm doing a lengthy computation, which involves hundreds of GB of data using chunks and map_blocks() so that things fit into RAM and can be done in parallel. From time to time, the following error is raised: `pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` The line where this takes place looks pretty harmless: x = a * b.sel(c=d.c) It's a line inside the function `func` which is passed to a `map_blocks()` call. In this case `a` and `b` are `xr.DataArray` or `xr.DataSet` objects shadowed from outer scope and `d` is the parameter `obj` for `map_blocks()`. That means, the line below in the traceback looks like this: xr.map_blocks( lambda d: worker(d).compute().chunk({""time"": None}), d, template=template) I guess it's some kind of race condition, since it's not 100% reproducible, but I have no idea how to further investigate the issue to create a proper bug report or fix my code. Do you have any hint how I could continue building a minimal example or so in such a case? 
What does the error message want to tell me?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6816/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 467494277,MDExOlB1bGxSZXF1ZXN0Mjk3MTM2MjEz,3104,Fix minor typos in documentation,691772,closed,0,,,2,2019-07-12T16:13:15Z,2019-07-12T16:53:28Z,2019-07-12T16:51:54Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3104,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3104/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 467482848,MDExOlB1bGxSZXF1ZXN0Mjk3MTI2ODgw,3103,Add missing assert to unit test,691772,closed,0,,,1,2019-07-12T15:46:20Z,2019-07-12T16:35:16Z,2019-07-12T16:35:16Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3103,Stumbled upon a unit test which didn't test anything.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3103/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 438389323,MDU6SXNzdWU0MzgzODkzMjM=,2928,"Dask outputs warning: ""The da.atop function has moved to da.blockwise""",691772,closed,0,,,4,2019-04-29T15:59:31Z,2019-07-12T15:56:29Z,2019-07-12T15:56:28Z,CONTRIBUTOR,,,,"#### Problem description [dask 1.1.0](https://github.com/dask/dask/pull/4348) moved `atop()` to `blockwise()` and introduced a warning when `atop()` is used. 
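Until xarray itself switches to `blockwise()`, a caller can silence just this one message using the standard library `warnings` filters. This is only a sketch of a stopgap (the helper name `quiet_atop` is made up here, not something xarray or dask provides):

```python
# Possible stopgap: run a function while ignoring only the specific dask
# relocation warning, so that any other warnings still surface normally.
import warnings

def quiet_atop(func, *args, **kwargs):
    with warnings.catch_warnings():
        warnings.filterwarnings(
            'ignore', message='The da.atop function has moved to da.blockwise'
        )
        return func(*args, **kwargs)
```

For example: `quiet_atop(xr.apply_ufunc, lambda x: 42 * x, d, dask='parallelized', output_dtypes=[np.float64])`.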
#### Related * upstream ticket and PR of the dask change: dask/dask#4348 dask/dask#4035 * the warning shows up in the [dask documentation](https://examples.dask.org/xarray.html#Custom-workflows-and-automatic-parallelization) in an xarray example, probably not on purpose * warnings have already been discussed in #2727, but not fixed there * same issue in a different project: pytroll/satpy#608 #### Code Sample ```python import numpy as np import xarray as xr d = xr.DataArray(np.ones(1000)) d.to_netcdf('/tmp/ones.nc') d = xr.open_dataarray('/tmp/ones.nc', chunks=10) xr.apply_ufunc(lambda x: 42 * x, d, dask='parallelized', output_dtypes=[np.float64]) ``` This outputs the warning: ``` ...lib/python3.7/site-packages/dask/array/blockwise.py:204: UserWarning: The da.atop function has moved to da.blockwise warnings.warn(""The da.atop function has moved to da.blockwise"") ``` #### Expected Output No warning. As a user of recent versions of dask and xarray, there shouldn't be any warnings if everything is done right. The warning should be tackled inside xarray somehow. #### Solution Not sure; can xarray drop compatibility with dask <1.1.0 in some future version? Otherwise I guess there needs to be some legacy code in xarray which calls the right function. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-17-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.16.3 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 1.2.0 distributed: 1.27.0 matplotlib: 3.0.3 cartopy: None seaborn: 0.9.0 setuptools: 41.0.0 pip: 19.1 conda: None pytest: 4.4.1 IPython: 7.5.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2928/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 434444058,MDExOlB1bGxSZXF1ZXN0MjcxNDM1NjU4,2904,Minor improvement of docstring for Dataset,691772,closed,0,,,6,2019-04-17T19:16:50Z,2019-04-17T20:09:26Z,2019-04-17T20:08:46Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2904,"This might help to avoid confusion. data_vars is always a mapping, not a mapping, a variable or a tuple. Passing just a tuple, does not work of course. But for xarray newbies, this might be less obvious and the error message is also not easy to interpret: ``` >>> xr.Dataset(('dim1', np.ones(5))) ... TypeError: unhashable type: 'numpy.ndarray' ``` The correct version of the example above should be: ``` >>> xr.Dataset({'myvar': ('dim1', np.ones(5))}) Dimensions: (dim1: 5) Dimensions without coordinates: dim1 Data variables: myvar (dim1) float64 1.0 1.0 1.0 1.0 1.0 ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2904/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 434439562,MDExOlB1bGxSZXF1ZXN0MjcxNDMyMTc5,2903,Fix minor typos in docstrings,691772,closed,0,,,1,2019-04-17T19:05:47Z,2019-04-17T19:15:10Z,2019-04-17T19:15:10Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2903,"See also pull-request #2860 - the same typo was at many places. 
Sorry, I have missed the other places when sending the first PR.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2903/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 427604384,MDExOlB1bGxSZXF1ZXN0MjY2MTYyNTQw,2860,Fix minor typo in docstring,691772,closed,0,,,1,2019-04-01T09:35:02Z,2019-04-01T11:18:40Z,2019-04-01T11:18:29Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2860,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2860/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 389685381,MDExOlB1bGxSZXF1ZXN0MjM3NjE4MzYx,2598,Fix wrong error message in interp(),691772,closed,0,,,2,2018-12-11T10:09:53Z,2018-12-11T19:29:03Z,2018-12-11T19:29:03Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2598,"This is just a minor fix of a wrong error message. Please let me know if you think that this is worth testing in unit tests. Before: ``` >>> import xarray as xr >>> d = xr.DataArray([1,2,3]) >>> d.interp(1) ... ValueError: the first argument to .rename must be a dictionary ``` After: ``` >>> import xarray as xr >>> d = xr.DataArray([1,2,3]) >>> d.interp(1) ... ValueError: the first argument to .interp must be a dictionary ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2598/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull