html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6816#issuecomment-1243864752,https://api.github.com/repos/pydata/xarray/issues/6816,1243864752,IC_kwDOAMm_X85KI96w,691772,2022-09-12T14:55:06Z,2022-09-13T09:39:48Z,CONTRIBUTOR,"Not sure what changed, but now I also get the same error with my small, synthetic test data. This allowed me to debug a bit further. I am pretty sure this is a bug in xarray or pandas.
I think something in **`pandas.core.indexes.base.Index` is not thread-safe**. At least this seems to be the place of the race condition.
I can create a new ticket if you prefer, but since I am not sure which project it belongs in, I will continue to collect information here. Unfortunately I have not yet managed to create a minimal example, as this is quite tricky with timing issues.
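For what it's worth, this is the kind of multi-threaded probe I have been experimenting with to narrow it down. It is made up for illustration and does *not* reliably reproduce the error for me, so take it only as a sketch of the access pattern I suspect:
```
import threading

import numpy as np
import pandas as pd

# Hypothetical probe: share one pandas Index across threads and hammer the
# attribute involved in the failing check. My suspicion is that lazily built,
# cached state inside the Index can make is_unique transiently report False.
index = pd.Index(np.arange(3879))

def probe():
    for _ in range(10_000):
        assert index.is_unique, 'unique index reported as non-unique'

threads = [threading.Thread(target=probe) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```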
### Additional debugging print and proof of race condition
If I add the following debugging print to the pandas code:
```
--- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200
+++ /home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200
@@ -3718,7 +3718,8 @@
self._check_indexing_method(method, limit, tolerance)
if not self._index_as_unique:
+ print(""Original: "", len(self), "", length of set:"", len(set(self)))
raise InvalidIndexError(self._requires_unique_msg)
if len(target) == 0:
```
...I get the following output:
```
Original: 3879 , length of set: 3879
```
So the index seems to be unique, but `self.is_unique` is `False` for some reason (note that `self._index_as_unique` and `self.is_unique` are the same in this case).
To confirm that the race condition is at this point we wait for 1s and then check again for uniqueness:
```
--- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200
+++ /home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200
@@ -3718,7 +3718,11 @@
self._check_indexing_method(method, limit, tolerance)
if not self._index_as_unique:
+ if not self.is_unique:
+ import time
+ time.sleep(1)
+ print(""now unique?"", self.is_unique)
raise InvalidIndexError(self._requires_unique_msg)
```
This outputs:
```
now unique? True
```
### Traceback
```
Traceback (most recent call last):
File ""scripts/my_script.py"", line 57, in
main()
File ""scripts/my_script.py"", line 48, in main
my_function(
File ""/home/lumbric/my_project/src/calculations.py"", line 136, in my_function
result = result.compute()
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 947, in compute
return new.load(**kwargs)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 921, in load
ds = self._to_temp_dataset().load(**kwargs)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/dataset.py"", line 861, in load
evaluated_data = da.compute(*lazy_data.values(), **kwargs)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/base.py"", line 600, in compute
results = schedule(dsk, keys, **kwargs)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/threaded.py"", line 81, in get
results = get_async(
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/local.py"", line 508, in get_async
raise_exception(exc, tb)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/local.py"", line 316, in reraise
raise exc
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/local.py"", line 221, in execute_task
result = _execute_task(task, data)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/core.py"", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/core.py"", line 119, in
return func(*(_execute_task(a, cache) for a in args))
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/dask/core.py"", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/parallel.py"", line 285, in _wrapper
result = func(*converted_args, **kwargs)
File ""/home/lumbric/some_project/src/calculations.py"", line 100, in
lambda input_data: worker(input_data).compute().chunk({""time"": None}),
File ""/home/lumbric/some_project/src/calculations.py"", line 69, in worker
raise e
File ""/home/lumbric/some_project/src/calculations.py"", line 60, in worker
out = some_data * some_other_data.sel(some_dimension=input_data.some_dimension)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 1329, in sel
ds = self._to_temp_dataset().sel(
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/dataset.py"", line 2502, in sel
pos_indexers, new_indexes = remap_label_indexers(
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/coordinates.py"", line 421, in remap_label_indexers
pos_indexers, new_indexes = indexing.remap_label_indexers(
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/indexing.py"", line 121, in remap_label_indexers
idxr, new_idx = index.query(labels, method=method, tolerance=tolerance)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/indexes.py"", line 245, in query
indexer = get_indexer_nd(self.index, label, method, tolerance)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/xarray/core/indexes.py"", line 142, in get_indexer_nd
flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
File ""/home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/pandas/core/indexes/base.py"", line 3722, in get_indexer
raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```
### Workaround
The issue does not occur if I use the synchronous dask scheduler by adding the following at the very beginning of my script:
```
dask.config.set(scheduler='single-threaded')
```
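Some equivalent ways to apply the workaround. I only actually tested the global variant above; the others are listed for completeness and should behave the same, since the point is simply to avoid sharing the pandas Index between threads:
```
import dask

# Global: use the single-threaded scheduler for everything in this process.
dask.config.set(scheduler='single-threaded')

# Scoped: only serialize the computation that hits the shared index.
with dask.config.set(scheduler='single-threaded'):
    result = result.compute()

# Per-call: pass the scheduler directly to .compute(); a process-based
# scheduler should also side-step shared in-memory Index state, at the cost
# of serialization overhead (untested on my side).
result = result.compute(scheduler='processes')
```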
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.2
distributed: 2022.5.2
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: 4.12.0
pytest: 7.1.2
IPython: 8.4.0
sphinx: None
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1315111684
https://github.com/pydata/xarray/issues/6816#issuecomment-1243882465,https://api.github.com/repos/pydata/xarray/issues/6816,1243882465,IC_kwDOAMm_X85KJCPh,691772,2022-09-12T15:07:45Z,2022-09-12T15:07:45Z,CONTRIBUTOR,"I think [these are the values](https://gist.github.com/lumbric/c100299d7ba4470c4d21bdabdd6a689f) of the index; they seem to be unique and monotonic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1315111684
https://github.com/pydata/xarray/issues/6816#issuecomment-1220519740,https://api.github.com/repos/pydata/xarray/issues/6816,1220519740,IC_kwDOAMm_X85Iv6c8,691772,2022-08-19T10:33:59Z,2022-08-19T10:33:59Z,CONTRIBUTOR,"Thanks a lot for your quick reply and your helpful hints!
In the meantime I have verified that `d.c` is unique, i.e. `np.unique(d.c).size == d.c.size`.
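Roughly the check I ran, plus the equivalent check on the underlying pandas Index (assuming `c` is a dimension coordinate here; I have not re-run the Index variant under the failing scheduler):
```
import numpy as np

assert np.unique(d.c).size == d.c.size   # coordinate values are unique

idx = d.indexes['c']                     # underlying pandas Index
print(idx.is_unique, idx.is_monotonic_increasing)
```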
Unfortunately I was not able to reproduce the error often enough lately to test it with the synchronous scheduler or to create a smaller synthetic example that reproduces the problem. One run takes about an hour until the exception occurs (or not), which makes things hard to debug. But I will continue trying and keep this ticket updated.
Any further suggestions very welcome :) Thanks a lot!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1315111684