issue_comments
3 rows where author_association = "CONTRIBUTOR", issue = 1315111684 and user = 691772 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1243864752 | https://github.com/pydata/xarray/issues/6816#issuecomment-1243864752 | https://api.github.com/repos/pydata/xarray/issues/6816 | IC_kwDOAMm_X85KI96w | lumbric 691772 | 2022-09-12T14:55:06Z | 2022-09-13T09:39:48Z | CONTRIBUTOR | Not sure what changed, but I now get the same error with my small, synthetic test data as well. This let me debug a bit further. I am pretty sure this is a bug in xarray or pandas. I think something in I can create a new ticket if you prefer, but since I am not sure which project it belongs to, I will keep collecting information here. Unfortunately, I have not yet managed to create a minimal example, as this is quite tricky with timing issues.
Additional debugging print and proof of race condition
If I add the following debugging print to the pandas code: ``` --- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200 +++ /home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200 @@ -3718,7 +3718,6 @@ self._check_indexing_method(method, limit, tolerance)
So the index seems to be unique, but
To confirm that the race condition occurs at this point, we wait for 1 s and then check for uniqueness again: ``` --- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200 +++ /home/lumbric/.conda/envs/my_project/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200 @@ -3718,7 +3718,10 @@ self._check_indexing_method(method, limit, tolerance)
This outputs:
Traceback
Workaround
The issue does not occur if I use the synchronous dask scheduler by adding at the very beginning of my script:
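The snippet from the original comment is not preserved in this export. As a minimal sketch, assuming dask's standard configuration API is what was used, selecting the synchronous scheduler globally might look like this:

```python
import dask

# Assumption: the workaround relied on dask's global configuration.
# The exact line from the original comment is not preserved here.
# With the synchronous scheduler, all tasks run one after another in the
# calling thread, so no two tasks can touch shared state concurrently.
dask.config.set(scheduler="synchronous")
```

The same scheduler can also be chosen for a single computation by passing `scheduler="synchronous"` to `.compute()`, which may be preferable when only one step needs to be serialized for debugging.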
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.2
distributed: 2022.5.2
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: 4.12.0
pytest: 7.1.2
IPython: 8.4.0
sphinx: None
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
pandas.errors.InvalidIndexError is raised in some runs when using chunks and map_blocks() 1315111684 | |
1243882465 | https://github.com/pydata/xarray/issues/6816#issuecomment-1243882465 | https://api.github.com/repos/pydata/xarray/issues/6816 | IC_kwDOAMm_X85KJCPh | lumbric 691772 | 2022-09-12T15:07:45Z | 2022-09-12T15:07:45Z | CONTRIBUTOR | I think these are the values of the index; they seem to be unique and monotonic. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
pandas.errors.InvalidIndexError is raised in some runs when using chunks and map_blocks() 1315111684 | |
1220519740 | https://github.com/pydata/xarray/issues/6816#issuecomment-1220519740 | https://api.github.com/repos/pydata/xarray/issues/6816 | IC_kwDOAMm_X85Iv6c8 | lumbric 691772 | 2022-08-19T10:33:59Z | 2022-08-19T10:33:59Z | CONTRIBUTOR | Thanks a lot for your quick reply and your helpful hints! In the meantime I have verified that:
Unfortunately, I have not been able to reproduce the error often enough lately to test it with the synchronous scheduler or to create a smaller synthetic example that reproduces the problem. One run takes about an hour until the exception occurs (or not), which makes this hard to debug. But I will keep trying and keep this ticket updated. Any further suggestions are very welcome :) Thanks a lot! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
pandas.errors.InvalidIndexError is raised in some runs when using chunks and map_blocks() 1315111684 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);