issues
1 row where state = "open" and user = 691772 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1379372915 | I_kwDOAMm_X85SN49z | 7059 | pandas.errors.InvalidIndexError raised when running computation in parallel using dask | lumbric 691772 | open | 0 | 8 | 2022-09-20T12:52:16Z | 2024-03-02T16:43:15Z | CONTRIBUTOR | What happened?

I'm doing a computation using chunks and (This issue was initially discussed in #6816, but the ticket was closed because I couldn't reproduce the problem any longer. Now it seems to be reproducible in every run, so it is time for a proper bug report, which is this ticket.)

What did you expect to happen?

Dask schedulers

Minimal Complete Verifiable Example 1

Edit: I've managed to reduce the verifiable example, see example 2 below.

```Python
# I wasn't able to reproduce the issue with a smaller code example, so I provide all my code and my test data. This should make it possible to reproduce the issue in less than a minute.
# Requirements:
# - git
# - mamba, see https://github.com/mamba-org/mamba

git clone https://github.com/lumbric/reproduce_invalidindexerror.git
cd reproduce_invalidindexerror
mamba env create -f env.yml

# alternatively run the following, will install latest versions from conda-forge:
# conda create -n reproduce_invalidindexerror
# conda activate reproduce_invalidindexerror
# mamba install -c conda-forge python=3.8 matplotlib pytest-cov dask openpyxl pytest pip xarray netcdf4 jupyter pandas scipy flake8 dvc pre-commit pyarrow statsmodels rasterio scikit-learn pytest-watch pdbpp black seaborn

conda activate reproduce_invalidindexerror
dvc repro checks_simulation
```

Minimal Complete Verifiable Example 2

```Python
import numpy as np
import pandas as pd
import xarray as xr

from multiprocessing import Lock
from dask.diagnostics import ProgressBar

# Workaround for xarray#6816: Parallel execution often causes an InvalidIndexError
# https://github.com/pydata/xarray/issues/6816#issuecomment-1243864752
# import dask
# dask.config.set(scheduler="single-threaded")


def generate_netcdf_files():
    fnames = [f"{i:02d}.nc" for i in range(21)]
    for i, fname in enumerate(fnames):
        xr.DataArray(
            np.ones((3879, 48)),
            dims=("locations", "time"),
            coords={
                "time": pd.date_range(f"{2000 + i}-01-01", periods=48, freq="D"),
                "locations": np.arange(3879),
            },
        ).to_netcdf(fname)
    return fnames


def compute(locations, data):
    def resample_annually(data):
        return data.sortby("time").resample(time="1A", label="left", loffset="1D").mean(dim="time")


def main():
    fnames = generate_netcdf_files()


if __name__ == "__main__":
    main()
```

MVCE confirmation
Relevant log output

This is the traceback of "Minimal Complete Verifiable Example 1".
Anything else we need to know?

Workaround: Use synchronous dask scheduler

The issue does not occur if I use the synchronous dask scheduler by adding at the very beginning of my script:
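Going by the workaround lines in Minimal Complete Verifiable Example 2, the setting in question is presumably the following. This is a minimal sketch that assumes nothing else in the script overrides the scheduler later:

```python
import dask

# Assumption: mirrors the workaround lines quoted in Example 2.
# "single-threaded" selects dask's synchronous scheduler, so every task
# runs in the calling thread and no two tasks run concurrently.
dask.config.set(scheduler="single-threaded")
```

`dask.config.set` can also be used as a context manager (`with dask.config.set(scheduler="single-threaded"): ...`) to limit the setting to one block of code. Either way, the computation loses its parallelism.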
Additional debugging print

If I add the following debugging print to the pandas code:

```
--- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200
+++ /opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200
@@ -3718,7 +3718,6 @@
         self._check_indexing_method(method, limit, tolerance)
```

So the index seems to be unique, but

Proof of race condition: add sleep 1s

To confirm that the race condition is at this point, we wait for 1s and then check again for uniqueness:

```
--- /tmp/base.py 2022-09-12 16:35:53.739971953 +0200
+++ /opt/miniconda3/envs/reproduce_invalidindexerror/lib/python3.8/site-packages/pandas/core/indexes/base.py 2022-09-12 16:35:58.864144801 +0200
@@ -3718,7 +3718,10 @@
         self._check_indexing_method(method, limit, tolerance)
```

This outputs:
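As a separate illustration, the check, wait, re-check idea behind this proof can be sketched without pandas. The code below is only an analogy: `LazyFlag`, `fill_cache`, and `check_wait_recheck` are hypothetical names, and the sketch does not reproduce the pandas internals. It assumes the flag is briefly wrong while another thread is still filling in the cached value.

```python
import threading
import time


class LazyFlag:
    """Hypothetical stand-in for a cached property such as a uniqueness flag."""

    def __init__(self):
        self._value = False  # placeholder answer until the cache is filled

    def fill_cache(self, first_check_done):
        # Runs in a second thread: waits until the first check has happened,
        # then stores the real answer.
        first_check_done.wait()
        self._value = True

    @property
    def is_unique(self):
        return self._value


def check_wait_recheck(flag, first_check_done, wait_seconds):
    first = flag.is_unique        # first check sees the placeholder
    first_check_done.set()        # let the other thread finish its work
    time.sleep(wait_seconds)      # same idea as the 1 s sleep in the patch
    second = flag.is_unique       # re-check after waiting
    return first, second


flag = LazyFlag()
first_check_done = threading.Event()
writer = threading.Thread(target=flag.fill_cache, args=(first_check_done,))
writer.start()
result = check_wait_recheck(flag, first_check_done, wait_seconds=0.5)
writer.join()
print(result)
```

If the two checks disagree, the value changed while the checking thread was waiting, which is the signature of a race rather than of bad input data.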
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-125-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.17.4
scipy: 1.3.3
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0+ds
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.8.1+dfsg
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0
pip3: None
conda: None
pytest: 4.6.9
IPython: 7.13.0
sphinx: 1.8.5
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7059/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
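The filter described at the top of this page (state = "open", user = 691772, sorted by updated_at descending) can be reproduced against this schema with Python's built-in `sqlite3` module. The sketch below is a reduced example: it keeps only a subset of the columns, uses an in-memory database, and adds one made-up placeholder row to show the filter doing something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: a reduced subset of the [issues] columns, enough for the filter.
conn.execute(
    """
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [title] TEXT,
        [user] INTEGER,
        [state] TEXT,
        [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows_to_insert = [
    # The row shown on this page.
    (
        1379372915,
        7059,
        "pandas.errors.InvalidIndexError raised when running computation in parallel using dask",
        691772,
        "open",
        "2024-03-02T16:43:15Z",
    ),
    # Hypothetical placeholder row, not real data; it should be filtered out.
    (1, 1, "placeholder", 1, "closed", "2020-01-01T00:00:00Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows_to_insert)

# Bound parameters avoid quoting issues with string literals.
query = """
    SELECT [number], [state], [updated_at]
    FROM [issues]
    WHERE [state] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
matches = conn.execute(query, ("open", 691772)).fetchall()
print(matches)
```

Because the timestamps are stored as ISO 8601 text, sorting them as strings gives chronological order.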