issues: 1802960393
| id | node_id | number | title | user | state | locked | comments | created_at | updated_at | closed_at | author_association | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1802960393 | I_kwDOAMm_X85rdv4J | 7985 | Performance degradation with increasing threads in Python multiprocessing | 16239764 | closed | 0 | 2 | 2023-07-13T12:51:39Z | 2023-11-06T06:12:28Z | 2023-11-06T06:12:28Z | NONE | not_planned | 13221727 | issue |

body:

What is your issue?

I have a machine with 24 cores and 2 threads per core. I'm trying to optimize the following code for parallel execution. However, I noticed that the code's performance starts to degrade after a certain number of threads.

The code scales well up to around 7-8 threads, but after that the performance starts to deteriorate. I have profiled the code, and it seems that each thread takes more time to execute the same code. For example, with 2 threads:

However, with 22 threads:

I'm struggling to understand why the performance degrades with more threads. I've already verified that the system has the required number of cores and threads. I would appreciate any guidance or suggestions to help me identify the cause of this issue and optimize the code for better performance. It's really hard for me to provide a minimal working example, so please take that into account. Thank you in advance.

Edit: The files are around 80 MB each. I have 451 files. I added the following code to profile the function:

And more code for each line in a similar fashion.

I used the libraries

reactions:

```json
{
  "url": "https://api.github.com/repos/pydata/xarray/issues/7985/reactions",
  "total_count": 0,
  "+1": 0,
  "-1": 0,
  "laugh": 0,
  "hooray": 0,
  "confused": 0,
  "heart": 0,
  "rocket": 0,
  "eyes": 0
}
```
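The profiling snippet the author added was not captured in this export. A minimal sketch of that style of per-step instrumentation, using `time.perf_counter()`; here `process_file`, `load`, and `transform` are hypothetical stand-ins for the author's code, not taken from the issue:

```python
import time

def load(path):
    # Stand-in for reading one of the ~80 MB input files.
    return list(range(100_000))

def transform(data):
    # Stand-in for the per-file computation.
    return sum(data)

def process_file(path):
    # Time each step separately, in the spirit of the issue's
    # "more code for each line in a similar fashion".
    t0 = time.perf_counter()
    data = load(path)
    t1 = time.perf_counter()
    result = transform(data)
    t2 = time.perf_counter()
    print(f"{path}: load {t1 - t0:.4f}s, transform {t2 - t1:.4f}s")
    return result

process_file("file_000.nc")
```

With timings like these collected per worker, a growing per-step time as the worker count rises usually points at contention for a shared resource rather than at the Python code itself.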
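The scaling pattern described (good up to about 7-8 workers, worse beyond) is typical when the per-file work is limited by disk or memory bandwidth rather than CPU, so extra processes only contend for the same resource; the second hardware thread per core also adds little for such workloads. A hedged sketch of capping the pool size instead of using all 48 logical CPUs — the task list and `work` function are placeholders, not the author's code:

```python
import multiprocessing as mp

def work(n):
    # Placeholder for the per-file computation; the real job reads
    # an ~80 MB file first, which can make it I/O-bound.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [50_000] * 16  # stands in for the 451 input files
    # Cap the workers near the point where scaling stops paying off,
    # rather than defaulting to mp.cpu_count() (48 logical CPUs here).
    workers = min(8, mp.cpu_count())
    with mp.Pool(processes=workers) as pool:
        results = pool.map(work, tasks, chunksize=4)
    print(len(results), results[0])
```

Benchmarking a few pool sizes (e.g. 4, 8, 16) on a subset of the files is usually a faster way to find the knee of the curve than reasoning from the core count.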