fix: resolved the issue of not releasing ThreadPoolExecutor when an exception occurs in _generate_worker #14538

Open · wants to merge 1 commit into base: main
Conversation

Guowen-Bao
Summary

I found that when model nodes run in parallel, an exception raised in _generate_worker can leave the ThreadPoolExecutor unreleased.
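The failure mode can be illustrated in isolation. The sketch below is hypothetical and not the actual Dify code: a pool is created inside a worker function, work is submitted, and an exception fires before shutdown() is ever reached, leaving the pool's worker thread alive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pool_holder = {}  # stands in for a long-lived object that keeps the pool referenced

def generate_worker():
    """Hypothetical stand-in for _generate_worker: the pool is created,
    work is submitted, and an exception fires before shutdown() runs."""
    pool = ThreadPoolExecutor(max_workers=2)
    pool_holder["pool"] = pool
    pool.submit(lambda: None)
    raise RuntimeError("simulated node failure")  # pool.shutdown() never runs

try:
    generate_worker()
except RuntimeError:
    pass

# The worker thread spawned by the abandoned pool is still alive,
# blocked on the pool's work queue.
leaked = [t for t in threading.enumerate() if t.name.startswith("ThreadPoolExecutor")]
print(len(leaked))  # 1
```

Because the pool object stays referenced and was never shut down, its worker thread blocks on the internal work queue indefinitely, which matches the stuck ThreadPoolExecutor-4_* threads observed below.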

Reproduction method:

  1. Use the following app.yml:

     app:
       description: ''
       icon: 🤖
       icon_background: '#FFEAD5'
       mode: workflow
       name: CodeReview-dp
       use_icon_as_answer_icon: false
     kind: app
     version: 0.1.3
     workflow:
       conversation_variables: []
       environment_variables: []
       features:
         file_upload:
           allowed_file_extensions:
           - .JPG
           - .JPEG
           - .PNG
           - .GIF
           - .WEBP
           - .SVG
           allowed_file_types:
           - image
           allowed_file_upload_methods:
           - local_file
           - remote_url
           enabled: false
           fileUploadConfig:
             audio_file_size_limit: 50
             batch_count_limit: 5
             file_size_limit: 15
             image_file_size_limit: 10
             video_file_size_limit: 100
             workflow_file_upload_limit: 10
           image:
             enabled: false
             number_limits: 3
             transfer_methods:
             - local_file
             - remote_url
           number_limits: 3
         opening_statement: ''
         retriever_resource:
           enabled: true
         sensitive_word_avoidance:
           enabled: false
         speech_to_text:
           enabled: false
         suggested_questions: []
         suggested_questions_after_answer:
           enabled: false
         text_to_speech:
           enabled: false
           language: ''
           voice: ''
       graph:
         edges:
         - data:
             isInIteration: false
             sourceType: llm
             targetType: end
           id: 1732673153974-source-1732590908768-target
           selected: false
           source: '1732673153974'
           sourceHandle: source
           target: '1732590908768'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: start
             targetType: llm
           id: 1732590744596-source-1740723722209-target
           source: '1732590744596'
           sourceHandle: source
           target: '1740723722209'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: start
             targetType: llm
           id: 1732590744596-source-1732673153974-target
           source: '1732590744596'
           sourceHandle: source
           target: '1732673153974'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: llm
             targetType: end
           id: 1740723722209-source-1732590908768-target
           source: '1740723722209'
           sourceHandle: source
           target: '1732590908768'
           targetHandle: target
           type: custom
           zIndex: 0
         nodes:
         - data:
             desc: ''
             selected: false
             title: Start
             type: start
             variables:
             - label: question
               max_length: 128000
               options: []
               required: true
               type: paragraph
               variable: question
           height: 88
           id: '1732590744596'
           position:
             x: 600.6736256500094
             y: 743.497703338181
           positionAbsolute:
             x: 600.6736256500094
             y: 743.497703338181
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             desc: ''
             outputs:
             - value_selector:
               - '1732673153974'
               - text
               variable: answer1
             selected: false
             title: END
             type: end
           height: 88
           id: '1732590908768'
           position:
             x: 1280.8715479152754
             y: 743.497703338181
           positionAbsolute:
             x: 1280.8715479152754
             y: 743.497703338181
           selected: true
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             context:
               enabled: false
               variable_selector: []
             desc: ''
             model:
               completion_params:
                 max_tokens: 8192
                 temperature: 0.7
               mode: chat
               name: deepseek-r1:32b
               provider: ollama
             prompt_template:
             - id: d84447df-620e-4e68-a188-5a768f2205b8
               role: system
               text: ''
             - id: 7fb8166e-858f-4209-b934-5702b4e92ca3
               role: user
               text: '{{#1732590744596.question#}}'
             selected: false
             title: LLM1
             type: llm
             variables: []
             vision:
               enabled: false
           height: 96
           id: '1732673153974'
           position:
             x: 933.3192075732832
             y: 617.737863851862
           positionAbsolute:
             x: 933.3192075732832
             y: 617.737863851862
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             context:
               enabled: false
               variable_selector: []
             desc: ''
             model:
               completion_params:
                 temperature: 0.7
               mode: chat
               name: deepseek-r1:7b
               provider: ollama
             prompt_template:
             - role: system
               text: ''
             - role: user
               text: '{{#1732590744596.question#}}'
             selected: false
             title: LLM 2
             type: llm
             variables: []
             vision:
               enabled: false
           height: 96
           id: '1740723722209'
           position:
             x: 937.6946392608389
             y: 866.9975186903607
           positionAbsolute:
             x: 937.6946392608389
             y: 866.9975186903607
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         viewport:
           x: -466.6755859805346
           y: -451.8702606837736
           zoom: 0.9274570041128427
  2. Inject a test exception into the "END" node by modifying _run_node in api/core/workflow/graph_engine/graph_engine.py:

     try:
         # run node
         generator = node_instance.run()
         for item in generator:
             if node_instance.node_data.title == "END":
                 a = 1/0     # test exception
             if isinstance(item, GraphEngineEvent):
                 if isinstance(item, BaseIterationEvent):
      ...
    
  3. Inspect the thread table: the ThreadPoolExecutor worker threads never exit, even though _generate_worker has already exited:

     thread id       | thread name            | cost | status
     140007953990976 | ThreadMonitor          | 1492 | False
     140007953992096 | ThreadPoolExecutor-4_0 | 1491 | True
     140007953992736 | ThreadPoolExecutor-4_1 | 1491 | True
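A standard way to guarantee the pool is released on the exception path is a try/finally around the submit/collect logic. The helper below is a hypothetical sketch, not the actual Dify patch; run_branches and its parameters are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor

def run_branches(tasks):
    """Hypothetical sketch of the fix: a try/finally guarantees the pool
    is shut down even when a branch raises, so no worker threads leak."""
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        futures = [executor.submit(task) for task in tasks]
        return [f.result() for f in futures]  # re-raises any branch exception
    finally:
        # cancel_futures requires Python 3.9+; pending work is dropped,
        # worker threads are told to exit.
        executor.shutdown(wait=False, cancel_futures=True)

print(run_branches([lambda: 1, lambda: 2]))  # [1, 2]
```

An equivalent option is `with ThreadPoolExecutor(...) as executor:`, whose __exit__ calls shutdown(wait=True); the explicit finally is shown here because it mirrors the exception path described above.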

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
