fix: resolved the issue of not releasing ThreadPoolExecutor when an exception occurs in _generate_worker #14538

Open · wants to merge 1 commit into base: main
Conversation

Guowen-Bao
Summary

I found that when model nodes run in parallel, an exception raised in _generate_worker can leave the ThreadPoolExecutor unreleased.
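The failure mode can be illustrated in isolation. The sketch below is hypothetical and not the actual Dify code: a pool is created inside a worker function, work is submitted, and an exception fires before shutdown() is ever reached, leaving the pool's worker thread alive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pool_holder = {}  # stands in for a long-lived object that keeps the pool referenced

def generate_worker():
    """Hypothetical stand-in for _generate_worker: the pool is created,
    work is submitted, and an exception fires before shutdown() runs."""
    pool = ThreadPoolExecutor(max_workers=2)
    pool_holder["pool"] = pool
    pool.submit(lambda: None)
    raise RuntimeError("simulated node failure")  # pool.shutdown() never runs

try:
    generate_worker()
except RuntimeError:
    pass

# The worker thread spawned by the abandoned pool is still alive,
# blocked on the pool's work queue.
leaked = [t for t in threading.enumerate() if t.name.startswith("ThreadPoolExecutor")]
print(len(leaked))  # 1
```

Because the pool object stays referenced and was never shut down, its worker thread blocks on the internal work queue indefinitely, which matches the stuck ThreadPoolExecutor-4_* threads observed below.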

Reproduction method:

  1. Use the following app.yml:

     app:
       description: ''
       icon: 🤖
       icon_background: '#FFEAD5'
       mode: workflow
       name: CodeReview-dp
       use_icon_as_answer_icon: false
     kind: app
     version: 0.1.3
     workflow:
       conversation_variables: []
       environment_variables: []
       features:
         file_upload:
           allowed_file_extensions:
           - .JPG
           - .JPEG
           - .PNG
           - .GIF
           - .WEBP
           - .SVG
           allowed_file_types:
           - image
           allowed_file_upload_methods:
           - local_file
           - remote_url
           enabled: false
           fileUploadConfig:
             audio_file_size_limit: 50
             batch_count_limit: 5
             file_size_limit: 15
             image_file_size_limit: 10
             video_file_size_limit: 100
             workflow_file_upload_limit: 10
           image:
             enabled: false
             number_limits: 3
             transfer_methods:
             - local_file
             - remote_url
           number_limits: 3
         opening_statement: ''
         retriever_resource:
           enabled: true
         sensitive_word_avoidance:
           enabled: false
         speech_to_text:
           enabled: false
         suggested_questions: []
         suggested_questions_after_answer:
           enabled: false
         text_to_speech:
           enabled: false
           language: ''
           voice: ''
       graph:
         edges:
         - data:
             isInIteration: false
             sourceType: llm
             targetType: end
           id: 1732673153974-source-1732590908768-target
           selected: false
           source: '1732673153974'
           sourceHandle: source
           target: '1732590908768'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: start
             targetType: llm
           id: 1732590744596-source-1740723722209-target
           source: '1732590744596'
           sourceHandle: source
           target: '1740723722209'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: start
             targetType: llm
           id: 1732590744596-source-1732673153974-target
           source: '1732590744596'
           sourceHandle: source
           target: '1732673153974'
           targetHandle: target
           type: custom
           zIndex: 0
         - data:
             isInIteration: false
             sourceType: llm
             targetType: end
           id: 1740723722209-source-1732590908768-target
           source: '1740723722209'
           sourceHandle: source
           target: '1732590908768'
           targetHandle: target
           type: custom
           zIndex: 0
         nodes:
         - data:
             desc: ''
             selected: false
             title: Start
             type: start
             variables:
             - label: question
               max_length: 128000
               options: []
               required: true
               type: paragraph
               variable: question
           height: 88
           id: '1732590744596'
           position:
             x: 600.6736256500094
             y: 743.497703338181
           positionAbsolute:
             x: 600.6736256500094
             y: 743.497703338181
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             desc: ''
             outputs:
             - value_selector:
               - '1732673153974'
               - text
               variable: answer1
             selected: false
             title: END
             type: end
           height: 88
           id: '1732590908768'
           position:
             x: 1280.8715479152754
             y: 743.497703338181
           positionAbsolute:
             x: 1280.8715479152754
             y: 743.497703338181
           selected: true
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             context:
               enabled: false
               variable_selector: []
             desc: ''
             model:
               completion_params:
                 max_tokens: 8192
                 temperature: 0.7
               mode: chat
               name: deepseek-r1:32b
               provider: ollama
             prompt_template:
             - id: d84447df-620e-4e68-a188-5a768f2205b8
               role: system
               text: ''
             - id: 7fb8166e-858f-4209-b934-5702b4e92ca3
               role: user
               text: '{{#1732590744596.question#}}'
             selected: false
             title: LLM1
             type: llm
             variables: []
             vision:
               enabled: false
           height: 96
           id: '1732673153974'
           position:
             x: 933.3192075732832
             y: 617.737863851862
           positionAbsolute:
             x: 933.3192075732832
             y: 617.737863851862
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         - data:
             context:
               enabled: false
               variable_selector: []
             desc: ''
             model:
               completion_params:
                 temperature: 0.7
               mode: chat
               name: deepseek-r1:7b
               provider: ollama
             prompt_template:
             - role: system
               text: ''
             - role: user
               text: '{{#1732590744596.question#}}'
             selected: false
             title: LLM 2
             type: llm
             variables: []
             vision:
               enabled: false
           height: 96
           id: '1740723722209'
           position:
             x: 937.6946392608389
             y: 866.9975186903607
           positionAbsolute:
             x: 937.6946392608389
             y: 866.9975186903607
           selected: false
           sourcePosition: right
           targetPosition: left
           type: custom
           width: 242
         viewport:
           x: -466.6755859805346
           y: -451.8702606837736
           zoom: 0.9274570041128427
  2. Inject a test exception into the "END" node by modifying _run_node in api/core/workflow/graph_engine/graph_engine.py:

     try:
         # run node
         generator = node_instance.run()
         for item in generator:
             if node_instance.node_data.title == "END":
                 a = 1/0     # test exception
             if isinstance(item, GraphEngineEvent):
                 if isinstance(item, BaseIterationEvent):
      ...
    
  3. Inspect the thread table: the ThreadPoolExecutor worker threads never exit, even though _generate_worker has already exited:

     thread id       | thread name            | cost | status
     140007953990976 | ThreadMonitor          | 1492 | False
     140007953992096 | ThreadPoolExecutor-4_0 | 1491 | True
     140007953992736 | ThreadPoolExecutor-4_1 | 1491 | True
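A standard way to guarantee the pool is released on the exception path is a try/finally around the submit/collect logic. The helper below is a hypothetical sketch, not the actual Dify patch; run_branches and its parameters are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor

def run_branches(tasks):
    """Hypothetical sketch of the fix: a try/finally guarantees the pool
    is shut down even when a branch raises, so no worker threads leak."""
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        futures = [executor.submit(task) for task in tasks]
        return [f.result() for f in futures]  # re-raises any branch exception
    finally:
        # cancel_futures requires Python 3.9+; pending work is dropped,
        # worker threads are told to exit.
        executor.shutdown(wait=False, cancel_futures=True)

print(run_branches([lambda: 1, lambda: 2]))  # [1, 2]
```

An equivalent option is `with ThreadPoolExecutor(...) as executor:`, whose __exit__ calls shutdown(wait=True); the explicit finally is shown here because it mirrors the exception path described above.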

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
