Request for adding training scripts for SPIN and SPPO #2958

jkx19 · 2025-02-25T17:54:53Z

Feature request

I am one of the authors of the SPIN (https://arxiv.org/abs/2401.01335) and SPPO (https://arxiv.org/abs/2405.00675) papers. I am delighted to see that the training loss for a single iteration of SPIN and SPPO has been implemented in the DPO trainer. However, since both SPIN and SPPO are iterative algorithms, would it be possible to include training scripts that reflect the full pipeline? I appreciate your consideration!

Motivation

Currently, to run SPIN and SPPO with trl, I need to manually set up the configuration and write a script that runs the DPO training script iteratively. For example, to run SPPO for 3 iterations, I might need a script like the following:

iter_num=3
for i in $(seq 1 "$iter_num"); do
    # The first iteration starts from the base model; later ones start from the previous checkpoint.
    if [ "$i" -eq 1 ]; then
        MODEL="{{base_model}}"
    else
        MODEL="$OUTPUT_DIR"
    fi
    OUTPUT_DIR="checkpoints/SPPO-Iter${i}"
    OUT="data-llama-3-8b-instruct-sppo-iter${i}"

    bash generate.sh --model "$MODEL" --out_path "$OUT" # Generate the responses
    python trl/scripts/dpo.py --loss_type sppo_hard # Run the DPO trainer; other arguments omitted
done

Would it be possible to include a script that outlines the full training process of SPIN/SPPO? This would be greatly helpful for users, as it would eliminate the need to manually write such a script.
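
As a rough illustration of the bookkeeping such a script would need, here is a minimal dry-run sketch. It only prints, for each iteration, which model to start from and where the generated data and checkpoint would go; it does not call `generate.sh` or the DPO script. The function name `plan_iterations`, the optional prefix argument, and the directory naming are hypothetical and simply mirror the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of trl): prints one line per iteration with
#   <iteration> <starting model> <generated-data dir> <checkpoint dir>
# It runs nothing; the naming scheme is an assumption based on the loop above.
plan_iterations() {
    local base_model="$1" iter_num="$2" prefix="${3:-SPPO}"
    local model="$base_model" output_dir data_dir i
    for i in $(seq 1 "$iter_num"); do
        output_dir="checkpoints/${prefix}-Iter${i}"
        data_dir="data-${prefix}-iter${i}"
        printf '%s %s %s %s\n' "$i" "$model" "$data_dir" "$output_dir"
        model="$output_dir"   # the next iteration starts from this checkpoint
    done
}
```

For example, `plan_iterations "{{base_model}}" 3` would list three iterations, with iterations 2 and 3 starting from `checkpoints/SPPO-Iter1` and `checkpoints/SPPO-Iter2` respectively. A real pipeline script would replace the `printf` with the generation and training commands.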

Your contribution

I can help prepare the PR if the feature request is approved.

github-actions bot added ✨ enhancement New feature or request 🏋 DPO Related to DPO labels Feb 25, 2025
