Request for adding training scripts for SPIN and SPPO #2958

Open
jkx19 opened this issue Feb 25, 2025 · 0 comments
Labels
🏋 DPO Related to DPO ✨ enhancement New feature or request

jkx19 commented Feb 25, 2025

Feature request

I am one of the authors of the SPIN (https://arxiv.org/abs/2401.01335) and SPPO (https://arxiv.org/abs/2405.00675) papers. I am delighted to see that the training losses for a single iteration of SPIN and SPPO have been implemented in the DPO trainer. However, since both SPIN and SPPO are iterative algorithms, would it be possible to include training scripts that reflect the full pipeline? I appreciate your consideration!

Motivation

Currently, to run SPIN and SPPO with trl, I need to set up the configuration manually and write a script that runs the DPO training script iteratively. For example, to run SPPO for 3 iterations, I might need a script like the following:

iter_num=3
for i in $(seq 1 "$iter_num"); do
    if [ "$i" -eq 1 ]; then
        MODEL="{{base_model}}"  # first iteration starts from the base model
    else
        MODEL="$OUTPUT_DIR"     # later iterations start from the previous checkpoint
    fi
    OUTPUT_DIR="checkpoints/SPPO-Iter${i}"
    OUT="data-llama-3-8b-instruct-sppo-iter${i}"

    bash generate.sh --model "$MODEL" --out_path "$OUT"  # generate the responses
    python trl/scripts/dpo.py --loss_type sppo_hard  # run the DPO trainer; other arguments omitted
done

Would it be possible to include a script that outlines the full training process of SPIN/SPPO? This would be greatly helpful for users, as it would eliminate the need to manually write such a script.
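As a rough illustration of what such a script might look like, here is a minimal Python sketch that only plans the per-iteration commands from the loop above without running them. The names `plan_iterations` and `IterationPlan` are hypothetical and not part of trl, `generate.sh` is the user-supplied generation script from the example, and any further trainer arguments are left to the caller through `extra_train_args` rather than guessed here.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class IterationPlan:
    """Commands for one generate-then-train round (hypothetical helper type)."""
    iteration: int
    model: str          # policy used to generate responses in this round
    data_path: str      # where the generated responses are written
    output_dir: str     # checkpoint directory produced by this round
    generate_cmd: List[str]
    train_cmd: List[str]


def plan_iterations(
    base_model: str,
    num_iters: int,
    loss_type: str = "sppo_hard",
    data_prefix: str = "data-llama-3-8b-instruct-sppo",
    checkpoint_prefix: str = "checkpoints/SPPO",
    extra_train_args: Optional[Sequence[str]] = None,
) -> List[IterationPlan]:
    """Build the command sequence for an iterative SPIN/SPPO-style run.

    Iteration 1 generates from the base model; each later iteration
    generates from the checkpoint directory of the previous iteration.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    extra = list(extra_train_args or [])  # caller-supplied; nothing is assumed here
    plans: List[IterationPlan] = []
    model = base_model
    for i in range(1, num_iters + 1):
        output_dir = f"{checkpoint_prefix}-Iter{i}"
        data_path = f"{data_prefix}-iter{i}"
        generate_cmd = ["bash", "generate.sh", "--model", model, "--out_path", data_path]
        train_cmd = ["python", "trl/scripts/dpo.py", "--loss_type", loss_type] + extra
        plans.append(IterationPlan(i, model, data_path, output_dir, generate_cmd, train_cmd))
        model = output_dir  # the next round starts from this round's checkpoint
    return plans


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for plan in plan_iterations("{{base_model}}", num_iters=3):
        print(shlex.join(plan.generate_cmd))
        print(shlex.join(plan.train_cmd))
```

Keeping the planning step separate from execution makes the chain of checkpoints easy to inspect before any GPU time is spent; each command list could then be passed to `subprocess.run(cmd, check=True)` in order.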

Your contribution

I can help prepare the PR if the feature request is approved.

@github-actions github-actions bot added ✨ enhancement New feature or request 🏋 DPO Related to DPO labels Feb 25, 2025