Feature request
I am one of the authors of the SPIN (https://arxiv.org/abs/2401.01335) and SPPO (https://arxiv.org/abs/2405.00675) papers. I am delighted to see that the training loss for a single iteration of SPIN and SPPO has been implemented in the DPO trainer. However, since both SPIN and SPPO are iterative algorithms, would it be possible to include training scripts that reflect the full pipeline? I appreciate your consideration!
Motivation
Currently, to run SPIN and SPPO on trl, I need to manually set up the configuration and write a script that runs the DPO training script iteratively. For example, to run SPPO for 3 iterations, I might need a script like the following:
iter_num=3
for i in $(seq 1 "$iter_num"); do
    if [ "$i" -eq 1 ]; then
        MODEL="{{base_model}}"
    else
        MODEL="$OUTPUT_DIR"  # checkpoint from the previous iteration
    fi
    OUTPUT_DIR="checkpoints/SPPO-Iter${i}"
    OUT="data-llama-3-8b-instruct-sppo-iter${i}"
    bash generate.sh --model "$MODEL" --out_path "$OUT"  # Generate the responses
    python trl/scripts/dpo.py --loss_type sppo_hard  # Run the DPO trainer (other arguments omitted)
done
Would it be possible to include a script that outlines the full training process of SPIN/SPPO? This would be greatly helpful for users, as it would eliminate the need to manually write such a script.
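As a rough illustration of the bookkeeping such a script would need to handle, here is a minimal dry-run sketch. The function name `iteration_plan` and the model name `my-base-model` are hypothetical, and nothing is generated or trained; it only prints which model each iteration would start from and where its outputs would go, using the same naming scheme as the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of trl): prints one line per iteration with
# the starting model, the output directory, and the dataset name.
# It runs neither generation nor training.
iteration_plan() {
  local base_model="$1" iter_num="$2"
  local model output_dir="" i
  for i in $(seq 1 "$iter_num"); do
    if [ "$i" -eq 1 ]; then
      model="$base_model"    # first iteration starts from the base model
    else
      model="$output_dir"    # later iterations start from the previous checkpoint
    fi
    output_dir="checkpoints/SPPO-Iter${i}"
    echo "iter=${i} model=${model} output_dir=${output_dir} data=data-llama-3-8b-instruct-sppo-iter${i}"
  done
}

# Example: show the plan for 3 iterations.
iteration_plan "my-base-model" 3
```

Printing the plan before launching anything makes it easy to check that each iteration picks up the checkpoint written by the previous one.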
Your contribution
I can help prepare the PR if the feature request is approved.