Issues: huggingface/trl
[Tracking issue] Integrate native liger-kernel losses
#2495
opened Dec 17, 2024 by
qgallouedec
Open
6
[Tracking issue] Wrong loss scaling when accumulating gradient
#2617
opened Jan 23, 2025 by
qgallouedec
Open
Issues list
DPO logits is nan from begin when train reasoning dataset with long completion
#2994
opened Mar 1, 2025 by
AIR-hl
5 tasks done
(deepspeed stage 3) precompute_ref_log_probs support
⚡accelerate
Related to accelerate
🚀 deepspeed
Related to deepspeed
🏋 DPO
Related to DPO
✨ enhancement
New feature or request
#2985
opened Feb 28, 2025 by
cyr0930
GRPO Trainer, set num_iteration bigger than default 1, got RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 GRPO
Related to GRPO
#2983
opened Feb 28, 2025 by
Andcircle
5 tasks done
AttributeError: Can't pickle local object 'GRPOTrainer.__init__.<locals>.data_collator'
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2979
opened Feb 27, 2025 by
williamzebrowskI
5 tasks done
Add Community Tutorials related to TRL Integrations
🚀 deepspeed
Related to deepspeed
📚 documentation
Improvements or additions to documentation
✨ enhancement
New feature or request
🦥 unsloth
Related to Unsloth
#2978
opened Feb 27, 2025 by
ParagEkbote
GRPO Stuck on Step 0
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2977
opened Feb 27, 2025 by
zaddy6
5 tasks done
There may be some doubts about the advantage function of GRPO
🏋 GRPO
Related to GRPO
🏋 Reward
Related to Reward modelling
#2976
opened Feb 27, 2025 by
L1n111ya
5 tasks done
How many H20 (96GB) GPUs are needed to train Qwen7B with the GRPO algorithm?
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#2972
opened Feb 27, 2025 by
Tuziking
GRPO: It takes the majority of time in generation using vllm
🏋 GRPO
Related to GRPO
#2971
opened Feb 27, 2025 by
wusijie123
GRPO: For the same batch of data, there is a significant difference in model performance between using trainer.predict() and using trainer.train().
🏋 GRPO
Related to GRPO
🏋 Reward
Related to Reward modelling
#2970
opened Feb 27, 2025 by
yzhdut
Using unwrap_model_for_generation and _enable_gradient_checkpointing together in GRPO Trainer will cause an error.
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2965
opened Feb 26, 2025 by
Alex-Songs
5 tasks done
Add length normalization option to DPO Trainer
🏋 DPO
Related to DPO
✨ enhancement
New feature or request
#2964
opened Feb 26, 2025 by
ggbetz
DeepSpeedZeRoOffload is incompatible with DeepSpeed>=0.16.4
🐛 bug
#2962
opened Feb 26, 2025 by
jamesbraza
5 tasks done
Maybe some bug in multi gpu training
⚡accelerate
Related to accelerate
🐛 bug
Something isn't working
#2960
opened Feb 26, 2025 by
Kfkcome
5 tasks done
compute_metrics in GRPOTrainer
✨ enhancement
#2959
opened Feb 26, 2025 by
dipta007
Request for adding training scripts for SPIN and SPPO
🏋 DPO
Related to DPO
✨ enhancement
New feature or request
#2958
opened Feb 25, 2025 by
jkx19
A quick question about AutoModelForCausalLMWithValueHead
❓ question
Seeking clarification or more information
#2957
opened Feb 25, 2025 by
SakurajimaMaiii
A little question about GRPO padding way
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#2944
opened Feb 24, 2025 by
MlSAKA-MlKOTO
IterableDataset not supported on GRPOTrainer
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2942
opened Feb 24, 2025 by
Marsella8
5 tasks done
How to dynamically adjust params during grpo training?
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#2941
opened Feb 24, 2025 by
Tomsawyerhu
GRPO: For the same batch of data, there is a significant difference in model performance between using trainer.predict() and using trainer.train().
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#2937
opened Feb 23, 2025 by
yzhdut
The ability to specify tokens (or strings) to be ignored in the KL divergence calculation would be useful
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2933
opened Feb 22, 2025 by
kalomaze
Is there any problem with GRPOtrainer’s memory usage?
🏋 GRPO
Related to GRPO
❓ question
Seeking clarification or more information
#2927
opened Feb 21, 2025 by
Tuziking
NCCL timeout when GRPO training with vllm
🐛 bug
Something isn't working
🏋 GRPO
Related to GRPO
#2923
opened Feb 21, 2025 by
edwardzjl
How to support multi-device VLLM inference in the GRPO Trainer
✨ enhancement
New feature or request
🏋 GRPO
Related to GRPO
#2922
opened Feb 21, 2025 by
0x404