huggingface / trl Public

Pull requests: huggingface/trl

66 Open 1,347 Closed

Pull requests list

🚀 DeepSpeed integration documentation

#2993 opened Feb 28, 2025 by qgallouedec

5 tasks

📚 Update customization and distributing training documentation

#2991 opened Feb 28, 2025 by qgallouedec

5 tasks

Add support for additional generation kwargs in GRPO Trainer

#2989 opened Feb 28, 2025 by nopepper

3 of 5 tasks

Fixing GRPO reward_func being a model with DeepSpeed ZeRO-3

#2984 opened Feb 28, 2025 by jamesbraza

Feature(WIP): Add SGLang as inference backend for generation in GRPO

#2981 opened Feb 28, 2025 by jhinpan

5 tasks done

Supporting deepspeed>=0.16.4's rename

#2963 opened Feb 26, 2025 by jamesbraza

Support ReMax Algorithm

#2955 opened Feb 25, 2025 by liziniu

3 tasks done

[Models] Activation checkpointing from TorchTune

#2954 opened Feb 25, 2025 by kashif • Draft

Agents

#2936 opened Feb 23, 2025 by August-murr

Provide more accurate error messages to make the program more robust.

#2932 opened Feb 22, 2025 by dignfei

4 tasks

Add the metrics completion_length_max and completion_length_min

#2930 opened Feb 22, 2025 by dignfei

4 tasks

Supporting multi-vLLM inference for GRPO

#2929 opened Feb 22, 2025 by ghrua

2 of 5 tasks

Liger GRPO support

#2926 opened Feb 21, 2025 by SalmanMohammadi • Draft

4 tasks

Remove CUDA synchronization in mean_token_accuracy

#2902 opened Feb 19, 2025 by cyyever

1 task done

GRPOTrainer adds support for OpenAI API-compatible servers to models that generate samples

#2901 opened Feb 19, 2025 by ZYM66 • Draft

2 of 5 tasks

Fast dataset truncate in SFTTrainer

#2898 opened Feb 18, 2025 by mariosasko

1 of 5 tasks

[Discussion] Agentic Framework Based on VLLM and E2B for RL

#2880 opened Feb 17, 2025 by August-murr • Draft

[GRPO] Reduce steps where loss starts to remain at 0, accelerate training

#2869 opened Feb 15, 2025 by zhangsheng377

BCOTrainer version upgrade fixes

#2867 opened Feb 15, 2025 by claralp

3 of 5 tasks

Using model_wrapped can improve the generation speed by approximately 10 times on a single GPU.

#2859 opened Feb 14, 2025 by dignfei

[Liger] Liger KTO support

#2812 opened Feb 10, 2025 by vaibhavjindal

3 of 5 tasks

[draft] Use vLLM in LogCompletionsCallback

#2797 opened Feb 7, 2025 by tchang1997 • Draft

2 of 4 tasks

Remote GRPO ref model

#2763 opened Feb 4, 2025 by edbeeching • Draft

allow ref_model to be set in trainer to interface parity with other trainers

#2746 opened Feb 3, 2025 by winglian • Draft

5 tasks

WIP: RLOOV2

#2724 opened Jan 31, 2025 by mnoukhov • Draft

3 tasks
