What's Changed
🚀 Features
- support release pipeline by @irexyc in #3069
- [feature] add dlinfer w8a8 support. by @Reinerzhou in #2988
- [maca] support deepseekv2 for maca backend. by @Reinerzhou in #2918
- [Feature] support deepseek-vl2 for pytorch engine by @CUHKSZzxy in #3149
💥 Improvements
- use weights iterator while loading by @RunningLeon in #2886
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
- Update benchmark script and user guide by @lvhan028 in #3110
- support eos_token list in turbomind by @irexyc in #3044
- Use aiohttp inside proxy server and add --disable-cache-status argument by @AllentDan in #3020
- Update runtime package dependencies by @zgjja in #3142
- Make turbomind support embedding inputs on GPU by @chengyuma in #3177
🐞 Bug fixes
- [dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
- fix error in interactive api by @lvhan028 in #3074
- fix sliding window mgr by @grimoire in #3068
- More arguments in api_client, update docstrings by @AllentDan in #3077
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
- Fix UT of deepseek chat template by @lvhan028 in #3125
- Fix internvl2.5 error after eviction by @grimoire in #3122
- Fix cogvlm and phi3vision by @RunningLeon in #3137
- [fix] fix vl gradio, use pipeline api and remove interactive chat by @irexyc in #3136
- fix the issue that stop_token may have fewer tokens than defined in model.py by @irexyc in #3148
- fix typing by @lz1998 in #3153
- fix min length penalty by @irexyc in #3150
- fix default temperature value by @irexyc in #3166
- Use pad_token_id as image_token_id for vl models by @RunningLeon in #3158
- Fix tool call prompt for InternLM and Qwen by @AllentDan in #3156
- Update qwen2.py by @GxjGit in #3174
- fix temperature=0 by @grimoire in #3176
- fix blocked fp8 moe by @grimoire in #3181
- fix the error that deepseekv2 has no attribute use_mla by @CUHKSZzxy in #3188
- fix unstoppable chat by @lvhan028 in #3189
🌐 Other
- [ci] add internlm3 into testcase by @zhulinJulia24 in #3038
- add internlm3 to supported models by @lvhan028 in #3041
- update pre-commit config by @lvhan028 in #2683
- [maca] add cudagraph support on maca backend. by @Reinerzhou in #2834
- bump version to v0.7.0.post1 by @lvhan028 in #3076
- bump version to v0.7.0.post2 by @lvhan028 in #3094
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
- [ci] fix some failures in daily testcase by @zhulinJulia24 in #3134
- Bump version to v0.7.1 by @lvhan028 in #3178
New Contributors
- @Lychee-acaca made their first contribution in #3103
- @lz1998 made their first contribution in #3153
- @GxjGit made their first contribution in #3174
- @chengyuma made their first contribution in #3177
- @CUHKSZzxy made their first contribution in #3149
Full Changelog: v0.7.0...v0.7.1