Release v0.7.1 · InternLM/lmdeploy

What's Changed

use weights iterator while loading by @RunningLeon in #2886
Add deepseek-r1 chat template by @AllentDan in #3072
Update tokenizer by @lvhan028 in #3061
Set max concurrent requests by @AllentDan in #2961
remove logitswarper by @grimoire in #3109
Update benchmark script and user guide by @lvhan028 in #3110
support eos_token list in turbomind by @irexyc in #3044
Use aiohttp inside proxy server && add --disable-cache-status argument by @AllentDan in #3020
Update runtime package dependencies by @zgjja in #3142
Make turbomind support embedding inputs on GPU by @chengyuma in #3177

[dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
fix error in interactive api by @lvhan028 in #3074
fix sliding window mgr by @grimoire in #3068
More arguments in api_client, update docstrings by @AllentDan in #3077
Add system role to deepseek chat template by @AllentDan in #3031
Fix xcomposer2d5 by @irexyc in #3087
fix user guide about cogvlm deployment by @lvhan028 in #3088
fix positional argument by @lvhan028 in #3086
Fix UT of deepseek chat template by @lvhan028 in #3125
Fix internvl2.5 error after eviction by @grimoire in #3122
Fix cogvlm and phi3vision by @RunningLeon in #3137
[fix] fix vl gradio, use pipeline api and remove interactive chat by @irexyc in #3136
fix the issue that stop_token may contain fewer tokens than defined in model.py by @irexyc in #3148
fix typing by @lz1998 in #3153
fix min length penalty by @irexyc in #3150
fix default temperature value by @irexyc in #3166
Use pad_token_id as image_token_id for vl models by @RunningLeon in #3158
Fix tool call prompt for InternLM and Qwen by @AllentDan in #3156
Update qwen2.py by @GxjGit in #3174
fix temperature=0 by @grimoire in #3176
fix blocked fp8 moe by @grimoire in #3181
fix deepseekv2 has no attribute use_mla error by @CUHKSZzxy in #3188
fix unstoppable chat by @lvhan028 in #3189

Full Changelog: v0.7.0...v0.7.1