sglang通过容器化部署deepseek-70B-Q4版本时对话回复的内容乱七八糟 #3928
Unanswered
JerryWengcw
asked this question in
Q&A
Replies: 2 comments
-
为啥你的能回答,我的根本不回答。只是一直占用显卡 |
Beta Was this translation helpful? Give feedback.
0 replies
-
Most of the quants on HF have never been validated with benchmarks. As far as I am concerned, most are bad to garbage quality. Use at your own risk. Find quants that come with benchmarks. Look at our ModeCloud Vortex quants which all have been validated for regression by multiple benchmarks for each quantized model. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
docker run -d
data:image/s3,"s3://crabby-images/e9cb8/e9cb85790a42b6f5e5e315408cb60d64da353254" alt="456"
-p 11434:30000
--name sglang-container
--gpus='"device=0,1"'
-v /dataset/tools/deepseek/70B:/models
--ipc=host
lmsysorg/sglang:latest
python3 -m sglang.launch_server
--model-path /models/DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf
--host 0.0.0.0
--port 30000
--tp 2
--log-level debug
--show-time-cost
--log-requests
--context-length 2048
--enable-metrics
--trust-remote-code
--disable-radix-cache
--max_num_batched_tokens 4096 通过容器化部署deepseek-70B-Q4版本,对话回复的内容没有逻辑,全是乱七八糟的回复,是什么原因,要如何调整?
Beta Was this translation helpful? Give feedback.
All reactions