The original model_max_length of Qwen/Qwen2.5-7B-Instruct is 131072, but in the distilled model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B it is set to 16384.
I wonder why this was changed?
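For anyone who wants to check the two values themselves, here is a minimal sketch. It assumes each repo's `tokenizer_config.json` has already been downloaded locally (the file paths in the usage note are hypothetical) and that the value lives under the `model_max_length` key of that file. The helper names `read_model_max_length` and `compare_max_lengths` are made up for illustration and are not part of any library.

```python
import json
from pathlib import Path


def read_model_max_length(config_path):
    """Return the `model_max_length` entry of a tokenizer_config.json file.

    Returns None when the key is absent.
    """
    with Path(config_path).open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("model_max_length")


def compare_max_lengths(path_a, path_b):
    """Read both configs and report the two values and whether they differ."""
    a = read_model_max_length(path_a)
    b = read_model_max_length(path_b)
    return {"a": a, "b": b, "differ": a != b}
```

Usage would look something like `compare_max_lengths("qwen/tokenizer_config.json", "distill/tokenizer_config.json")`, with the paths pointing at wherever the two files were saved.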