Every response starts with <|start_header_id|>assistant<|end_header_id|>
#2 · opened by notadib
vLLM parameters:
vllm serve cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic --max-model-len 32000 --max_num_batched_tokens 32000 -tp 2 --max_num_seqs 256 --gpu-memory-utilization 0.95 --tokenizer-pool-size 4 --num_scheduler_steps 16 --max_logprobs 20
It seems to be related to the tokenizer config. Since this quantization uses the same configuration as the original model, the error is unlikely to be caused by the quantization itself.
Did you encounter the same error when running the original model?
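If it helps narrow things down, here is a minimal sketch for diffing the tokenizer config of the quant against the base model. The base-model ID and the specific fields to compare are my assumptions; a leaked `<|start_header_id|>assistant<|end_header_id|>` prefix usually points at the chat template or at special tokens not being skipped during decoding:

```python
# Sketch: compare the tokenizer/chat-template config of the FP8 quant
# against the original model (base-model ID assumed; it is gated on the Hub).
from transformers import AutoTokenizer

quant = AutoTokenizer.from_pretrained("cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic")
base = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

# Diff the fields most likely to cause a leaked assistant header.
print("chat_template matches:", quant.chat_template == base.chat_template)
print("special tokens match: ", quant.special_tokens_map == base.special_tokens_map)

# Render a one-turn prompt with both templates to spot any divergence.
msgs = [{"role": "user", "content": "Hello"}]
print(quant.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
print(base.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
```

If both templates match, the prefix is more likely coming from the serving side (e.g. special tokens not being stripped from the output) than from this repo's config.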