What engine should be used to run inference with this model?
Thank you for your contribution. My question is: what engine should be used to run inference with this model?
I'm wondering whether this model was quantized with https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/deepseek_moe_w4a16.py. Could you share any details about the quantization?
How do I run this model with vLLM? Could you give some tips or examples?
The model was quantized through https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/deepseek_moe_w4a16.py
The same example also illustrates how to run the model through vLLM: https://github.com/vllm-project/llm-compressor/blob/0a34a894b11f317fb46c7a4bac7e71cd6417a0ad/examples/quantizing_moe/deepseek_moe_w4a16.py#L98. Instead of SAVE_DIR, you would pass in the model stub.
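For reference, here is a minimal offline-inference sketch with vLLM. The model stub below is a placeholder, not the actual repo id; substitute the quantized model's Hugging Face stub (or a local directory) just as you would replace SAVE_DIR in the example script.

```python
from vllm import LLM, SamplingParams

# Placeholder: replace with the actual quantized model stub or local path.
MODEL_STUB = "your-org/deepseek-moe-w4a16"

# vLLM should pick up the compressed-tensors (W4A16) checkpoint format from
# the model config, so no extra quantization flag is usually required.
llm = LLM(model=MODEL_STUB, trust_remote_code=True, max_model_len=4096)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["Briefly explain mixture-of-experts models."], sampling)
print(outputs[0].outputs[0].text)
```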
Why is inference with the quantized model significantly slower?
How are you running it?
After deepseek_moe_w4a16.py finishes, you will get an int4 model of roughly 112 GB. Run it with vLLM 0.6; I failed with version 0.5.4, so skip that one. See https://github.com/vllm-project/llm-compressor/issues/857.
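Since a ~112 GB checkpoint typically does not fit on a single GPU, here is a sketch of loading it across several GPUs with vLLM. The local path and tensor_parallel_size=4 are example values, not settings taken from this thread; adjust them to your hardware.

```python
from vllm import LLM, SamplingParams

# Example local path produced by deepseek_moe_w4a16.py (SAVE_DIR);
# tensor_parallel_size should match the number of GPUs you actually have.
llm = LLM(
    model="/path/to/deepseek-moe-w4a16",
    tensor_parallel_size=4,
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.6, max_tokens=128)
print(llm.generate(["Hello"], params)[0].outputs[0].text)
```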