Update README.md
"""
```

### vLLM
The following command can be used to create an OpenAI-compatible API endpoint at `http://localhost:8000/v1` with a maximum context length of 256K tokens.
```shell
vllm serve Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound --port 8000 --max-model-len 262144
```

The following command is recommended for MTP (multi-token prediction), with the rest of the settings the same as above:
```shell
vllm serve Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound --port 8000 --max-model-len 262144 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```

```bash
curl --noproxy '*' http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Give me a short introduction to large language model."}
    ],
    "max_tokens": 1024
  }'

# "content":
# "A large language model (LLM) is a type of artificial intelligence system trained on vast amounts of text data to understand, generate, and manipulate human language. These models use deep learning architectures—often based on the transformer network—to predict the next word in a sequence, enabling them to perform tasks like answering questions, writing essays, translating languages, and even coding. LLMs, such as GPT, Gemini, and Claude, learn patterns and relationships in language without explicit programming, allowing them to produce human-like responses across a wide range of topics. While powerful, they don’t “understand” language in the human sense and can sometimes generate plausible-sounding but incorrect or biased information.",
```
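
Since vLLM exposes an OpenAI-compatible API, the same request can also be sent from Python. The following is a minimal sketch, assuming the `openai` package is installed and the server launched above is running; vLLM does not require a real API key by default, so any placeholder value works.

```python
from openai import OpenAI

# Point the client at the local vLLM server; the key is an arbitrary placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound",
    messages=[
        {"role": "user", "content": "Give me a short introduction to large language model."}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```
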
### Generate the model
```bash
auto_round --model Qwen/Qwen3-Next-80B-A3B-Instruct --scheme W4A16 --output_dir tmp_autoround
```
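
For reference, roughly the same quantization can be driven from Python instead of the CLI. This is only a sketch based on AutoRound's `AutoRound` class; the arguments shown (`bits=4`, `group_size=128`, `sym=True`, chosen to mirror the W4A16 scheme above) are assumptions, so check the AutoRound documentation for the exact signature in your installed version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weight-only settings intended to match the W4A16 CLI scheme above (assumed).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("tmp_autoround", format="auto_round")
```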