Commit 6f67957 (verified) by n1ck-guo · 1 Parent(s): 6bae0a3

Update README.md

  1. README.md +25 -0
README.md CHANGED
@@ -56,6 +56,31 @@ content: A large language model (LLM) is a type of artificial intelligence syste
56
  """
57
  ```
58
 
59
+ ### vLLM
60
+ The following command creates an API endpoint at `http://localhost:8000/v1` with a maximum context length of 256K tokens.
61
+ ```shell
62
+ vllm serve Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound --port 8000 --max-model-len 262144
63
+ ```
64
+
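As a quick check on the numbers in the command above, 256K tokens is 256 × 1024 = 262144, which is the value given to `--max-model-len`. A minimal sketch of that arithmetic (the variable name `MAX_MODEL_LEN` is illustrative and not part of the README):

```shell
# 256K tokens written out as an integer token count.
MAX_MODEL_LEN=$((256 * 1024))
echo "$MAX_MODEL_LEN"
```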
65
+ The following command is recommended for MTP (multi-token prediction), with the remaining settings the same as above:
66
+ ```shell
67
+ vllm serve Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound --port 8000 --max-model-len 262144 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
68
+ ```
69
+
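The `--speculative-config` value is a JSON string, so shell quoting mistakes are easy to make. One possible way to guard against them, sketched under the assumption that `python3` is on the PATH (the variable name `SPEC_CONFIG` is hypothetical), is to keep the JSON in a variable and validate it before starting the server:

```shell
# Keep the JSON in a single-quoted variable so the shell leaves the inner double quotes alone.
SPEC_CONFIG='{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

# Fail early if the string is not valid JSON (uses Python's standard json.tool module).
printf '%s' "$SPEC_CONFIG" | python3 -m json.tool > /dev/null && echo "speculative config is valid JSON"

# Then pass it through unchanged, for example:
# vllm serve Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound --port 8000 --max-model-len 262144 --speculative-config "$SPEC_CONFIG"
```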
70
+ ```bash
71
+ curl --noproxy '*' http://localhost:8000/v1/chat/completions \
72
+ -H "Content-Type: application/json" \
73
+ -d '{
74
+ "messages": [
75
+ {"role": "user", "content": "Give me a short introduction to large language model."}
76
+ ],
77
+ "max_tokens": 1024
78
+ }'
79
+
80
+ # "content":
81
+ # "A large language model (LLM) is a type of artificial intelligence system trained on vast amounts of text data to understand, generate, and manipulate human language. These models use deep learning architectures—often based on the transformer network—to predict the next word in a sequence, enabling them to perform tasks like answering questions, writing essays, translating languages, and even coding. LLMs, such as GPT, Gemini, and Claude, learn patterns and relationships in language without explicit programming, allowing them to produce human-like responses across a wide range of topics. While powerful, they don’t “understand” language in the human sense and can sometimes generate plausible-sounding but incorrect or biased information.",
82
+ ```
83
+
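The curl example above shows only the `content` field of the reply. A minimal sketch for pulling that field out of the full response follows. It assumes the server returns an OpenAI-style body with the text at `choices[0].message.content` and that `python3` is available. The helper name `extract_content` is hypothetical.

```shell
# Read a chat completion response on stdin and print only the assistant text.
# Assumes an OpenAI-style body: choices[0].message.content.
extract_content() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

To use it, add `-s` to the curl request above and pipe the output into `extract_content`.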
84
  ### Generate the model
85
  ```bash
86
  auto_round --model Qwen/Qwen3-Next-80B-A3B-Instruct --scheme W4A16 --output_dir tmp_autoround