Fix vllm serve command for different versions
README.md CHANGED

````diff
@@ -86,7 +86,13 @@ Start the embedding server once, then route from any process without reloading t
 ```bash
 # Terminal 1: Start vLLM embedding server (runs once, stays alive)
 uv pip install vllm
+
+# vLLM >= 0.8
 vllm serve Qwen/Qwen3-0.6B --task embed --port 8000
+
+# vLLM < 0.8 (use this if the above fails)
+python -m vllm.entrypoints.openai.api_server \
+    --model Qwen/Qwen3-0.6B --task embed --port 8000
 ```
 
 ```python
````
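Either form of the command starts vLLM's OpenAI-compatible HTTP server, so a client in another process reaches it at `/v1/embeddings` on the port given above. A minimal sketch of such a client, assuming defaults from the diff (model `Qwen/Qwen3-0.6B`, port 8000); the helper names here are illustrative, not from the README:

```python
import json
import urllib.request

# Endpoint exposed by the server started in Terminal 1 (port from the command above).
VLLM_URL = "http://localhost:8000/v1/embeddings"

def build_request(texts, model="Qwen/Qwen3-0.6B"):
    """Build the JSON body for the OpenAI-compatible embeddings endpoint."""
    return {"model": model, "input": texts}

def embed(texts):
    """POST the texts to the running server; requires Terminal 1 to be up."""
    body = json.dumps(build_request(texts)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Response follows the OpenAI embeddings shape: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in data["data"]]

# Inspect the request shape without needing a live server:
print(build_request(["hello world"]))
```

Because the server stays alive, any number of processes can call `embed` without reloading the model.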