How to use it?
vLLM or SGLang? Can it only run on the customized Step 3.7 Flash build?
Got it working on nightly vLLM.
The Step-customized vLLM release has an asymmetric-memory-related error on A6000 (Ampere).
MTP did not work.
I got it working with the following command on my 8x RTX 3090 rig:
docker run -d \
  --name vllm-stepf37-8gpus \
  --restart unless-stopped \
  -p 8788:8000 \
  -v /mnt/extra/models:/root/.cache/huggingface \
  --gpus '"device=0,1,2,4,5,6,7,8"' \
  -e CUDA_DEVICE_ORDER=PCI_BUS_ID \
  -e VLLM_ENABLE_CUDAGRAPH_GC=1 \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  vllm/vllm-openai:nightly \
  cyankiwi/Step-3.7-Flash-AWQ-INT4 \
  --served-model-name "MainLLM" \
  --tensor-parallel-size 8 \
  --max-model-len 250000 \
  --gpu-memory-utilization 0.92 \
  --trust-remote-code \
  --enable-expert-parallel \
  --disable-cascade-attn \
  --tool-call-parser step3p5 \
  --enable-auto-tool-choice \
  --kv-cache-dtype auto \
  --max-num-seqs 4 \
  --reasoning-parser step3p5
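Once a container like the one above is up, the server can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch, assuming the host port 8788 and the served model name "MainLLM" from the command above; the function names `build_chat_request` and `ask` are made up for illustration, and nothing is sent until `ask` is called.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8788", model="MainLLM"):
    """Build (but do not send) a chat completion request.

    base_url and model are assumptions taken from the docker command:
    -p 8788:8000 and --served-model-name "MainLLM".
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=300):
    """Send the request and return the parsed JSON response (needs a running server)."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling `ask("Say hello in one sentence.")` only works while the container is running and the model has finished loading. With a reasoning parser enabled, the answer text and the reasoning text may arrive in separate fields of the response message, so it is worth printing the whole response first.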
But as Wisfor wrote, it crashes as soon as I try to use MTP:
--speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' \
There don't seem to be any MTP weights.
Thanks for letting me know. The MTP problem is partially fixed now :)
MTP can now run with vLLM, but with a low draft acceptance rate. The MTP layers are quantized to INT4 with round-to-nearest, as there is a bug that prevents vLLM from loading stepfun3_7 BF16 MTP layers.
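Assuming the quantization mentioned above means plain round-to-nearest (RTN), the idea can be sketched as follows. This is a generic illustration, not the recipe used for this checkpoint: the symmetric scheme, the group size of 4, and the function names are all assumptions chosen to keep the example small.

```python
def quantize_rtn_int4(weights, group_size=4):
    """Symmetric round-to-nearest INT4 quantization with one scale per group.

    Returns (codes, scales): integer codes in [-8, 7] and one float scale
    per group of `group_size` consecutive weights.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Python's round() resolves exact ties to the even integer.
            code = round(w / scale)
            codes.append(max(-8, min(7, code)))
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Reconstruct approximate weights from codes and per-group scales."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]


if __name__ == "__main__":
    original = [0.31, -0.12, 0.05, 0.9, -1.4, 0.2, 0.0, 0.7]
    codes, scales = quantize_rtn_int4(original)
    restored = dequantize(codes, scales)
    for w, r in zip(original, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
```

RTN needs no calibration data, which makes it quick to apply, but each weight is rounded independently of how much it matters to the layer output. That is one plausible reason a draft head quantized this way could agree less often with the main model.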
Sorry, with the updated model variant I still can't get MTP to work. It crashes at loading.
I used the following command:
docker run -d \
  --name vllm-stepf37-8gpus \
  --restart unless-stopped \
  -p 8788:8000 \
  -v /mnt/extra/models:/root/.cache/huggingface \
  --gpus '"device=0,1,2,4,5,6,7,8"' \
  -e CUDA_DEVICE_ORDER=PCI_BUS_ID \
  -e VLLM_ENABLE_CUDAGRAPH_GC=1 \
  -e VLLM_USE_FLASHINFER_SAMPLER=1 \
  -e VLLM_TEST_FORCE_FP8_MARLIN=1 \
  -e VLLM_RPC_TIMEOUT=180 \
  -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  vllm/vllm-openai:nightly \
  cyankiwi/Step-3.7-Flash-AWQ-INT4 \
  --served-model-name "MainLLM" \
  --tensor-parallel-size 8 \
  --max-model-len 250000 \
  --gpu-memory-utilization 0.92 \
  --trust-remote-code \
  --enable-expert-parallel \
  --disable-cascade-attn \
  --tool-call-parser step3p5 \
  --enable-auto-tool-choice \
  --kv-cache-dtype auto \
  --max-num-seqs 4 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
  --reasoning-parser step3p5
What is the error that you are getting?
Thanks for the quant. I'm getting this error when adding the MTP config option:
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] self.load_weights(model, model_config)
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] return func(*args, **kwargs)
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] ^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] raise RuntimeError(
vllm-1 | (Worker_TP1_EP1 pid=352) ERROR 06-13 10:59:31 [multiproc_executor.py:890] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.q_zero_point are not in the checkpoint and will falsely use random initialization
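The error above says that certain parameters were not found in the checkpoint. One way to see what a sharded safetensors checkpoint actually contains is to read its `model.safetensors.index.json`, whose `weight_map` maps tensor names to shard files. The sketch below assumes that file has been downloaded and read into a string; the helper names and the two example tensor names are hypothetical. Runtime parameter names inside vLLM do not have to match checkpoint names one-to-one, so treat the result as a hint rather than a diagnosis.

```python
import json


def load_weight_map(index_json_text):
    """Return the tensor-name -> shard-file mapping from an index file's text."""
    return json.loads(index_json_text)["weight_map"]


def names_containing(weight_map, fragment):
    """List tensor names that contain `fragment`, e.g. "mtp_block"."""
    return sorted(name for name in weight_map if fragment in name)


def missing_names(weight_map, expected):
    """Return the expected names that are absent from the checkpoint index."""
    return [name for name in expected if name not in weight_map]


# Hypothetical two-entry index, only to show the shape of the data.
EXAMPLE_INDEX = json.dumps({
    "metadata": {},
    "weight_map": {
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00026.safetensors",
        "model.layers.47.mtp_block.self_attn.q_proj.weight": "model-00026-of-00026.safetensors",
    },
})

if __name__ == "__main__":
    weight_map = load_weight_map(EXAMPLE_INDEX)
    print(names_containing(weight_map, "mtp_block"))
    print(missing_names(
        weight_map,
        ["model.layers.47.mtp_block.self_attn.attn.q_zero_point"],
    ))
```

For a real check, replace `EXAMPLE_INDEX` with the contents of the repository's index file and pass the parameter name from the error message to `missing_names`.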
By the way, for those interested, this is a sample command for Docker Compose.
From a quick test it seems to work well without the MTP option:
command: |
  cyankiwi/Step-3.7-Flash-AWQ-INT4
  --served-model-name Step-3.7-Flash
  --port 8000
  --tensor-parallel-size 8
  --max-model-len 262144
  --gpu-memory-utilization 0.95
  --enable-auto-tool-choice
  --trust-remote-code
  --enable-prefix-caching
  --enable-flashinfer-autotune
  --enable-expert-parallel
  --disable-cascade-attn
  --tool-call-parser step3p5
  --reasoning-parser step3p5
  --max-num-seqs 4
  --max-num-batched-tokens 8192
  --compilation-config '{"cudagraph_capture_sizes":[1,2,4,8,16,32]}'
  --enable-chunked-prefill
  --kv-cache-dtype auto
  ###--speculative-config '{"method":"mtp","num_speculative_tokens":3}'
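Switching MTP on and off by commenting out a line is easy to get wrong when comparing runs. A small sketch of generating the argument list instead, with the speculative config added only on request, might look like this. The flags and values are copied from the commands in this thread (only a subset is shown); the names `BASE_ARGS`, `build_args`, and `as_shell_command` are made up for illustration.

```python
import json
import shlex

# Subset of the flags used in this thread; extend as needed.
BASE_ARGS = [
    "cyankiwi/Step-3.7-Flash-AWQ-INT4",
    "--served-model-name", "Step-3.7-Flash",
    "--tensor-parallel-size", "8",
    "--max-model-len", "262144",
    "--trust-remote-code",
    "--enable-expert-parallel",
    "--disable-cascade-attn",
    "--tool-call-parser", "step3p5",
    "--reasoning-parser", "step3p5",
    "--enable-auto-tool-choice",
]


def build_args(enable_mtp=False, num_speculative_tokens=3):
    """Return the server argument list, optionally with the MTP config appended."""
    args = list(BASE_ARGS)
    if enable_mtp:
        spec = {"method": "mtp", "num_speculative_tokens": num_speculative_tokens}
        # Compact separators reproduce the JSON spelling used in the thread.
        args += ["--speculative-config", json.dumps(spec, separators=(",", ":"))]
    return args


def as_shell_command(args):
    """Quote the arguments so the result can be pasted into a shell."""
    return shlex.join(args)


if __name__ == "__main__":
    print(as_shell_command(build_args(enable_mtp=False)))
    print(as_shell_command(build_args(enable_mtp=True)))
```

Building the JSON with `json.dumps` and quoting with `shlex.join` avoids hand-written quoting mistakes around the `--speculative-config` value.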
What is the error that you are getting?
Big error log:
Loading safetensors checkpoint shards: 100% Completed | 26/26 [00:16<00:00, 1.60it/s]
(Worker_TP0_EP0 pid=248)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP2_EP2 pid=260) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.45.mtp_block.self_attn.attn.v_scale are not in the checkpoint and will falsely use random initialization
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP1_EP1 pid=253) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.q_zero_point are not in the checkpoint and will falsely use random initialization
(EngineCore pid=192) INFO 06-13 19:41:23 [multiproc_executor.py:428] [shutdown] Executor: waiting for worker exit count=8
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP0_EP0 pid=248) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.q_zero_point are not in the checkpoint and will falsely use random initialization
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP3_EP3 pid=273) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.46.mtp_block.self_attn.attn.k_scale are not in the checkpoint and will falsely use random initialization
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP6_EP6 pid=312) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.v_scale are not in the checkpoint and will falsely use random initialization
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP7_EP7 pid=325) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.45.mtp_block.self_attn.attn.k_scale are not in the checkpoint and will falsely use random initialization
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP5_EP5 pid=299) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.q_zero_point are not in the checkpoint and will falsely use random initialization
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] WorkerProc failed to start.
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] Traceback (most recent call last):
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 855, in worker_main
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] worker = WorkerProc(*args, **kwargs)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 634, in __init__
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.worker.load_model()
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 356, in load_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5113, in load_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.drafter.load_model(self.model)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1210, in load_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.model = self._get_model()
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/llm_base_proposer.py", line 1195, in _get_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] model = get_model(
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 143, in get_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return loader.load_model(
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] self.load_weights(model, model_config)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] return func(*args, **kwargs)
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 394, in load_weights
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/step3p5_mtp.py", line 289, in load_weights
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] raise RuntimeError(
(Worker_TP4_EP4 pid=286) ERROR 06-13 19:41:23 [multiproc_executor.py:888] RuntimeError: Some parameters like model.layers.47.mtp_block.self_attn.attn.v_scale are not in the checkpoint and will falsely use random initialization
[rank0]:[W613 19:41:24.065854307 ProcessGroupNCCL.cpp:1575] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore pid=192) INFO 06-13 19:41:27 [multiproc_executor.py:433] [shutdown] Executor: all workers exited gracefully
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] EngineCore failed to start.
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] Traceback (most recent call last):
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1164, in run_engine_core
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] return func(*args, **kwargs)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) Process EngineCore:
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 930, in __init__
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] super().__init__(
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 122, in __init__
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] self.model_executor = executor_class(vllm_config)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 108, in __init__
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] super().__init__(vllm_config)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] return func(*args, **kwargs)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] self._init_executor()
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 201, in _init_executor
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 762, in wait_for_ready
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] raise e from None
(EngineCore pid=192) ERROR 06-13 19:41:27 [core.py:1195] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore pid=192) Traceback (most recent call last):
(EngineCore pid=192) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=192) self.run()
(EngineCore pid=192) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=192) self._target(*self._args, **self._kwargs)
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1199, in run_engine_core
(EngineCore pid=192) raise e
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1164, in run_engine_core
(EngineCore pid=192) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=192) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=192) return func(*args, **kwargs)
(EngineCore pid=192) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 930, in __init__
(EngineCore pid=192) super().__init__(
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 122, in __init__
(EngineCore pid=192) self.model_executor = executor_class(vllm_config)
(EngineCore pid=192) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 108, in __init__
(EngineCore pid=192) super().__init__(vllm_config)
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=192) return func(*args, **kwargs)
(EngineCore pid=192) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=192) self._init_executor()
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 201, in _init_executor
(EngineCore pid=192) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=192) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=192) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 762, in wait_for_ready
(EngineCore pid=192) raise e from None
(EngineCore pid=192) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 95, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 148, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 672, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 686, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 99, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 135, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 131, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 948, in __init__
(APIServer pid=1) super().__init__(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 570, in __init__
(APIServer pid=1) with launch_core_engines(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1190, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1249, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '