KeyError: 'model.layers.0.block_sparse_moe.experts.0.w1.g_idx' when running with tensor parallelism on vllm

#1
by KronusCon

python3 -m vllm.entrypoints.openai.api_server --trust-remote-code --model="TheBloke/dolphin-2.6-mixtral-8x7b-AWQ" --dtype half --quantization awq --tensor-parallel-size 2

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 729, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 496, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 314, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 108, in __init__
    self._init_workers_ray(placement_group)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 195, in _init_workers_ray
    self._run_workers(
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 755, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 732, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::RayWorkerVllm.execute_method() (pid=37680, ip=216.153.52.143, actor_id=03dd8d10b8ec311e8896b66101000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7fb0e6d68430>)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 31, in execute_method
    return executor(*args, **kwargs)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 79, in load_model
    self.model_runner.load_model()
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 57, in load_model
    self.model = get_model(self.model_config)
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 72, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/home/dev1@gmail.com/.local/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 430, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.0.block_sparse_moe.experts.0.w1.g_idx'

Can you please suggest how I can unblock this? Thanks.
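For what it's worth, the KeyError points at a GPTQ-style tensor (`g_idx`) that vLLM's Mixtral weight loader has no parameter for; AWQ checkpoints normally do not ship `g_idx` tensors at all. One quick diagnostic (a sketch, not a vLLM API; the local path is an assumption) is to list the tensor names from the checkpoint's `model.safetensors.index.json`, which sharded Hugging Face checkpoints include, and see whether `g_idx` entries are really there:

```python
import json
from pathlib import Path

def find_gptq_keys(index_path, suffix=".g_idx"):
    """Return tensor names from a sharded-checkpoint index file
    (model.safetensors.index.json) that end with a GPTQ-specific suffix.

    AWQ checkpoints normally carry no `g_idx` tensors, so any hits
    suggest the shards are GPTQ-format despite the repo name.
    """
    # The index maps each tensor name to the shard file that stores it.
    weight_map = json.loads(Path(index_path).read_text())["weight_map"]
    return sorted(name for name in weight_map if name.endswith(suffix))

# Example (path is an assumption; point it at your local HF snapshot):
# hits = find_gptq_keys("dolphin-2.6-mixtral-8x7b-AWQ/model.safetensors.index.json")
# print(len(hits), hits[:3])
```

If this returns a non-empty list, the error is coming from the checkpoint contents rather than from your command line.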

I also encountered ValueError: Unrecognized layer: Model.Layers.0.block_sparse_moe.expert.0.w1.bias when using text-generation-webui, which looks like the same underlying problem: the checkpoint contains tensors the loader does not expect.
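If the extra tensors really are spurious (e.g. an export that wrote GPTQ `g_idx` and bias tensors alongside AWQ weights), one workaround people try is to filter the unexpected keys out of the state dict before the loader sees it. This is a generic sketch, not a vLLM or text-generation-webui API, and dropping `g_idx` is only safe if the weights are genuinely AWQ-packed:

```python
def filter_unexpected_keys(state_dict, suffixes=(".g_idx", ".bias")):
    """Drop tensors whose names end in quantization-specific suffixes
    that the Mixtral expert loader does not recognize.

    WARNING: for a real GPTQ checkpoint the `g_idx` tensors are required
    for correct dequantization; only drop them if you are certain the
    weights are AWQ-packed.
    """
    return {name: tensor for name, tensor in state_dict.items()
            if not name.endswith(suffixes)}
```

You would then re-save the filtered shards and point vLLM at the cleaned copy, rather than patching the installed loader in place.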
