KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'

#2
by tutu329 - opened

vLLM 0.4.0.post1 on 8× 2080 Ti 22 GB can run inference on mixtral-8x22b-instruct-v0.1-awq correctly.

Try the latest version?


Using the latest vLLM (0.4.1) still reports this error.

Running with vLLM 0.3.0 works for me:

```python
from vllm import LLM

# Load the GPTQ 4-bit quantized checkpoint, sharded across 8 GPUs.
llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tokenizer="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=8,
)
```

I am running vLLM 0.4.1 with 4× A10G 24 GB GPUs (96 GB total) in eager mode and I still run out of memory. How? It should fit (roughly 87 GB of VRAM). @jarrelscy


Try --max-model-len=10000 (experiment with the value); the maximum model length bounds the KV cache that vLLM allocates, so it directly affects how much VRAM is left over.
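For a rough sense of why the context length matters here, below is a back-of-envelope KV-cache estimate. The config values (56 layers, 8 KV heads, head dim 128) are assumptions based on Mixtral-8x22B's published configuration; verify them against the model's config.json:

```python
# Rough KV-cache sizing for Mixtral-8x22B. The config values below are
# assumptions from the model card; check config.json to confirm.
num_layers = 56        # num_hidden_layers
num_kv_heads = 8       # num_key_value_heads (grouped-query attention)
head_dim = 128         # hidden_size (6144) / num_attention_heads (48)
bytes_per_elem = 2     # fp16/bf16 cache

# Both K and V are cached, per layer, per KV head, per head dimension.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")

max_model_len = 10_000
total_gib = kv_bytes_per_token * max_model_len / 1024**3
print(f"KV cache for --max-model-len={max_model_len}: {total_gib:.2f} GiB")
```

So on top of the ~87 GB of 4-bit weights, a 10k-token context costs a couple more GiB of cache, and activations plus framework overhead come on top of that; vLLM also only uses a fraction of total VRAM (gpu_memory_utilization, default 0.9), which is why a setup that "should fit" on paper can still OOM.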
