KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'

#2
by tutu329 - opened

vLLM 0.4.0.post1 on 8× 2080 Ti 22 GB can run inference on mixtral-8x22b-instruct-v0.1-awq correctly.

Try the latest version?


Using the latest vLLM (0.4.1) still reports this error.

Running with vLLM 0.3.0 works for me:

```python
from vllm import LLM

# Load the GPTQ 4-bit quantized checkpoint, sharded across 8 GPUs.
llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tokenizer="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=8,
)
```

I am running vLLM 0.4.1 with 4× A10G 24 GB GPUs (96 GB total) in eager mode and I still run out of memory. How? It should fit (roughly 87 GB of VRAM). @jarrelscy


Try --max-model-len=10000 (experiment with the value); the maximum model length bounds the KV cache that vLLM allocates, so it directly affects how much VRAM is left over.
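For a rough sense of why the context length matters here, below is a back-of-envelope KV-cache estimate. The config values (56 layers, 8 KV heads, head dim 128) are assumptions based on Mixtral-8x22B's published configuration; verify them against the model's config.json:

```python
# Rough KV-cache sizing for Mixtral-8x22B. The config values below are
# assumptions from the model card; check config.json to confirm.
num_layers = 56        # num_hidden_layers
num_kv_heads = 8       # num_key_value_heads (grouped-query attention)
head_dim = 128         # hidden_size (6144) / num_attention_heads (48)
bytes_per_elem = 2     # fp16/bf16 cache

# Both K and V are cached, per layer, per KV head, per head dimension.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")

max_model_len = 10_000
total_gib = kv_bytes_per_token * max_model_len / 1024**3
print(f"KV cache for --max-model-len={max_model_len}: {total_gib:.2f} GiB")
```

So on top of the ~87 GB of 4-bit weights, a 10k-token context costs a couple more GiB of cache, and activations plus framework overhead come on top of that; vLLM also only uses a fraction of total VRAM (gpu_memory_utilization, default 0.9), which is why a setup that "should fit" on paper can still OOM.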
