Is there any vllm support for this version?
ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (15424). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
Hi @Aloukik21!
I think this model should be supported in vllm - feel free to open an issue directly on their repository: https://github.com/vllm-project/vllm
@Aloukik21 were you able to resolve it? I am also stuck on this.
@Navanit-shorthills In case it helps, for me it works with this in "offline_inference.py":
llm = LLM(model="mistralai/Mistral-7B-v0.1", max_model_len=20000, gpu_memory_utilization=0.9)
Depending on what GPU you're using, you can adjust max_model_len.
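In case a complete script is useful, here is a minimal self-contained sketch of that offline setup; the prompt and sampling settings are just illustrative placeholders, not something prescribed by the vLLM docs or this thread:

from vllm import LLM, SamplingParams

# Cap the context length so the KV cache fits within the GPU memory budget;
# 20000 and 0.9 are the example values from above, so tune them for your GPU.
llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    max_model_len=20000,
    gpu_memory_utilization=0.9,
)

# Illustrative prompt and sampling settings (assumptions, not from the thread).
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what a KV cache is in one sentence."], sampling_params)
print(outputs[0].outputs[0].text)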
@EricMMD Thank you! Is max_model_len=20000 arbitrary, or is it simply the max number of tokens I can expect to run inference over?
I checked the documentation on this parameter and it says: model context length. If unspecified, will be automatically derived from the model.
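My reading of that (not an official statement) is: leave it unset and the engine uses the model's full context length, which for Mistral-7B is the 32768 that triggered the KV-cache error at the top of this thread; set it lower and you cap how many tokens a single request (prompt plus generation) can use, so the engine only needs a KV cache sized for that length.

# Unset: max_model_len defaults to the model's context length (32768 here),
# which is what produced the "larger than ... KV cache" error above.
llm_default = LLM(model="mistralai/Mistral-7B-v0.1", gpu_memory_utilization=0.9)

# Set explicitly: prompt + generated tokens per request are capped at 20000.
llm_capped = LLM(model="mistralai/Mistral-7B-v0.1", max_model_len=20000, gpu_memory_utilization=0.9)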
I have the same issue when using the vllm docker container to run the model.
Is there a way to specify the argument gpu_memory_utilization=0.9 in the vLLM docker command? When I execute this command:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model mistralai/Mistral-7B-Instruct-v0.2 --gpu_memory_utilization 0.9
Then I get the following error:
api_server.py: error: unrecognized arguments: --gpu_memory_utilization 0.9
This is the correct argument for the CLI: --gpu-memory-utilization
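For completeness, the same docker command with the dash-separated flag looks like this; the --max-model-len line is optional, and its value of 20000 is just the example from earlier in the thread:

docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 20000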
@Aloukik21 Thanks!