vLLM support, GGUF
by dpkirchner
According to the model card:
You can also run this model with vLLM, by running the following in your terminal after pip install vllm
vllm serve NousResearch/Hermes-3-Llama-3.1-70B
I assume this is just carried over from the base model's README. How do you load, say, the Q4_K_M GGUF with vLLM and then use it on the chat completions endpoint?
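For concreteness, here is roughly what I expect the workflow to look like, based on vLLM's experimental GGUF support. The quantized repo name, the exact .gguf filename, the --served-model-name value, and the port are my guesses rather than anything from the model card:

# Grab one quant file from the GGUF repo (exact filename is a guess; check the repo's file list)
huggingface-cli download NousResearch/Hermes-3-Llama-3.1-70B-GGUF Hermes-3-Llama-3.1-70B.Q4_K_M.gguf --local-dir .

# Serve the local GGUF; vLLM's GGUF docs suggest pointing --tokenizer at the original repo,
# and --served-model-name just gives the endpoint a friendlier model id
vllm serve ./Hermes-3-Llama-3.1-70B.Q4_K_M.gguf --tokenizer NousResearch/Hermes-3-Llama-3.1-70B --served-model-name hermes-3-70b

# Query the OpenAI-compatible chat completions endpoint (default port 8000)
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "hermes-3-70b", "messages": [{"role": "user", "content": "Hello"}]}'

Is that the intended invocation, or does vLLM need something else to pick up the GGUF weights for this model?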