Usage (llama-cli with GPU):

```shell
llama-cli -m ./gemma-1.1-7b-it-Q6_K.gguf -ngl 100 --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```

Usage (llama-cli with CPU):

```shell
llama-cli -m ./gemma-1.1-7b-it-Q6_K.gguf --temp 0 --repeat-penalty 1.0 --color -p "Why is the sky blue?"
```
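The GPU and CPU commands differ only in `-ngl 100`, which offloads model layers to the GPU. As a minimal sketch, the hypothetical helper `build_llama_cli_args` below assembles either invocation from Python so the shared flags live in one place. It only builds and prints the argument list; it does not run `llama-cli`.

```python
import shlex
from typing import List, Optional


def build_llama_cli_args(
    model_path: str,
    prompt: str,
    gpu_layers: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build the llama-cli argv used in this card."""
    args = ["llama-cli", "-m", model_path]
    if gpu_layers is not None:
        # Offload up to `gpu_layers` layers to the GPU (omitted for CPU-only runs).
        args += ["-ngl", str(gpu_layers)]
    # Temperature 0 gives deterministic (greedy) output; 1.0 disables the repeat penalty.
    args += ["--temp", "0", "--repeat-penalty", "1.0", "--color", "-p", prompt]
    return args


model = "./gemma-1.1-7b-it-Q6_K.gguf"
gpu_cmd = build_llama_cli_args(model, "Why is the sky blue?", gpu_layers=100)
cpu_cmd = build_llama_cli_args(model, "Why is the sky blue?")

# shlex.join quotes arguments so the printed lines can be pasted into a shell.
print(shlex.join(gpu_cmd))
print(shlex.join(cpu_cmd))
```

To actually launch the model, the list can be passed to `subprocess.run(gpu_cmd, check=True)`, which avoids shell-quoting issues with the prompt.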

Usage (llama-cpp-python via Hugging Face Hub):

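```python
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the Q6_K file from the Hub (cached locally) and load it.
llm = Llama.from_pretrained(
    repo_id="chenghenry/gemma-1.1-7b-it-GGUF",
    filename="gemma-1.1-7b-it-Q6_K.gguf",
    n_ctx=8192,           # Gemma's full 8K-token context window
    n_batch=2048,         # maximum prompt tokens processed per batch
    n_gpu_layers=100,     # offload all layers to the GPU; use 0 for CPU only
    verbose=False,
    chat_format="gemma",  # wrap messages in Gemma's chat template
)

prompt = "Why is the sky blue?"

messages = [{"role": "user", "content": prompt}]
response = llm.create_chat_completion(
    messages=messages,
    repeat_penalty=1.0,   # disable the repetition penalty
    temperature=0,        # deterministic output, matching --temp 0 above
)

# The response follows the OpenAI chat-completion layout.
print(response["choices"][0]["message"]["content"])
```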
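Setting `chat_format="gemma"` makes the library wrap each message in Gemma's turn markers before tokenizing. The sketch below renders that template by hand to show what the model actually sees; it is an illustration, not llama-cpp-python's own implementation. The function name `render_gemma_prompt` is hypothetical, and the beginning-of-sequence token is left out on the assumption that the tokenizer adds it.

```python
from typing import Dict, List

# Gemma labels the assistant's turns "model"; its template has no system role.
ROLE_TAGS = {"user": "user", "assistant": "model"}


def render_gemma_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render chat messages in Gemma's turn format."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_TAGS:
            raise ValueError(f"unsupported role for Gemma: {role!r}")
        parts.append(
            f"<start_of_turn>{ROLE_TAGS[role]}\n{message['content']}<end_of_turn>\n"
        )
    # Leave an open model turn so generation continues as the assistant.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_prompt([{"role": "user", "content": "Why is the sky blue?"}]))
```

Because the prompt ends with an open `model` turn, the model's reply stops when it emits `<end_of_turn>`.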

Model details:

Format: GGUF
Model size: 8.54B params
Architecture: gemma
Quantizations: 4-bit, 6-bit
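The parameter count gives a rough idea of how much memory the weights need, for example when deciding whether `-ngl 100` fits on a GPU. This sketch assumes every weight uses Q6_K, which packs 256 weights into a 210-byte block (6.5625 bits per weight). Real GGUF files mix tensor types and add metadata, and inference also needs memory for the KV cache, so treat the result as an approximation. The card does not state which 4-bit type is provided, so only the 6-bit file is estimated here.

```python
PARAMS = 8.54e9  # "Model size" listed in the model details

# Q6_K block: 256 weights stored in 210 bytes -> 210 * 8 / 256 bits per weight.
Q6_K_BITS_PER_WEIGHT = 210 * 8 / 256


def estimate_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, assuming a uniform bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30


print(f"Q6_K weights: ~{estimate_gib(PARAMS, Q6_K_BITS_PER_WEIGHT):.2f} GiB")
```

The same function applies to other quantizations once their bits-per-weight value is known.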
