---
tags:
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- GGUF
- text-generation
- llama
model_name: Llama-3-16B-Instruct-v0.1-GGUF
base_model: MaziyarPanahi/Llama-3-16B-Instruct-v0.1
inference: false
model_creator: MaziyarPanahi
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
---

# [MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF)

- Model creator: [MaziyarPanahi](https://huggingface.co/MaziyarPanahi)
- Original model: [MaziyarPanahi/Llama-3-16B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1)

## Description

[MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF) contains GGUF format model files for [MaziyarPanahi/Llama-3-16B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1).

## Load GGUF models

You `MUST` follow the Llama-3 prompt template. The example below wires it into llama.cpp via the reverse-prompt (`-r`), `--in-prefix`, and `--in-suffix` flags:

```sh
./llama.cpp/main -m Llama-3-16B-Instruct-v0.1.Q2_K.gguf \
  -r '<|eot_id|>' \
  --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" \
  --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" \
  -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n" \
  -n 1024
```
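
## Download a single GGUF file

Rather than cloning the whole repository, you can fetch one quantized variant with `huggingface-cli` (part of the `huggingface_hub` package). This is a minimal sketch; the exact filename is an assumption based on the usual `<model>.<quant>.gguf` naming pattern, so verify it against the repository's file listing first:

```sh
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download one quantized variant into the current directory.
# The filename below is assumed from the repo's naming pattern;
# check the "Files and versions" tab before running.
huggingface-cli download MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF \
  Llama-3-16B-Instruct-v0.1.Q4_K_M.gguf \
  --local-dir .
```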
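
## Serve over HTTP

If you prefer serving the model over HTTP rather than using the interactive CLI, llama.cpp also ships a server. A minimal sketch, assuming a build where the binary is named `server` (newer builds name it `llama-server`) and the same Q2_K file as above; the prompt sent to the server must still follow the Llama-3 template:

```sh
# Start the HTTP server on port 8080 with an 8K context window.
./llama.cpp/server -m Llama-3-16B-Instruct-v0.1.Q2_K.gguf -c 8192 --port 8080

# In another shell: query the /completion endpoint, stopping on the
# Llama-3 end-of-turn token.
curl http://localhost:8080/completion -d '{
  "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "n_predict": 256,
  "stop": ["<|eot_id|>"]
}'
```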