---
tags:
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- GGUF
- text-generation
- llama
model_name: Llama-3-16B-Instruct-v0.1-GGUF
base_model: MaziyarPanahi/Llama-3-16B-Instruct-v0.1
inference: false
model_creator: MaziyarPanahi
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
---

# [MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF)

- Model creator: [MaziyarPanahi](https://huggingface.co/MaziyarPanahi)
- Original model: [MaziyarPanahi/Llama-3-16B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1)

## Description

[MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF) contains GGUF format model files for [MaziyarPanahi/Llama-3-16B-Instruct-v0.1](https://huggingface.co/MaziyarPanahi/Llama-3-16B-Instruct-v0.1).

## Load GGUF models

You `MUST` follow the Llama-3 prompt template. The example below wires it into llama.cpp via the reverse-prompt (`-r`), `--in-prefix`, and `--in-suffix` flags:

```sh
./llama.cpp/main -m Llama-3-16B-Instruct-v0.1.Q2_K.gguf \
  -r '<|eot_id|>' \
  --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" \
  --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" \
  -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n" \
  -n 1024
```
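
## Download a single GGUF file

Rather than cloning the whole repository, you can fetch one quantized variant with `huggingface-cli` (part of the `huggingface_hub` package). This is a minimal sketch; the exact filename is an assumption based on the usual `<model>.<quant>.gguf` naming pattern, so verify it against the repository's file listing first:

```sh
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download one quantized variant into the current directory.
# The filename below is assumed from the repo's naming pattern;
# check the "Files and versions" tab before running.
huggingface-cli download MaziyarPanahi/Llama-3-16B-Instruct-v0.1-GGUF \
  Llama-3-16B-Instruct-v0.1.Q4_K_M.gguf \
  --local-dir .
```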
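
## Serve over HTTP

If you prefer serving the model over HTTP rather than using the interactive CLI, llama.cpp also ships a server. A minimal sketch, assuming a build where the binary is named `server` (newer builds name it `llama-server`) and the same Q2_K file as above; the prompt sent to the server must still follow the Llama-3 template:

```sh
# Start the HTTP server on port 8080 with an 8K context window.
./llama.cpp/server -m Llama-3-16B-Instruct-v0.1.Q2_K.gguf -c 8192 --port 8080

# In another shell: query the /completion endpoint, stopping on the
# Llama-3 end-of-turn token.
curl http://localhost:8080/completion -d '{
  "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi! How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "n_predict": 256,
  "stop": ["<|eot_id|>"]
}'
```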