This is a quick GGUF quantization of DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1. It was made for testing purposes with an older llama.cpp version that does not include the BPE pre-tokenizer fix.
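
One way to check whether a given GGUF file predates the pre-tokenizer fix is to read its metadata and look for a pre-tokenizer entry. The following is a minimal sketch, not part of this repository: it assumes the GGUF v2+ header layout (magic, version, 64-bit tensor and key/value counts) and assumes that files converted after the fix carry a `tokenizer.ggml.pre` key. The function names `read_gguf_metadata` and `has_pre_tokenizer` are hypothetical.

```python
import struct
from typing import Any, BinaryIO, Dict

GGUF_MAGIC = b"GGUF"

# GGUF scalar value type ids mapped to little-endian struct format characters.
_SCALAR_FORMATS = {
    0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
    6: "f", 7: "?", 10: "Q", 11: "q", 12: "d",
}
_STRING, _ARRAY = 8, 9


def _read(fh: BinaryIO, fmt: str) -> Any:
    """Read one little-endian scalar described by a struct format character."""
    size = struct.calcsize("<" + fmt)
    data = fh.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(fh: BinaryIO) -> str:
    """Read a GGUF string: a uint64 byte length followed by UTF-8 bytes."""
    length = _read(fh, "Q")
    data = fh.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(fh: BinaryIO, value_type: int) -> Any:
    """Read a scalar, string, or array value of the given GGUF type id."""
    if value_type in _SCALAR_FORMATS:
        return _read(fh, _SCALAR_FORMATS[value_type])
    if value_type == _STRING:
        return _read_string(fh)
    if value_type == _ARRAY:
        elem_type = _read(fh, "I")
        count = _read(fh, "Q")
        return [_read_value(fh, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_metadata(fh: BinaryIO) -> Dict[str, Any]:
    """Return the metadata key/value pairs stored at the start of a GGUF file."""
    if fh.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(fh, "I")
    if version < 2:
        # Assumption: v1 used 32-bit counts, which this sketch does not handle.
        raise ValueError("GGUF v1 is not handled in this sketch")
    _read(fh, "Q")  # tensor count, not needed here
    kv_count = _read(fh, "Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(fh)
        value_type = _read(fh, "I")
        metadata[key] = _read_value(fh, value_type)
    return metadata


def has_pre_tokenizer(metadata: Dict[str, Any]) -> bool:
    """Assumed indicator of the fix: presence of the `tokenizer.ggml.pre` key."""
    return "tokenizer.ggml.pre" in metadata
```

To use it, open the downloaded `.gguf` file in binary mode (`open(path, "rb")`) and pass the handle to `read_gguf_metadata`. If `has_pre_tokenizer` returns `False`, the file was likely produced by a converter without the fix, as described above.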

GGUF

Model size: 8.03B params

Architecture: llama

Quantization: 4-bit
