Tini-8B-A1B-GGUF

This repository contains GGUF builds of Tini-8B-A1B, a fine-tuned model based on the LiquidAI/LFM2.5-8B-A1B architecture and optimized for agent reasoning and function calling.

Files Available

  • Tini-8B-A1B-BF16.gguf (15.78 GB): Unquantized bfloat16 (BF16) base GGUF file.
  • Tini-8B-A1B-Q8_0.gguf (8.39 GB): 8-bit quantization. High accuracy, recommended for general inference.
  • Tini-8B-A1B-Q6_K.gguf (6.48 GB): 6-bit K-quantization. Good balance between file size and perplexity.
  • Tini-8B-A1B-Q4_K_M.gguf (4.80 GB): 4-bit K-quantization (medium variant). Smallest file listed here, with the lowest memory footprint.
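
As a rough aid for choosing among the files above, the sketch below picks the largest listed file that fits a given memory budget. The file sizes are the ones listed above; the function name pick_quant and the 1.2x headroom factor (a guess at room for the KV cache and runtime buffers) are assumptions for illustration, not measured requirements.

```shell
# Pick the largest listed GGUF file whose size fits a memory budget (in GB).
# Usage: pick_quant <budget_gb> [headroom_factor]
pick_quant() {
  local budget_gb="$1"
  local headroom="${2:-1.2}"   # hypothetical allowance for KV cache and buffers
  awk -v budget="$budget_gb" -v headroom="$headroom" '
    BEGIN {
      # Ordered largest to smallest, so the first fit is the highest-precision file.
      n = split("BF16:15.78 Q8_0:8.39 Q6_K:6.48 Q4_K_M:4.80", entries, " ")
      for (i = 1; i <= n; i++) {
        split(entries[i], kv, ":")
        if (kv[2] * headroom <= budget) {
          print "Tini-8B-A1B-" kv[1] ".gguf"
          exit 0
        }
      }
      exit 1   # no listed file fits the budget
    }'
}
```

Calling pick_quant 16 prints the Q8_0 file name under these assumptions; the function returns a non-zero status when nothing fits. Actual memory use depends on context length and runtime settings, so treat the result as a starting point.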

Running the Model

Because the lfm2_moe architecture is relatively new, use a recent version of llama.cpp, or a downstream tool (LM Studio, Ollama, etc.) built on a release that supports this model type.

Using llama-cli

You can run the model directly using llama-cli:

llama-cli -m Tini-8B-A1B-Q8_0.gguf -p "<|im_start|>user\nHello, how can you help me today?<|im_end|>\n<|im_start|>assistant\n"
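
The prompt in the command above wraps a single user message in <|im_start|> and <|im_end|> markers. A minimal sketch of a helper that builds the same string for an arbitrary message follows; the function name build_prompt is hypothetical. It emits the two-character sequence \n literally, exactly as written in the command above, and so relies on llama-cli interpreting those escapes the same way.

```shell
# Build a single-turn prompt in the same format as the llama-cli example.
# The "\\n" in the printf format prints a literal backslash followed by "n",
# matching the escaped newlines used in the original command.
build_prompt() {
  local user_message="$1"
  printf '<|im_start|>user\\n%s<|im_end|>\\n<|im_start|>assistant\\n' "$user_message"
}
```

The result can be passed to the -p option shown above, for example -p "$(build_prompt 'Hello, how can you help me today?')". User text that itself contains backslashes would need extra care.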

Model Architecture Details

  • Architecture: Lfm2MoeForCausalLM
  • Experts: 32 experts (MoE)
  • Experts per Token: 4 active experts
  • Context Window: Up to 128k tokens
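
To make the routing figures above concrete, this small sketch computes the share of experts active per token. The function name expert_fraction is hypothetical. The result describes expert routing only; it is not the ratio of active to total parameters, since layers outside the experts are not counted here.

```shell
# Percentage of experts used per token: 100 * active / total.
expert_fraction() {
  awk -v active="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * active / total }'
}

expert_fraction 4 32   # prints 12.5
```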