Tini1.5-8B-A1B-GGUF

This repository contains GGUF builds of Tini1.5-8B-A1B, a fine-tuned model based on LiquidAI/LFM2.5-8B-A1B and optimized for agent reasoning and function calling.

Files Available

  • Tini1.5-8B-A1B-BF16.gguf (16.95 GB): Unquantized bfloat16 (BF16) base GGUF file.
  • Tini1.5-8B-A1B-Q8_0.gguf (9.01 GB): Standard 8-bit quantization. High accuracy, recommended for general inference.
  • Tini1.5-8B-A1B-Q6_K.gguf (6.96 GB): 6-bit K-quantization. Good balance between file size and quality (perplexity).
  • Tini1.5-8B-A1B-Q4_K_M.gguf (5.16 GB): 4-bit K-quantization, medium variant. Smallest file listed, with the lowest memory footprint.
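
As a rough aid for choosing among the files above, the following sketch picks the largest listed file that fits a given size budget. The function name pick_quant is hypothetical, the sizes are copied from the list, and the comparison covers file size only: it assumes you leave extra headroom yourself for the KV cache and runtime overhead.

```shell
#!/usr/bin/env bash
# Sketch: choose the largest listed GGUF file whose size fits a budget in GB.
# Sizes are copied from the file list; KV cache and runtime overhead are NOT
# included, so pass a budget smaller than your total free memory.
pick_quant() {
  local budget_gb="$1"
  # name:size_in_GB, ordered from largest to smallest.
  local entries="BF16:16.95 Q8_0:9.01 Q6_K:6.96 Q4_K_M:5.16"
  local entry name size
  for entry in $entries; do
    name="${entry%%:*}"
    size="${entry##*:}"
    # awk does the floating-point comparison; "+ 0" forces numeric context.
    if awk -v s="$size" -v b="$budget_gb" 'BEGIN { exit !(s + 0 <= b + 0) }'; then
      echo "Tini1.5-8B-A1B-${name}.gguf"
      return 0
    fi
  done
  echo "no listed file fits in ${budget_gb} GB" >&2
  return 1
}

pick_quant 8
```

With a budget of 8 GB this selects the Q6_K file, since the BF16 and Q8_0 files are both larger than the budget.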

Running the Model

The lfm2_moe architecture is relatively new, so use a recent version of llama.cpp, or of a downstream tool (LM Studio, Ollama, etc.) that supports this model type.

Using llama-cli

You can run the model directly using llama-cli:

llama-cli -m Tini1.5-8B-A1B-Q8_0.gguf -p "<|im_start|>user\nHello, how can you help me today?<|im_end|>\n<|im_start|>assistant\n"
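
To avoid retyping the chat markers for every prompt, a small helper can wrap a user message in the same template as the command above. This is a minimal sketch: build_prompt is a hypothetical name, and it assumes the single-turn layout shown in that command is all you need.

```shell
# Sketch: wrap a user message in the chat markers used in the command above.
build_prompt() {
  # printf turns \n in the format string into real newlines; the message is
  # passed through %s, so any backslashes or % signs in it are left untouched.
  printf '<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' "$1"
}

build_prompt "Hello, how can you help me today?"
```

The result can be passed as llama-cli -m Tini1.5-8B-A1B-Q8_0.gguf -p "$(build_prompt "your message")". Note that shell command substitution strips the trailing newline after the assistant marker. Depending on the llama.cpp build, llama-cli may also apply the model's embedded chat template on its own in conversation mode, in which case a plain message is enough; check the options of your build before relying on either behavior.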

Model Architecture Details

  • Architecture: Lfm2MoeForCausalLM
  • Experts: 32 experts (MoE)
  • Experts per Token: 4 active experts
  • Context Window: Up to 128k tokens
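
The expert figures above imply that only a fraction of the experts run for any given token. The sketch below works out that share from the two listed numbers; active_share is a hypothetical helper name. It describes the share of experts only, not of total parameters, since layers outside the experts run for every token.

```shell
# Sketch: percentage of experts used per token, from the figures listed above.
active_share() {
  # $1 = active experts per token, $2 = total experts
  awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

# 4 active experts out of 32
active_share 4 32
```

For the listed values this gives 12.5, meaning one eighth of the experts are evaluated per token.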