LFM2-8B-A1B-Instruct-QwenDistill - GGUF

This repository contains GGUF-format quantizations of the model DavidAU/LFM2-8B-A1B-Instruct-Quantum-IQ1C-Qwen3.6-35B-A3B-DISTILL.

Quantization was performed locally using llama.cpp (build b9433).
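
The exact commands used for this repository are not documented here. As a rough sketch only, a local quantization step with llama.cpp's llama-quantize tool could be scripted as below. The input file name model-f16.gguf and the helper quantize_command are hypothetical, and the argument order (input, output, type) is assumed from the tool's usual usage text.

```python
# Sketch only: builds llama-quantize command lines, it does not run them.
# "model-f16.gguf" is a hypothetical full-precision GGUF export of the source model.

QUANT_TYPES = ["Q5_K_M", "Q4_K_M"]  # the two types published in this repository


def quantize_command(source_gguf, quant_type, binary="llama-quantize"):
    """Return the argument list for one quantization run."""
    output = f"LFM2-8B-A1B-Instruct-QwenDistill-{quant_type}.gguf"
    # Assumed argument order: <input> <output> <type>
    return [binary, source_gguf, output, quant_type]


for quant_type in QUANT_TYPES:
    print(" ".join(quantize_command("model-f16.gguf", quant_type)))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run, once the binary name and input path match the local llama.cpp build.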

Files Available

  • LFM2-8B-A1B-Instruct-QwenDistill-Q5_K_M.gguf (~5.64 GB) : Recommended balance. Higher accuracy, with very low quality loss from quantization.
  • LFM2-8B-A1B-Instruct-QwenDistill-Q4_K_M.gguf (~4.81 GB) : Standard quantization. The faster of the two files, with a smaller memory footprint.
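
One way to choose between the two files is to compare their sizes against the memory available on the target machine. The sketch below uses the approximate file sizes listed above. The helper pick_file and the 2 GB headroom for context cache and other processes are assumptions for illustration, not measured requirements.

```python
# Approximate file sizes in GB, taken from the list above.
FILES_GB = {
    "LFM2-8B-A1B-Instruct-QwenDistill-Q5_K_M.gguf": 5.64,
    "LFM2-8B-A1B-Instruct-QwenDistill-Q4_K_M.gguf": 4.81,
}


def pick_file(free_ram_gb, headroom_gb=2.0):
    """Return the largest file that fits in free RAM minus headroom, or None.

    headroom_gb is an assumed safety margin, not a figure from this repository.
    """
    budget = free_ram_gb - headroom_gb
    fitting = {name: size for name, size in FILES_GB.items() if size <= budget}
    if not fitting:
        return None
    # Prefer the larger (higher-precision) file when both fit.
    return max(fitting, key=fitting.get)


print(pick_file(8.0))
print(pick_file(7.0))
print(pick_file(4.0))
```

With these assumptions, 8 GB of free memory selects the Q5_K_M file, 7 GB selects Q4_K_M, and 4 GB returns None because neither file fits.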

Performance Test (Local CPU Inference)

The following benchmarks were obtained by running the model locally on a standard laptop, without GPU acceleration.

Hardware Specifications:

  • CPU: Intel Core i5 (11th Gen)
  • RAM: 32 GB
  • OS: Windows

Inference Results (llama-cli.exe):

  • LFM2-8B-A1B-Instruct-QwenDistill-Q4_K_M.gguf :
    • Prompt Evaluation: ~41 t/s
    • Token Generation: ~24 t/s
  • LFM2-8B-A1B-Instruct-QwenDistill-Q5_K_M.gguf :
    • Prompt Evaluation: ~28 t/s
    • Token Generation: ~20 t/s
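
To put these rates in practical terms, the sketch below estimates the wall-clock time for a single request from the figures above. The helper estimate_seconds and the example workload (500 prompt tokens, 200 generated tokens) are illustrative assumptions. The estimate ignores model load time and assumes the rates stay constant as the context grows.

```python
# Approximate throughput in tokens per second, from the results above.
BENCH = {
    "Q4_K_M": {"prompt_tps": 41.0, "gen_tps": 24.0},
    "Q5_K_M": {"prompt_tps": 28.0, "gen_tps": 20.0},
}


def estimate_seconds(quant, prompt_tokens, new_tokens):
    """Estimate request time as prompt processing time plus generation time."""
    rates = BENCH[quant]
    return prompt_tokens / rates["prompt_tps"] + new_tokens / rates["gen_tps"]


# Hypothetical workload: 500-token prompt, 200 generated tokens.
for quant in BENCH:
    print(quant, round(estimate_seconds(quant, 500, 200), 1), "s")
```

For this workload the estimate is roughly 20.5 s for Q4_K_M and 27.9 s for Q5_K_M. On this hardware Q4_K_M is about 1.2x faster at generation (24 vs 20 t/s) and about 1.5x faster at prompt evaluation (41 vs 28 t/s).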

Model Details

  • Model size: 8B params
  • Architecture: lfm2moe
  • Quantizations: 4-bit, 5-bit