NobodyWho/LFM2.5-8B-A1B-GGUF

Overview

GGUF quantization of LiquidAI's LFM2.5-8B-A1B model, prepared for NobodyWho: it works with NobodyWho out of the box, with LiquidAI's recommended sampling metadata embedded in every quant. LFM2.5-8B-A1B is a sparse Mixture-of-Experts model (8B total / ≈1B active per token) built on the hybrid LFM2 architecture — the fastest model in its size class on both CPU and GPU. Note: tool calling is unreliable on this model — see the Tool calling note below.

Model Capabilities

  • Text generation — instruction-following chat
  • Tool calling — supported but unreliable: the model often answers in prose instead of invoking a tool. NobodyWho suite: 8/14 (F16, Q8_0), 4/14 (Q4_K_M)
  • Long context — 128k tokens
  • Efficient MoE — 8B total / ≈1B active per token

NobodyWho preparation

The upstream GGUF (built from LiquidAI commit feb5e04) already renders tool calls correctly in the model's native markup — <|tool_call_start|>[get_weather(city="Paris")]<|tool_call_end|> — so nothing needs patching; NobodyWho just verifies it with the test suite. The -vendor-sampling quants additionally embed LiquidAI's recommended sampling settings as general.sampling.* metadata, which NobodyWho reads and applies by default (see core/src/sampler.rs).

Available Quantizations

File Approach Tool-calling tests
LFM2.5-8B-A1B-F16-vendor-sampling.gguf Vendor sampling injected 8/14
LFM2.5-8B-A1B-Q8_0-vendor-sampling.gguf Vendor sampling injected 8/14
LFM2.5-8B-A1B-Q4_K_M-vendor-sampling.gguf Vendor sampling injected 4/14

Tool-calling results from NobodyWho's suite (June 2026). Failures are the model declining to emit a tool call on complex parameter schemas (sets / tuples / nested lists / dicts), not a format error. Vendor sampling does not change the result (verified with and without). The -vendor-sampling suffix marks files that embed general.sampling.* metadata.

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0-vendor-sampling.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NobodyWho/LFM2.5-8B-A1B-GGUF",
    filename="LFM2.5-8B-A1B-Q8_0-vendor-sampling.gguf",
)

Model Specifications

  • Parameters: 8B total / ≈1B active (MoE)
  • Context length: 128,000 tokens
  • License: LFM Open License v1.0
  • Base model: LiquidAI/LFM2.5-8B-A1B
  • Architecture: lfm2moe

Licensing / Credits

Licensed under LFM Open License v1.0 (unchanged from upstream). All model credit belongs to Liquid AI.

Downloads last month
127
GGUF
Model size
8B params
Architecture
lfm2moe
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NobodyWho/LFM2.5-8B-A1B-GGUF

Quantized
(51)
this model

Collection including NobodyWho/LFM2.5-8B-A1B-GGUF