Use from the llama-cpp-python library
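# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Downloads the GGUF file from the Hugging Face Hub (cached locally) and loads it.
# Llama.from_pretrained relies on the huggingface-hub package.
llm = Llama.from_pretrained(
	repo_id="alexmultiagent/MiniCPM5-1B-GGUF",
	filename="MiniCPM5-1B-Q4_K_M.gguf",
)

# The result is an OpenAI-style dict; the reply text is in choices[0]["message"]["content"].
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])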
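A chat app usually shows tokens as they are generated instead of waiting for the full reply. llama-cpp-python's `create_chat_completion` accepts `stream=True`, in which case it yields OpenAI-style chunks whose `choices[0]["delta"]` may carry a `"content"` piece. The sketch below assumes that chunk shape; the helper names `iter_stream_text` and `collect_stream` are hypothetical and not part of llama-cpp-python.

```python
from typing import Any, Iterable, Iterator


def iter_stream_text(chunks: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield text pieces from OpenAI-style streaming chat chunks.

    Assumes each chunk looks like {"choices": [{"delta": {...}}]}. The first
    delta usually carries only the role and the last one is empty, so deltas
    without "content" are skipped.
    """
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


def collect_stream(chunks: Iterable[dict[str, Any]]) -> str:
    """Concatenate all streamed text pieces into the full reply."""
    return "".join(iter_stream_text(chunks))


# Usage with the `llm` object created in the example above (not executed here):
#   stream = llm.create_chat_completion(
#       messages=[{"role": "user", "content": "What is the capital of France?"}],
#       stream=True,
#   )
#   for piece in iter_stream_text(stream):
#       print(piece, end="", flush=True)
```

Keeping the chunk parsing in a plain function means it can be tested without loading the model.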

MiniCPM5-1B-GGUF (Q4_K_M)

Mirror of the MiniCPM5-1B-Q4_K_M.gguf file from openbmb/MiniCPM5-1B-GGUF. It is used by the IceSpiritAI_Chat Android app as its MiniCPM5-1B GGUF backend (run via llama.cpp), an alternative to the app's default LLM, Qwen3.5-2B-MNN.

Identity

| Field | Value |
|---|---|
| Source | huggingface.co/openbmb/MiniCPM5-1B-GGUF (official) |
| File | MiniCPM5-1B-Q4_K_M.gguf |
| Size | 688,065,920 bytes (656.19 MiB) |
| SHA-256 | 81b64d05a23b17b34c475f42b3e72fbde62d4b92cc34541f7a8031d0752deafa |
| Architecture | Standard LlamaForCausalLM (per the OpenBMB model card) |
| Parameters | 1.08B (24 layers, GQA 16+2, context length 131,072 tokens) |
| Tokenizer | gpt2 (llama-bpe pre-tokenizer) |
| Uploaded | 2026-06-25 |
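Because the identity table pins the exact byte size and SHA-256 of the file, a downloaded copy (from any mirror) can be checked before it is loaded. This is a minimal sketch using only the Python standard library; the function names and the 1 MiB read size are illustrative choices, not part of any existing tool.

```python
import hashlib
import os

# Values copied from the identity table.
EXPECTED_SIZE = 688_065_920
EXPECTED_SHA256 = "81b64d05a23b17b34c475f42b3e72fbde62d4b92cc34541f7a8031d0752deafa"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks (1 MiB by default) so it never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def verify_gguf(path: str,
                expected_size: int = EXPECTED_SIZE,
                expected_sha256: str = EXPECTED_SHA256) -> bool:
    """Return True if the file matches the expected size and SHA-256.

    The cheap size check runs first so a truncated download fails fast.
    """
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256.lower()
```

For example, the local path returned by `huggingface_hub.hf_hub_download("alexmultiagent/MiniCPM5-1B-GGUF", "MiniCPM5-1B-Q4_K_M.gguf")` can be passed straight to `verify_gguf`.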

Why this mirror exists

IceSpiritAI_Chat is a dual-LLM Android app. The default LLM is Qwen3.5-2B-MNN (small and fast, running on-device via MNN); the alternative is MiniCPM5-1B-GGUF (slightly larger, with higher-quality generations, served by a native llama.cpp pipeline). Users in mainland China without reliable access to huggingface.co can use this mirror or the ModelScope mirror AlexZh/MiniCPM5-1B-GGUF.
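A client that must work where huggingface.co is unreliable can try several download sources in order. The sketch below builds direct-download URLs for this repo and for the ModelScope mirror AlexZh/MiniCPM5-1B-GGUF, then returns the first one a caller-supplied probe accepts. The URL templates are assumptions: `https://huggingface.co/<repo>/resolve/main/<file>` is the usual Hugging Face pattern, while the ModelScope `resolve/master` pattern is a guess to check against ModelScope's own documentation. The `probe` callable is hypothetical and injected so the ordering logic can be tested offline.

```python
from typing import Callable, Optional, Sequence

FILENAME = "MiniCPM5-1B-Q4_K_M.gguf"

# ASSUMPTION: URL templates for direct downloads; verify before relying on them.
HF_TEMPLATE = "https://huggingface.co/{repo}/resolve/main/{file}"
MODELSCOPE_TEMPLATE = "https://www.modelscope.cn/models/{repo}/resolve/master/{file}"


def candidate_urls(filename: str = FILENAME) -> list[str]:
    """Download sources in preference order: this HF mirror, then ModelScope."""
    return [
        HF_TEMPLATE.format(repo="alexmultiagent/MiniCPM5-1B-GGUF", file=filename),
        MODELSCOPE_TEMPLATE.format(repo="AlexZh/MiniCPM5-1B-GGUF", file=filename),
    ]


def first_reachable(urls: Sequence[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which probe(url) is True, or None if none work.

    `probe` is a hypothetical caller-supplied reachability check; any
    exception it raises is treated as "unreachable" and the next URL is tried.
    """
    for url in urls:
        try:
            if probe(url):
                return url
        except Exception:
            continue
    return None
```

A real probe could send an HTTP HEAD request (for instance `urllib.request.Request(url, method="HEAD")` with a short timeout); after downloading, the file should still be checked against the size and SHA-256 in the identity table, whichever source it came from.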
