Instructions for using Quatfit/Quatfit-Mini with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use Quatfit/Quatfit-Mini with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Quatfit/Quatfit-Mini", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Quatfit/Quatfit-Mini", trust_remote_code=True, dtype="auto")

vLLM

How to use Quatfit/Quatfit-Mini with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Quatfit/Quatfit-Mini"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quatfit/Quatfit-Mini",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
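
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the step above is listening on `localhost:8000`. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "Quatfit/Quatfit-Mini",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(json.dumps(payload, indent=2))
# With the server running: print(send_chat_request(payload))
```

Building the payload separately from sending it makes the request easy to inspect or reuse against another OpenAI-compatible endpoint by changing `base_url`.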

Use Docker Model Runner

docker model run hf.co/Quatfit/Quatfit-Mini

SGLang

How to use Quatfit/Quatfit-Mini with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Quatfit/Quatfit-Mini" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quatfit/Quatfit-Mini",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
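
The request above points at a public image URL. For a local file, OpenAI-compatible chat endpoints commonly accept a base64 `data:` URL in the same `image_url` field. The sketch below shows one way to build such a content part; whether a given server build accepts data URLs for this model is an assumption to verify, and the helper name `image_bytes_to_data_url` is illustrative.

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in bytes for the demo; in practice use Path("example.jpg").read_bytes()
data_url = image_bytes_to_data_url(b"\xff\xd8\xff")

# This dict replaces the image_url entry in the "content" list of the request
content_part = {"type": "image_url", "image_url": {"url": data_url}}
print(content_part)
```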

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Quatfit/Quatfit-Mini" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-based SGLang setup above (same port, 30000).

🚀 Quatfit Mini

Fast • Compact • Multimodal • Long Context • Agentic

📄 Technical Report

Quatfit Mini is an 8-billion-parameter multimodal foundation model developed by Quatfit AI Research.

Built for practical intelligence, Quatfit Mini combines advanced reasoning, multimodal understanding, coding capabilities, long-context processing, and agentic tool use in an efficient architecture optimized for real-world deployment.

Quatfit Mini supports a 131K-token context window (131,072 tokens), native vision and audio understanding, and, by the self-reported measurements in this card, up to roughly 4× faster inference than a standard 8B model. It remains deployable on consumer GPUs: the Q4_K_M GGUF build needs about 5 GB of VRAM.

✨ Key Features

🧠 Native Multimodal Architecture
⚡ Up to 4× Faster Inference
📚 131K Token Context Window
💻 Strong Coding Performance
🖼️ Vision Understanding
🎙️ Audio Understanding
🤖 Agentic Tool Calling
🪶 Consumer GPU Optimized
🌍 Multilingual

📊 Performance Highlights

Benchmark	Score
Overall Accuracy	89.08%
Coding	92.5%
Science	91.7%
Agentic Tasks	92.5%
CLI	95.0%
Exams	93.3%
Finance	90.0%
Social Intelligence	90.0%
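
As a quick sanity check on the overall figure, the evaluation results at the end of this card describe an internal suite of 815 questions. The sketch below assumes the overall accuracy is a plain per-question average (every question weighted equally), which the card does not state, and searches for the number of correct answers that rounds to 89.08%.

```python
TOTAL_QUESTIONS = 815      # size of the internal suite (from the card)
REPORTED_ACCURACY = 89.08  # overall accuracy in percent (from the card)

# Find every count of correct answers that rounds to the reported percentage
matches = [
    correct
    for correct in range(TOTAL_QUESTIONS + 1)
    if round(100 * correct / TOTAL_QUESTIONS, 2) == REPORTED_ACCURACY
]
print(matches)  # prints [726]
```

Under that assumption the reported score corresponds to 726 of 815 questions answered correctly. If the score is instead averaged per category, this mapping does not apply.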

🏗 Architecture

Quatfit Mini is built on the Quatfit 1 Architecture, engineered for efficient multimodal intelligence.

Language Model

Component	Value
Parameters	8B
Layers	42
Hidden Size	2560
Attention Heads	8
KV Heads	2
Shared KV Layers	18
Feed Forward	GeGLU
Precision	BF16
Vocabulary	262K
Context Length	131,072
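
To give a feel for what these numbers imply at full context, the sketch below estimates KV cache size from the table. Two inputs are assumptions not stated in the card: the head dimension is taken as hidden size divided by attention heads (2560 / 8 = 320), and the 18 shared-KV layers are assumed to store no cache of their own. Sliding window attention, which would shrink the cache further, is not modeled.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the separate key and value tensors
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


LAYERS = 42
SHARED_KV_LAYERS = 18
KV_HEADS = 2
HEAD_DIM = 2560 // 8   # ASSUMPTION: hidden size / attention heads = 320
CONTEXT = 131_072
BF16_BYTES = 2

full = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, BF16_BYTES)
# ASSUMPTION: shared layers reuse another layer's cache and store none themselves
shared = kv_cache_bytes(LAYERS - SHARED_KV_LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, BF16_BYTES)

GIB = 1024 ** 3
print(f"no sharing:   {full / GIB:.3f} GiB")    # 13.125 GiB
print(f"with sharing: {shared / GIB:.3f} GiB")  # 7.500 GiB
```

Under these assumptions, having only 2 KV heads and 24 cache-holding layers keeps a full 131,072-token cache at about 7.5 GiB in BF16.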

Vision Encoder

Vision Transformer
16 Transformer Layers
280 Visual Tokens
Patch Size: 16×16
Pan & Scan High-Resolution Support

Audio Encoder

Conformer Architecture
12 Layers
Streaming Compatible
Causal Chunk Attention

⚡ Performance Optimizations

Quatfit Mini integrates multiple inference optimizations, including:

FlashAttention-3
Sliding Window Attention
Grouped Query Attention (GQA)
KV Cache Sharing
Speculative Decoding
GGUF Quantization
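
Two of the techniques listed above can be illustrated in a few lines. This is a conceptual sketch, not the model's implementation: the head counts come from the architecture table, the consecutive grouping of query heads is the common convention and is assumed here, and the window size of 3 is a toy value because the card does not state the real one.

```python
N_QUERY_HEADS = 8  # "Attention Heads" in the architecture table
N_KV_HEADS = 2     # "KV Heads" in the architecture table


def kv_head_for(query_head: int) -> int:
    """Grouped Query Attention: consecutive query heads share one KV head."""
    group_size = N_QUERY_HEADS // N_KV_HEADS
    return query_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Query heads 0-3 read KV head 0, query heads 4-7 read KV head 1
print([kv_head_for(h) for h in range(N_QUERY_HEADS)])

# window=3 is a HYPOTHETICAL toy value, not the model's real window size
for row in sliding_window_mask(seq_len=5, window=3):
    print("".join("x" if allowed else "." for allowed in row))
```

With 8 query heads and 2 KV heads, the key and value tensors are a quarter of the size they would be under standard multi-head attention, and the causal window caps how many past positions each token reads.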

Inference Speed

Configuration	Relative Speed
Standard 8B Model	1×
Quatfit Mini BF16	2.5×
BF16 + Speculative Decoding	3.9×
GGUF Q4_K_M	4.1×
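
The relative speeds can be turned into wall-clock terms with simple arithmetic. The sketch below uses the table values as given; the 60-second baseline is a hypothetical workload chosen only for illustration.

```python
RELATIVE_SPEED = {
    "Standard 8B Model": 1.0,
    "Quatfit Mini BF16": 2.5,
    "BF16 + Speculative Decoding": 3.9,
    "GGUF Q4_K_M": 4.1,
}

BASELINE_SECONDS = 60.0  # HYPOTHETICAL time a standard 8B model needs for some job

for name, speedup in RELATIVE_SPEED.items():
    print(f"{name:30s} {BASELINE_SECONDS / speedup:5.1f} s")

# Extra gain attributable to speculative decoding on top of plain BF16
spec_gain = (
    RELATIVE_SPEED["BF16 + Speculative Decoding"] / RELATIVE_SPEED["Quatfit Mini BF16"]
)
print(f"speculative decoding gain: {spec_gain:.2f}x")
```

Read this way, speculative decoding accounts for roughly a 1.56× gain over the plain BF16 configuration.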

📈 Benchmark Breakdown

Domain	Accuracy
Coding	92.5%
Science	91.7%
Agentic Tasks	92.5%
CLI	95.0%
Finance	90.0%
Security	90.0%
Reasoning	88.9%
Expert Knowledge	83.8%
Mathematics	81.3%

🚀 Quick Start

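from transformers import AutoProcessor, AutoModelForImageTextToText

# trust_remote_code=True matches the Transformers snippets earlier on this page.
# It runs code shipped in the model repository, so review that code first.
model = AutoModelForImageTextToText.from_pretrained(
    "Quatfit/Quatfit-Mini",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(
    "Quatfit/Quatfit-Mini",
    trust_remote_code=True
)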

💬 Example

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Explain this image."
            },
            {
                "type": "image",
                "image": "example.jpg"
            }
        ]
    }
]

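inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model answers
    tokenize=True,
    return_dict=True,            # return a dict of tensors so **inputs works below
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))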

💻 GGUF Support

Optimized GGUF builds are available for:

llama.cpp
Ollama
LM Studio
Jan
Open WebUI

Recommended Quantizations

Quantization	Approx. VRAM
Q4_K_M	~5 GB
Q5_K_M	~6 GB
Q6_K	~7 GB
Q8_0	~9 GB
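
The VRAM column can be cross-checked with a weights-only estimate: parameter count times bits per weight. In the sketch below, the Q8_0 and Q6_K bit widths follow from their block layouts, while the Q4_K_M and Q5_K_M values are assumed typical averages that vary by model. The helper `best_fit` is illustrative and simply picks from the table above.

```python
from typing import Optional

PARAMS = 8e9  # parameter count from the card

# Approximate bits per weight. The two K_M values are ASSUMED typical averages.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.5625, "Q8_0": 8.5}

# "Approx. VRAM" column from the table above, in GB
TABLE_VRAM_GB = {"Q4_K_M": 5, "Q5_K_M": 6, "Q6_K": 7, "Q8_0": 9}


def weight_gb(quant: str) -> float:
    """Weights-only size in GB (10^9 bytes); excludes KV cache and activations."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9


def best_fit(vram_gb: float) -> Optional[str]:
    """Largest quantization whose table estimate fits the VRAM budget."""
    fitting = [q for q, need in TABLE_VRAM_GB.items() if need <= vram_gb]
    return max(fitting, key=TABLE_VRAM_GB.get) if fitting else None


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: weights ~{weight_gb(quant):.2f} GB, table ~{TABLE_VRAM_GB[quant]} GB")

print(best_fit(8))  # an 8 GB card fits up to Q6_K by the table
```

The weights-only figures land slightly below the table values, which is consistent with the table leaving a small margin for runtime buffers. Long contexts need additional memory for the KV cache.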

🎯 Recommended Applications

Quatfit Mini is designed for practical AI systems, including:

AI Assistants
Agentic AI
Workflow Automation
Tool Calling
Research Copilots
Long-Document Analysis
OCR
Vision-Language Tasks
Audio Understanding
Information Retrieval
General Chat
MVP Software Development

📚 Training

Quatfit Mini was trained on approximately 10 trillion tokens, including:

Web Data
Programming Code
Mathematics
Scientific Literature
Wikipedia
Books
Multilingual Data
Image-Text Pairs
Audio Transcriptions

Post-training

Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
Constitutional AI Alignment

🌟 Core Strengths

✅ Agentic AI
✅ Long-Context Reasoning
✅ Tool Use
✅ Coding Assistance
✅ Vision Understanding
✅ Audio Understanding
✅ Scientific Knowledge
✅ Multilingual Intelligence

🎯 Intended Use

Quatfit Mini is an 8B multimodal foundation model primarily optimized for agentic AI applications.

It excels at:

Multi-step reasoning
Autonomous workflows
Tool orchestration
Long-context understanding
Research assistance
Document analysis
Vision-language tasks
Audio understanding
Productivity automation
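
Tool orchestration usually means the application parses a tool call from the model's output, runs the matching function, and feeds the result back. The sketch below shows only the dispatch step. The JSON shape with `name` and `arguments` keys is an assumption, since this card does not document the model's tool-call format, and `get_weather` is a hypothetical tool.

```python
import json


def get_weather(city: str) -> str:
    """HYPOTHETICAL tool; a real one would call an external service."""
    return f"Sunny in {city}"


# Registry mapping tool names the model may request to Python callables
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call and run the matching Python function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call.get("arguments", {}))


# ASSUMPTION: the model emits a call as JSON with "name" and "arguments" keys
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch_tool_call(model_output))  # prints: Sunny in Paris
```

Returning an error string for unknown tools, rather than raising, lets the application pass the failure back to the model so it can retry with a valid tool name.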

While Quatfit Mini delivers strong programming performance, it is designed as a general-purpose reasoning model rather than a specialized coding model.

It performs well for:

Code generation
Debugging
API development
Script writing
Code explanation
MVP application development

⚠️ Limitations

Quatfit Mini prioritizes reasoning, multimodal intelligence, and agentic capabilities over benchmark-focused coding performance.

Although highly capable for everyday software development, it is not specifically optimized for:

Repository-scale software engineering
Competitive programming
Enterprise-scale refactoring
Performance-critical code synthesis

As with all foundation models, outputs should be reviewed before deployment in production or safety-critical environments.

📖 Citation

@article{quatfitmini2026,
  title={Quatfit Mini: A Compact Multimodal Foundation Model with Up to 4× Faster Inference},
  author={Quatfit AI Research},
  year={2026}
}

📜 License

Quatfit Mini is released under the Quatfit Non-Commercial License v1.

Commercial licensing is available through Quatfit AI Research.

🌍 Quatfit AI Research

Building practical AI systems that think, reason, create, and collaborate.

Performance First • Practical Intelligence • Open Innovation

⭐ If Quatfit Mini helps your work, consider starring the repository and sharing your projects with the community.

Safetensors

Model size: 8B params
Tensor type: F32

Evaluation results

Overall Accuracy on Internal Evaluation Suite (815 Questions / 32 Categories), self-reported: 89.08%