🚀 Quatfit Mini

Fast • Compact • Multimodal • Long Context • Agentic

📄 Technical Report


Quatfit Mini is an 8-billion-parameter multimodal foundation model developed by Quatfit AI Research.

Built for practical intelligence, Quatfit Mini combines advanced reasoning, multimodal understanding, coding capabilities, long-context processing, and agentic tool use in an efficient architecture optimized for real-world deployment.

Quatfit Mini supports a 131K-token (131,072) context window and native vision and audio understanding. It runs up to roughly 4× faster than a standard 8B model when quantized to GGUF Q4_K_M or paired with speculative decoding. Quantized builds need about 5 to 9 GB of VRAM, which keeps the model within reach of consumer GPUs.


✨ Key Features

  • 🧠 Native Multimodal Architecture
  • ⚡ Up to 4× Faster Inference
  • 📚 131K Token Context Window
  • 💻 Strong Coding Performance
  • 🖼️ Vision Understanding
  • 🎙️ Audio Understanding
  • 🤖 Agentic Tool Calling
  • 🪶 Consumer GPU Optimized
  • 🌍 Multilingual

📊 Performance Highlights

Benchmark                                         Score
Overall Accuracy (internal suite, self-reported)  89.08%
Coding                                            92.5%
Science                                           91.7%
Agentic Tasks                                     92.5%
CLI                                               95.0%
Exams                                             93.3%
Finance                                           90.0%
Social Intelligence                               90.0%

🏗 Architecture

Quatfit Mini is built on the Quatfit 1 Architecture, engineered for efficient multimodal intelligence.

Language Model

Component         Value
Parameters        8B
Layers            42
Hidden Size       2560
Attention Heads   8
KV Heads          2
Shared KV Layers  18
Feed Forward      GeGLU
Precision         BF16
Vocabulary        262K
Context Length    131,072
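
To illustrate why the small KV-head count and the shared KV layers matter at long context, here is a rough sketch that estimates KV-cache size from the table values. Two things are assumptions and not stated in this document: the head dimension of 256, and the reading that the 18 shared-KV layers reuse another layer's cache instead of storing their own. Sliding-window layers would shrink the figure further and are ignored here.

```python
def kv_cache_bytes(seq_len, layers_with_cache, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor of 2 covers keys plus values; BF16 stores 2 bytes per value.
    return 2 * layers_with_cache * kv_heads * head_dim * seq_len * bytes_per_value


CONTEXT = 131_072       # from the table
LAYERS = 42             # from the table
SHARED_KV_LAYERS = 18   # from the table
ATTENTION_HEADS = 8     # from the table
KV_HEADS = 2            # from the table
HEAD_DIM = 256          # ASSUMPTION: not stated in the table

GIB = 1024 ** 3

# Every layer caches one K/V pair per attention head (no GQA, no sharing).
full_mha = kv_cache_bytes(CONTEXT, LAYERS, ATTENTION_HEADS, HEAD_DIM)
# Grouped Query Attention: only 2 KV heads per layer.
gqa_only = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM)
# GQA plus sharing: ASSUMES the 18 shared layers store no cache of their own.
gqa_shared = kv_cache_bytes(CONTEXT, LAYERS - SHARED_KV_LAYERS, KV_HEADS, HEAD_DIM)

print(f"8 KV heads, all layers : {full_mha / GIB:.1f} GiB")
print(f"2 KV heads, all layers : {gqa_only / GIB:.1f} GiB")
print(f"2 KV heads, 24 layers  : {gqa_shared / GIB:.1f} GiB")
```

Under these assumptions the full-context cache comes to 42.0 GiB with one KV head per attention head, 10.5 GiB with 2 KV heads, and 6.0 GiB once the shared layers are excluded. The real numbers depend on the actual head dimension and attention layout.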

Vision Encoder

  • Vision Transformer
  • 16 Transformer Layers
  • 280 Visual Tokens
  • Patch Size: 16×16
  • Pan & Scan High-Resolution Support

Audio Encoder

  • Conformer Architecture
  • 12 Layers
  • Streaming Compatible
  • Causal Chunk Attention

⚡ Performance Optimizations

Quatfit Mini integrates multiple inference optimizations, including:

  • Flash Attention 3
  • Sliding Window Attention
  • Grouped Query Attention (GQA)
  • KV Cache Sharing
  • Speculative Decoding
  • GGUF Quantization
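
As a small illustration of one item in the list, the sketch below builds a sliding-window causal attention mask in plain Python. The window size of 3 and sequence length of 8 are illustrative assumptions; this document does not state the window Quatfit Mini uses or which layers apply it.

```python
def sliding_window_causal_mask(seq_len, window):
    """Return mask[i][j] == True where query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions (i - j < window).
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


SEQ_LEN = 8
WINDOW = 3  # ASSUMPTION: illustrative value only

mask = sliding_window_causal_mask(SEQ_LEN, WINDOW)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# A window as long as the sequence reduces to an ordinary causal mask.
full_causal = sliding_window_causal_mask(SEQ_LEN, SEQ_LEN)
print(attended_pairs(mask), "pairs with the window vs", attended_pairs(full_causal), "without")
```

With a fixed window, the number of attended pairs grows linearly with sequence length instead of quadratically, which is why the technique helps at long context.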

Inference Speed

Configuration                Relative Speed
Standard 8B Model            1.0× (baseline)
Quatfit Mini BF16            2.5×
BF16 + Speculative Decoding  3.9×
GGUF Q4_K_M                  4.1×
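
To make the relative speeds concrete, the sketch below converts them into generation time for a 512-token reply. The baseline throughput of 20 tokens per second is a hypothetical figure chosen for illustration, and the sketch assumes each multiplier applies directly to decode throughput; neither comes from this document.

```python
RELATIVE_SPEED = {
    "Standard 8B Model": 1.0,  # the baseline is 1.0× by definition
    "Quatfit Mini BF16": 2.5,
    "BF16 + Speculative Decoding": 3.9,
    "GGUF Q4_K_M": 4.1,
}


def generation_seconds(new_tokens, baseline_tokens_per_second, relative_speed):
    """Seconds to generate `new_tokens` at `relative_speed` times the baseline rate."""
    return new_tokens / (baseline_tokens_per_second * relative_speed)


BASELINE_TPS = 20.0  # ASSUMPTION: hypothetical baseline throughput
NEW_TOKENS = 512

for name, speed in RELATIVE_SPEED.items():
    seconds = generation_seconds(NEW_TOKENS, BASELINE_TPS, speed)
    print(f"{name:<30} {seconds:6.1f} s")
```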

📈 Benchmark Breakdown

Domain            Accuracy
Coding            92.5%
Science           91.7%
Agentic Tasks     92.5%
CLI               95.0%
Finance           90.0%
Security          90.0%
Reasoning         88.9%
Expert Knowledge  83.8%
Mathematics       81.3%
🚀 Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "Quatfit/Quatfit-Mini",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "Quatfit/Quatfit-Mini"
)

💬 Example

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Explain this image."
            },
            {
                "type": "image",
                "image": "example.jpg"
            }
        ]
    }
]

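inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    tokenize=True,
    return_dict=True,            # return a dict of tensors so it can be unpacked into generate()
    return_tensors="pt"
).to(model.device, dtype=model.dtype)  # match the model's device and floating-point dtype

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))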

💻 GGUF Support

Optimized GGUF builds are available for:

  • llama.cpp
  • Ollama
  • LM Studio
  • Jan
  • Open WebUI

Recommended Quantizations

Quantization  Approx. VRAM
Q4_K_M        ~5 GB
Q5_K_M        ~6 GB
Q6_K          ~7 GB
Q8_0          ~9 GB
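
One way to use the table is to pick the largest quantization that fits a given GPU. The helper below is a minimal sketch: `pick_quantization` and the 1 GB headroom reserved for the KV cache and runtime overhead are illustrative assumptions, and the VRAM figures are the approximate values listed above.

```python
# Approximate VRAM per quantization, ordered from smallest to largest.
APPROX_VRAM_GB = [("Q4_K_M", 5), ("Q5_K_M", 6), ("Q6_K", 7), ("Q8_0", 9)]


def pick_quantization(available_vram_gb, headroom_gb=1.0):
    """Return the largest listed quantization that fits, or None if none fits.

    headroom_gb is an ASSUMED reserve for the KV cache and runtime overhead.
    """
    budget = available_vram_gb - headroom_gb
    best = None
    for name, vram_gb in APPROX_VRAM_GB:
        if vram_gb <= budget:
            best = name  # later entries are larger, so keep the last one that fits
    return best


for gpu_gb in (6, 8, 12):
    print(f"{gpu_gb} GB GPU -> {pick_quantization(gpu_gb)}")
```

Long prompts grow the KV cache well beyond 1 GB, so the headroom should be raised when working near the full context length.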

🎯 Recommended Applications

Quatfit Mini is designed for practical AI systems, including:

  • AI Assistants
  • Agentic AI
  • Workflow Automation
  • Tool Calling
  • Research Copilots
  • Long-Document Analysis
  • OCR
  • Vision-Language Tasks
  • Audio Understanding
  • Information Retrieval
  • General Chat
  • MVP Software Development

📚 Training

Quatfit Mini was trained on approximately 10 trillion tokens, including:

  • Web Data
  • Programming Code
  • Mathematics
  • Scientific Literature
  • Wikipedia
  • Books
  • Multilingual Data
  • Image-Text Pairs
  • Audio Transcriptions

Post-training

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning from Human Feedback (RLHF)
  • Constitutional AI Alignment

🌟 Core Strengths

  • ✅ Agentic AI
  • ✅ Long-Context Reasoning
  • ✅ Tool Use
  • ✅ Coding Assistance
  • ✅ Vision Understanding
  • ✅ Audio Understanding
  • ✅ Scientific Knowledge
  • ✅ Multilingual Intelligence

🎯 Intended Use

Quatfit Mini is an 8B multimodal foundation model primarily optimized for agentic AI applications.

It excels at:

  • Multi-step reasoning
  • Autonomous workflows
  • Tool orchestration
  • Long-context understanding
  • Research assistance
  • Document analysis
  • Vision-language tasks
  • Audio understanding
  • Productivity automation
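
Tool orchestration usually means the application parses a structured call emitted by the model, runs the matching function, and feeds the result back into the conversation. The sketch below shows only the dispatch step. The JSON shape with `name` and `arguments`, the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical; this document does not specify Quatfit Mini's tool-call format.

```python
import json


def get_weather(city):
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw_call):
    """Parse a JSON tool call and run the matching registered function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        # Return the error as data so it can be passed back to the model.
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call.get("arguments", {}))


# ASSUMPTION: stands in for text the model would emit when it requests a tool.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(model_output)
print(json.dumps(result))
```

In a full agent loop, the result would be appended to the message list as a tool response before calling the model again.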

While Quatfit Mini delivers strong programming performance, it is designed as a general-purpose reasoning model rather than a specialized coding model.

It performs well for:

  • Code generation
  • Debugging
  • API development
  • Script writing
  • Code explanation
  • MVP application development

⚠️ Limitations

Quatfit Mini prioritizes reasoning, multimodal intelligence, and agentic capabilities over benchmark-focused coding performance.

Although highly capable for everyday software development, it is not specifically optimized for:

  • Repository-scale software engineering
  • Competitive programming
  • Enterprise-scale refactoring
  • Performance-critical code synthesis

As with all foundation models, outputs should be reviewed before deployment in production or safety-critical environments.


📖 Citation

@misc{quatfitmini2026,
  title={Quatfit Mini: A Compact Multimodal Foundation Model with Up to 4× Faster Inference},
  author={Quatfit AI Research},
  year={2026}
}

📜 License

Quatfit Mini is released under the Quatfit Non-Commercial License v1.

Commercial licensing is available through Quatfit AI Research.


🌍 Quatfit AI Research

Building practical AI systems that think, reason, create, and collaborate.

Performance First • Practical Intelligence • Open Innovation

⭐ If Quatfit Mini helps your work, consider starring the repository and sharing your projects with the community.

Evaluation results

  • Overall Accuracy on Internal Evaluation Suite (815 questions, 32 categories), self-reported: 89.08%