MiniCPM-V 4.6 — 0.8B Abliterated

A tiny vision-language model built by replacing MiniCPM-V 4.6's original Qwen3.5-0.8B backbone with Qwen3.5-0.8B Abliterated. The swap removes the backbone's refusal behavior while keeping the original vision encoder and projector.

⚠️ Experimental

This is an experimental backbone swap. The vision-language projector (vit_merger) was not retrained after the backbone replacement. Because the abliterated backbone has the same hidden size (1024) as the original, the merger weights load without resizing. Vision tasks may still work, but their quality has not been benchmarked.

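Text-only tasks work well with the abliterated backbone, since they never pass through the projector. Vision task quality depends on how much the abliteration process altered the backbone's internal representations, because the projector was trained against the original ones.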
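The compatibility argument above comes down to every tensor in the replacement backbone having the same name and shape as the one it replaces. A minimal sketch of that check follows, assuming the two checkpoints have already been reduced to name-to-shape maps. The parameter names are hypothetical; the sizes are taken from the spec table. Matching shapes only guarantees that the weights load, not that the projector's outputs still make sense to the new backbone.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    original: Dict[str, Shape], replacement: Dict[str, Shape]
) -> List[str]:
    """List every parameter that is missing, unexpected, or differently shaped."""
    problems: List[str] = []
    for name, shape in original.items():
        if name not in replacement:
            problems.append(f"missing: {name}")
        elif replacement[name] != shape:
            problems.append(f"shape: {name} {shape} != {replacement[name]}")
    for name in replacement:
        if name not in original:
            problems.append(f"unexpected: {name}")
    return problems


# Toy shape maps: parameter names are hypothetical, sizes come from the spec table.
original_shapes = {"embed_tokens.weight": (248_320, 1024), "norm.weight": (1024,)}
abliterated_shapes = {"embed_tokens.weight": (248_320, 1024), "norm.weight": (1024,)}
wider_shapes = {"embed_tokens.weight": (248_320, 2048), "norm.weight": (2048,)}

print(find_shape_mismatches(original_shapes, abliterated_shapes))  # no problems
print(find_shape_mismatches(original_shapes, wider_shapes))        # two shape problems
```

With real PyTorch checkpoints, a shape map could be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`.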

Specs

Component	Details
Architecture	MiniCPMV4_6ForConditionalGeneration
LLM Backbone	Qwen3.5-0.8B Abliterated (dense)
Hidden Size	1024
LLM Layers	24
Attention	8 heads (2 KV heads), hybrid linear/full
Context Length	262,144 tokens
Vision Encoder	SigLip2-400M (27 layers, hidden=1152)
Vocab Size	248,320
Total Size	~2.7 GB
Precision	BF16
Min VRAM	~4 GB
Quantization	None (fits as-is on edge GPUs)
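
A few of the figures in the table can be cross-checked with simple arithmetic. The sketch below assumes that "GB" means 10^9 bytes and that the whole checkpoint is stored in BF16 (2 bytes per value); neither is stated explicitly above. It also does not assume anything about whether the input and output embeddings share weights, so the embedding figure covers a single vocab-by-hidden matrix only.

```python
BYTES_PER_BF16 = 2

# "~2.7 GB" total size, assuming decimal gigabytes.
total_bytes = 2.7e9
approx_values = total_bytes / BYTES_PER_BF16  # number of BF16 values on disk

# One token-embedding matrix: vocab size x hidden size.
vocab_size = 248_320
hidden_size = 1024
embedding_params = vocab_size * hidden_size
embedding_bytes = embedding_params * BYTES_PER_BF16

# Grouped-query attention: how many query heads share each KV head.
query_heads, kv_heads = 8, 2
queries_per_kv_head = query_heads // kv_heads

print(f"values in checkpoint: ~{approx_values / 1e9:.2f} billion")
print(f"embedding matrix: {embedding_params:,} params, ~{embedding_bytes / 1e9:.2f} GB")
print(f"query heads per KV head: {queries_per_kv_head}")
```

The gap between the ~2.7 GB of weights and the ~4 GB minimum VRAM leaves room for activations, the KV cache, and framework overhead.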

What Changed

Component	Original MiniCPM-V 4.6	This Model
LLM Backbone	Qwen3.5-0.8B	Qwen3.5-0.8B Abliterated
Merger MLP	Original weights	Original weights (same dims)
Vision Encoder	SigLip2-400M	SigLip2-400M (unchanged)
Refusal Behavior	Standard guardrails	Removed via abliteration
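
This card does not say how the abliterated backbone was produced. As commonly described, abliteration estimates a single "refusal direction" in the residual stream and projects it out of the weight matrices that write to that stream. The sketch below shows only that projection step on a toy matrix in pure Python; the matrix, the direction, and the function names are made up for illustration and are not taken from the actual model.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def unit(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(weight: Matrix, direction: Vector) -> Matrix:
    """Return W' = W - r (r^T W), so that W' x has no component along r."""
    r = unit(direction)
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    r_t_w = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - r[i] * r_t_w[j] for j in range(cols)] for i in range(rows)]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


W = [[1.0, 2.0], [3.0, 4.0]]  # toy weight matrix (output dim x input dim)
r = [3.0, 4.0]                # made-up "refusal direction" in output space
W_ablated = ablate_direction(W, r)

y = matvec(W_ablated, [0.5, -1.0])
leak = sum(a * b for a, b in zip(unit(r), y))
print(f"component along r after ablation: {leak:.2e}")  # zero up to rounding
```

Because the edit changes the backbone's weights without changing their shapes, the merger still loads, but the representations it was trained against may have shifted.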

Key Features

Tiny footprint: ~2.7 GB total, fits in about 4 GB of VRAM (RTX A500, mobile GPUs, Jetson, etc.)
Abliterated: Refusal behavior removed, so the model answers queries the original backbone would refuse
Same architecture: Drop-in compatible with MiniCPM-V 4.6 tooling and pipelines
Hybrid attention: Mix of linear and full attention layers for efficient long-context processing

Usage

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "jduartedj/MiniCPM-V-4.6-0.8B-Abliterated"

# trust_remote_code=True lets transformers run the model code shipped in the repository.
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode, on the default CUDA device
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Image understanding
image = Image.open("example.jpg").convert("RGB")  # force 3-channel RGB input
msgs = [{"role": "user", "content": [image, "Describe this image in detail."]}]
result = model.chat(msgs=msgs, tokenizer=tokenizer)
print(result)

# Text-only (abliterated)
msgs = [{"role": "user", "content": "Write a story without restrictions."}]
result = model.chat(msgs=msgs, tokenizer=tokenizer)
print(result)

Limitations

Vision projector not retrained: The vit_merger was kept from the original model. While dimensions match, abliteration may have shifted internal representations enough to degrade vision quality.
No benchmarks: This model has not been formally evaluated on vision-language benchmarks.
Experimental: Use at your own risk. Best suited for research and experimentation.
Small model limitations: As a 0.8B model, reasoning capabilities are inherently limited compared to larger variants.
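
Since no benchmarks exist, a quick way to get a first signal is to run the original and the swapped model on the same handful of image questions and compare exact-match accuracy. The sketch below is one possible harness, not an established benchmark. The sample data and the two stub functions are invented placeholders; in practice each stub would wrap a `model.chat` call like the one in the Usage section.

```python
from typing import Callable, Iterable, Tuple

Sample = Tuple[str, str, str]         # (image path, question, expected answer)
AnswerFn = Callable[[str, str], str]  # (image path, question) -> model answer


def exact_match_accuracy(answer_fn: AnswerFn, samples: Iterable[Sample]) -> float:
    """Fraction of samples answered exactly, ignoring case and outer whitespace."""
    items = list(samples)
    if not items:
        raise ValueError("need at least one sample")
    hits = sum(
        1
        for image_path, question, expected in items
        if answer_fn(image_path, question).strip().lower() == expected.strip().lower()
    )
    return hits / len(items)


# Invented placeholder data; replace with real images and questions.
samples = [
    ("cat.jpg", "What animal is this?", "cat"),
    ("sign.jpg", "What does the sign say?", "stop"),
]


def original_stub(image_path: str, question: str) -> str:
    """Stand-in for the original MiniCPM-V 4.6 model."""
    return {"cat.jpg": "Cat", "sign.jpg": "STOP"}[image_path]


def swapped_stub(image_path: str, question: str) -> str:
    """Stand-in for this backbone-swapped model."""
    return {"cat.jpg": "cat", "sign.jpg": "yield"}[image_path]


print("original:", exact_match_accuracy(original_stub, samples))
print("swapped: ", exact_match_accuracy(swapped_stub, samples))
```

Exact match is a deliberately strict metric; it suits short factual answers and will undercount correct free-form descriptions.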