MiniCPM-V 4.6 — 0.8B Abliterated

A tiny vision-language model built by replacing MiniCPM-V 4.6's original Qwen3.5-0.8B backbone with Qwen3.5-0.8B Abliterated. The swap removes refusal behavior while keeping the original vision encoder and projector unchanged.

⚠️ Experimental

This is an experimental backbone swap. The vision-language projector (vit_merger) was not retrained after the backbone was replaced. Because the abliterated backbone has the same hidden size (1024) as the original, the merger weights load without resizing. Vision tasks may still work, but their quality has not been benchmarked.

Text-only tasks work well with the abliterated backbone. Vision task quality depends on how far abliteration shifted the backbone's internal representations away from the ones the merger was trained against.
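
The compatibility argument above comes down to one shape check: the backbone the merger feeds into must have the same hidden size as the one it was trained with. Below is a minimal sketch of that check, assuming both backbone configurations are available as plain dictionaries. The key name `hidden_size` and the helper `merger_is_compatible` are illustrative and not part of the MiniCPM-V tooling.

```python
def merger_is_compatible(original_llm_cfg: dict, swapped_llm_cfg: dict,
                         key: str = "hidden_size") -> bool:
    """Return True when both backbones expose the same hidden width.

    Matching widths only means the merger's weight shapes still fit.
    It says nothing about whether the projected image features are
    still meaningful to the swapped backbone.
    """
    return original_llm_cfg[key] == swapped_llm_cfg[key]


# Hidden size 1024 is the value stated above; the dict layout is an assumption.
original = {"hidden_size": 1024}
abliterated = {"hidden_size": 1024}
print(merger_is_compatible(original, abliterated))  # True
```

A passing check is necessary for loading the weights but not sufficient for vision quality, which is why the caveats in this section still apply.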

Specs

| Component | Details |
| --- | --- |
| Architecture | MiniCPMV4_6ForConditionalGeneration |
| LLM Backbone | Qwen3.5-0.8B Abliterated (dense) |
| Hidden Size | 1024 |
| LLM Layers | 24 |
| Attention | 8 heads (2 KV heads), hybrid linear/full |
| Context Length | 262,144 tokens |
| Vision Encoder | SigLip2-400M (27 layers, hidden=1152) |
| Vocab Size | 248,320 |
| Total Size | ~2.7 GB |
| Precision | BF16 |
| Min VRAM | ~4 GB |
| Quantization | None (fits as-is on edge GPUs) |
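
The size and precision figures can be related with simple arithmetic: BF16 stores each parameter in 2 bytes. The sketch below assumes the ~2.7 GB checkpoint consists entirely of BF16 weights and that 1 GB means 10^9 bytes; both are assumptions, and the helper names are illustrative.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def bf16_weight_gb(num_params: float) -> float:
    """Approximate weight size in GB (10^9 bytes) for a BF16 checkpoint."""
    return num_params * BYTES_PER_BF16 / 1e9


def params_from_gb(size_gb: float) -> float:
    """Invert the estimate: parameter count implied by a BF16 checkpoint size."""
    return size_gb * 1e9 / BYTES_PER_BF16


# ~2.7 GB of BF16 weights implies roughly 1.35 billion parameters.
print(f"{params_from_gb(2.7) / 1e9:.2f}B parameters")  # 1.35B parameters
```

The weights are only part of the runtime footprint. Activations and attention caches also need memory during inference, which is a plausible reason the minimum VRAM figure (~4 GB) sits above the checkpoint size.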

What Changed

| Component | Original MiniCPM-V 4.6 | This Model |
| --- | --- | --- |
| LLM Backbone | Qwen3.5-0.8B | Qwen3.5-0.8B Abliterated |
| Merger MLP | Original weights | Original weights (same dims) |
| Vision Encoder | SigLip2-400M | SigLip2-400M (unchanged) |
| Refusal Behavior | Standard guardrails | Removed via abliteration |
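
Abliteration is commonly described as directional ablation: a "refusal direction" is estimated in the model's hidden space and then projected out of the weights that write into it. This card does not document the exact procedure used for the backbone, so the following is only a toy illustration of the projection step, W' = W - r rᵀ W, in pure Python. The function name and the example matrix are hypothetical.

```python
def ablate_direction(weight, direction):
    """Remove the component along `direction` from the outputs of `weight`.

    weight:    d_out x d_in matrix as a list of rows
    direction: length-d_out vector (normalised inside the function)
    Returns W' = W - r r^T W, so W' can no longer write along r.
    """
    norm = sum(x * x for x in direction) ** 0.5
    r = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract that component from every column
    return [[weight[i][j] - r[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]


W = [[1.0, 2.0], [3.0, 4.0]]
r = [1.0, 0.0]  # toy "refusal direction" along the first output axis
print(ablate_direction(W, r))  # [[0.0, 0.0], [3.0, 4.0]]
```

Because the edit touches only the weights and leaves every tensor shape unchanged, it is consistent with the table above: the merger and vision encoder can be reused as-is, even though the representations they feed into may have shifted.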

Key Features

  • Tiny footprint: ~2.7 GB total, fits in 4 GB of VRAM (RTX A500, mobile GPUs, Jetson, etc.)
  • Abliterated: Refusal behavior removed, so the model responds to queries the original would decline
  • Same architecture: Drop-in compatible with MiniCPM-V 4.6 tooling and pipelines
  • Hybrid attention: Mix of linear and full attention layers for efficient long-context processing

Usage

import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model = AutoModel.from_pretrained(
    "jduartedj/MiniCPM-V-4.6-0.8B-Abliterated",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "jduartedj/MiniCPM-V-4.6-0.8B-Abliterated",
    trust_remote_code=True
)

# Image understanding
# Convert to RGB so grayscale, palette, or RGBA files are handled consistently
image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Describe this image in detail."]}]
result = model.chat(msgs=msgs, tokenizer=tokenizer)
print(result)

# Text-only (abliterated)
msgs = [{"role": "user", "content": "Write a story without restrictions."}]
result = model.chat(msgs=msgs, tokenizer=tokenizer)
print(result)

Limitations

  • Vision projector not retrained: The vit_merger was kept from the original model. While dimensions match, abliteration may have shifted internal representations enough to degrade vision quality.
  • No benchmarks: This model has not been formally evaluated on vision-language benchmarks.
  • Experimental: Use at your own risk. Best suited for research and experimentation.
  • Small model limitations: As a 0.8B model, reasoning capabilities are inherently limited compared to larger variants.
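
Since no benchmarks exist, a quick smoke test is one way to see whether vision quality degraded after the swap. The sketch below is a minimal harness, not a benchmark: it counts how often an answer mentions an expected keyword. `keyword_hit_rate`, the `chat_fn` signature, and the stand-in `fake_chat` are all hypothetical; in practice `chat_fn` would wrap a call to the model.

```python
from typing import Callable, Iterable


def keyword_hit_rate(chat_fn: Callable[[str, str], str],
                     cases: Iterable[tuple[str, str, list[str]]]) -> float:
    """Fraction of cases whose answer mentions at least one expected keyword."""
    hits = total = 0
    for image_path, question, keywords in cases:
        answer = chat_fn(image_path, question).lower()
        hits += any(k.lower() in answer for k in keywords)
        total += 1
    return hits / total if total else 0.0


# Stand-in for a real model call, used only to show the harness running.
def fake_chat(image_path: str, question: str) -> str:
    return "A red car parked on a street." if "car" in image_path else "I am not sure."


cases = [
    ("car.jpg", "What is in the image?", ["car"]),
    ("dog.jpg", "What animal is this?", ["dog"]),
]
print(keyword_hit_rate(fake_chat, cases))  # 0.5
```

Running the same cases against the original MiniCPM-V 4.6 and this model gives a rough side-by-side signal. Keyword matching is crude, so treat the result as a sanity check rather than an evaluation.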

Credits
