CompoDistill-Teacher-4B

The teacher multimodal LLM (MLLM) for CompoDistill: Qwen1.5-4B paired with a SigLIP-so400m vision encoder, trained with LLaVA-style visual instruction tuning. It serves as the distillation teacher for CompoDistill-2B and can be passed directly to scripts/train/dpt.sh or dft.sh via --pretrained_teacher_model_path.
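As a rough illustration of handing this checkpoint to the training scripts, the sketch below downloads the weights and assembles a command line. Only the script paths and the --pretrained_teacher_model_path flag come from this card. The helper names (download_teacher, build_train_command) are hypothetical, and the assumption that the scripts accept the flag on the command line with no other required arguments is unverified; check the repository before running.

```python
from pathlib import Path

REPO_ID = "JiwanKim/CompoDistill-Teacher-4B"


def download_teacher(repo_id: str = REPO_ID) -> Path:
    """Fetch the checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the command builder below works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id))


def build_train_command(teacher_path, script="scripts/train/dpt.sh", extra_args=()):
    """Assemble an argv list that points a training script at the teacher.

    Assumption: the script takes the flag directly on the command line.
    Any further arguments it needs go in extra_args.
    """
    return [
        "bash",
        script,
        "--pretrained_teacher_model_path",
        str(teacher_path),
        *extra_args,
    ]
```

From the root of a CompoDistill checkout, `subprocess.run(build_train_command(download_teacher()), check=True)` would then launch the script with the downloaded teacher.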

Released with the paper CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs (arXiv:2510.12184). Training and evaluation code: https://github.com/ptkjw1997/CompoDistill

Usage

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoImageProcessor

repo = "JiwanKim/CompoDistill-Teacher-4B"
# trust_remote_code=True is required: the model class ships with the repo.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True,
                                             torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
image_processor = AutoImageProcessor.from_pretrained(repo)

# Convert to RGB so grayscale or RGBA inputs do not break preprocessing.
image = Image.open("example.jpg").convert("RGB")
print(model.chat("What is happening in this image?", tokenizer,
                 image=image, image_processor=image_processor))

Citation

@article{kim2025compodistill,
  title={CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs},
  author={Kim, Jiwan and Kim, Kibum and Seo, Sangwoo and Park, Chanyoung},
  journal={arXiv preprint arXiv:2510.12184},
  year={2025}
}

Safetensors. Model size: 4B params. Tensor type: F16.
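For a rough sense of what these figures mean for loading the model, the weight footprint can be estimated from the parameter count and tensor type. This is a back-of-the-envelope sketch: it uses the nominal 4B figure (the exact count may differ) and ignores activations, the KV cache, and framework overhead. The helper name weight_memory_gib is illustrative.

```python
# Bytes used to store one parameter for common tensor types.
BYTES_PER_PARAM = {"F16": 2, "BF16": 2, "F32": 4}


def weight_memory_gib(num_params: float, tensor_type: str = "F16") -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 2**30


# Nominal 4B parameters stored as F16.
print(f"{weight_memory_gib(4e9):.2f} GiB")  # prints "7.45 GiB"
```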
