granite-4.1-3b-vl

A vision-language model that combines a CLIP vision encoder, a learned VisionProjection MLP that maps CLIP features into the LLM embedding space, and a LoRA-fine-tuned causal LM backbone.

Architecture

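Component   Detail
Base LLM    ibm-granite/granite-4.1-3b (LoRA merged)
CLIP dim    ? (stored as config["clip_dim"] in projector/projector.pt)
LLM dim     ? (stored as config["llm_dim"] in projector/projector.pt)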
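
To illustrate how the three components fit together, here is a minimal sketch of the usual projector pattern: CLIP features are mapped to the LLM width and concatenated with the text token embeddings. Everything in it is an assumption made for illustration: `ToyProjection` is a hypothetical stand-in for VisionProjection (whose actual layers are not described in this card), the dimensions are small placeholders rather than the real CLIP and LLM sizes, and prepending the image tokens before the text is one common choice that this card does not confirm.

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration only; the real values are stored in the
# projector checkpoint config ("clip_dim", "llm_dim").
CLIP_DIM, LLM_DIM, VOCAB = 32, 64, 100


class ToyProjection(nn.Module):
    """Hypothetical stand-in for VisionProjection: a two-layer MLP."""

    def __init__(self, clip_dim: int, llm_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def build_inputs_embeds(image_feats, text_ids, projector, embed):
    """Prepend projected image tokens to the text token embeddings."""
    image_embeds = projector(image_feats)  # (batch, n_image_tokens, LLM_DIM)
    text_embeds = embed(text_ids)          # (batch, n_text_tokens, LLM_DIM)
    return torch.cat([image_embeds, text_embeds], dim=1)


torch.manual_seed(0)
projector = ToyProjection(CLIP_DIM, LLM_DIM).eval()
embed = nn.Embedding(VOCAB, LLM_DIM)  # stands in for the LLM input embedding table

image_feats = torch.randn(1, 5, CLIP_DIM)   # 5 fake CLIP patch features
text_ids = torch.randint(0, VOCAB, (1, 7))  # 7 fake token ids

with torch.no_grad():
    inputs_embeds = build_inputs_embeds(image_feats, text_ids, projector, embed)

print(inputs_embeds.shape)  # 5 image tokens + 7 text tokens along dim 1
```

With the real model, `embed` would likely be `llm.get_input_embeddings()`, and the concatenated tensor would be passed to the LLM as `inputs_embeds` instead of `input_ids`. How the author actually orders or interleaves image and text tokens is not stated here, so treat that part as a guess.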

Files

Path                     Contents
projector/projector.pt   VisionProjection weights + config
llm/                     Merged LLM (SafeTensors) + tokenizer
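
Judging from the loading code in this card, projector/projector.pt is a dict with a "config" entry (holding "clip_dim" and "llm_dim") and a "state_dict" entry. A small check like the one below can fail early with a readable message if a checkpoint does not match that layout. The helper name `check_projector_checkpoint` is hypothetical, and the example dimensions are made up.

```python
from typing import Any, Mapping, Tuple

# Keys inferred from how the loading code reads the checkpoint.
REQUIRED_TOP_LEVEL = ("config", "state_dict")
REQUIRED_CONFIG = ("clip_dim", "llm_dim")


def check_projector_checkpoint(ckpt: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (clip_dim, llm_dim), or raise KeyError naming what is missing."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing top-level keys: {missing}")
    cfg = ckpt["config"]
    missing = [key for key in REQUIRED_CONFIG if key not in cfg]
    if missing:
        raise KeyError(f"checkpoint config is missing keys: {missing}")
    return int(cfg["clip_dim"]), int(cfg["llm_dim"])


# Example with a fake checkpoint; the dimensions are placeholders.
fake_ckpt = {"config": {"clip_dim": 32, "llm_dim": 64}, "state_dict": {}}
print(check_projector_checkpoint(fake_ckpt))
```

Calling it on the dict returned by `torch.load` before building the projector turns a bare `KeyError: 'config'` into a message that names the missing keys.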

Loading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import hf_hub_download
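# VisionProjection is not part of transformers; core.models must be importable
# (presumably from the author's training code).
from core.models import VisionProjection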

REPO = "Issactoto/granite-4.1-3b-vl"

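# Projector: download the checkpoint and load it onto CPU so this also works
# on machines without the device it was saved from.
ckpt = torch.load(hf_hub_download(REPO, "projector/projector.pt"), map_location="cpu")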
cfg  = ckpt["config"]
projector = VisionProjection(cfg["clip_dim"], cfg["llm_dim"])
projector.load_state_dict(ckpt["state_dict"])
projector.eval()

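# LLM + tokenizer: both live in the llm/ subfolder of the repo, so pass
# subfolder= rather than appending it to the repo id.
llm = AutoModelForCausalLM.from_pretrained(REPO, subfolder="llm")
tokenizer = AutoTokenizer.from_pretrained(REPO, subfolder="llm")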