Granite Vision 3.3-2B: CrispEmbed GGUF

GGUF conversion of ibm-granite/granite-vision-3.3-2b for CrispEmbed.

OCRBench score: 852, the highest among models with ≤3B parameters.

Architecture

  • Vision: SigLIP ViT-SO400M (27 layers, hidden size 1152, 384 px input); features from layers [-24, -20, -12, -1] are concatenated
  • Projector: 2-layer MLP (4×1152 → 2048)
  • LLM: Granite-3.1-2B (40 layers, hidden size 2048, GQA with 32 query heads and 8 KV heads)
    • embedding_multiplier=12.0, residual_multiplier=0.22, logits_scaling=8.0
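The data flow above can be sketched in NumPy as a shape check. This is a minimal illustration under stated assumptions, not CrispEmbed's implementation: the projector is assumed to be LLaVA-style (Linear 4608 → 2048, GELU, Linear 2048 → 2048), the weights are random, the 4-patch input is a toy size, and the Granite scalars are applied the way we understand the Hugging Face Granite code to apply them (embeddings multiplied, residual branches scaled, logits divided). All function names are hypothetical.

```python
import numpy as np

# Dimensions and scalars copied from the Architecture list.
VIS_DIM = 1152
VIS_LAYERS = 27
FEATURE_LAYERS = [-24, -20, -12, -1]
LLM_DIM = 2048
N_HEAD, N_HEAD_KV = 32, 8
EMBEDDING_MULTIPLIER = 12.0
RESIDUAL_MULTIPLIER = 0.22
LOGITS_SCALING = 8.0


def concat_vision_features(layer_outputs):
    """Pick the four configured ViT layers and concatenate them on the channel axis.

    layer_outputs: list of VIS_LAYERS arrays, each [n_patches, VIS_DIM].
    Returns [n_patches, 4 * VIS_DIM].
    """
    return np.concatenate([layer_outputs[i] for i in FEATURE_LAYERS], axis=-1)


def gelu(x):
    # tanh approximation of GELU; the activation choice is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def project(features, w1, b1, w2, b2):
    """Assumed 2-layer MLP projector: Linear(4608 -> 2048), GELU, Linear(2048 -> 2048)."""
    return gelu(features @ w1 + b1) @ w2 + b2


def scale_embeddings(token_embeddings):
    # Granite multiplies input embeddings by embedding_multiplier
    return token_embeddings * EMBEDDING_MULTIPLIER


def add_residual(residual, branch_output):
    # Granite scales each attention/MLP branch output before the residual add
    return residual + RESIDUAL_MULTIPLIER * branch_output


def scale_logits(raw_logits):
    # Granite divides the LM-head output by logits_scaling
    return raw_logits / LOGITS_SCALING


def kv_head_for(query_head):
    # GQA: each of the 8 KV heads is shared by 32 / 8 = 4 query heads
    return query_head // (N_HEAD // N_HEAD_KV)


# Shape check with random weights and a toy patch count (assumption: 4 patches).
rng = np.random.default_rng(0)
n_patches = 4
layer_outputs = [rng.standard_normal((n_patches, VIS_DIM)).astype(np.float32)
                 for _ in range(VIS_LAYERS)]
features = concat_vision_features(layer_outputs)
w1 = (rng.standard_normal((4 * VIS_DIM, LLM_DIM)) * 0.01).astype(np.float32)
w2 = (rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.01).astype(np.float32)
b1 = np.zeros(LLM_DIM, dtype=np.float32)
b2 = np.zeros(LLM_DIM, dtype=np.float32)
image_tokens = project(features, w1, b1, w2, b2)
print(features.shape, image_tokens.shape)  # (4, 4608) (4, 2048)
```

The concatenation is what makes the projector input 4×1152 = 4608 wide. With 32 query heads over a 2048-wide model, each head is 64-dimensional.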

Models

File                               Quant  Size    Notes
granite-vision-3.3-2b-f16.gguf     F16    5.6 GB  Unquantized (half precision)
granite-vision-3.3-2b-q8_0.gguf    Q8_0   3.2 GB  Best quality-to-size trade-off
granite-vision-3.3-2b-q4_k.gguf    Q4_K   1.9 GB  Smallest; vision encoder kept in F16
granite-vision-ref.gguf            F32    36 MB   Reference activations for parity checks

Parity

Vision encoder and projector outputs verified against the reference activations with crispembed-diff (cos_min is the minimum cosine similarity):

  • vis_features_concat: cos_min=0.999990
  • projector: cos_min=0.999972
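A cos_min figure like the ones above can be reproduced with a short NumPy helper. This is a sketch under an assumption: we do not know exactly how crispembed-diff reduces a tensor, and here it is assumed to compute one cosine similarity per row (per patch or token vector) and report the worst row. The helper name `cos_min` and the 729×2048 example shape are illustrative only.

```python
import numpy as np


def cos_min(reference, candidate):
    """Minimum per-row cosine similarity between two [n_rows, dim] activation arrays.

    Assumption: each row is one patch/token vector, so a single bad row pulls
    the minimum down even if the average looks fine.
    """
    ref = np.atleast_2d(np.asarray(reference, dtype=np.float64))
    out = np.atleast_2d(np.asarray(candidate, dtype=np.float64))
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    dots = np.sum(ref * out, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(out, axis=-1)
    # Guard against division by zero for all-zero rows
    cos = dots / np.maximum(norms, 1e-12)
    return float(cos.min())


# Example: a random "reference" tensor versus a copy round-tripped through float16,
# which simulates the rounding error of storing activations in F16.
rng = np.random.default_rng(0)
ref = rng.standard_normal((729, 2048))
out = ref.astype(np.float16).astype(np.float64)
print(f"cos_min={cos_min(ref, out):.6f}")
```

Taking the minimum rather than the mean is the stricter choice: it flags a conversion bug that corrupts only a few positions.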

Usage

crispembed --ocr granite-vision-3.3-2b-q8_0.gguf document.png
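To run the command above over a folder of scans, a thin Python wrapper is enough. This is a hypothetical helper, not part of CrispEmbed: it only reuses the documented `crispembed --ocr MODEL IMAGE` invocation and assumes that `crispembed` is on PATH and that it prints the recognized text to stdout.

```python
import subprocess
from pathlib import Path

DEFAULT_MODEL = "granite-vision-3.3-2b-q8_0.gguf"


def ocr_command(model, image):
    """Build the documented command line: crispembed --ocr MODEL IMAGE."""
    return ["crispembed", "--ocr", str(model), str(image)]


def ocr_directory(directory, model=DEFAULT_MODEL, pattern="*.png"):
    """Run OCR on every matching image and return {file name: captured stdout}.

    Assumptions: `crispembed` is on PATH and writes the recognized text to stdout.
    """
    results = {}
    for image in sorted(Path(directory).glob(pattern)):
        proc = subprocess.run(
            ocr_command(model, image),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if crispembed exits non-zero
        )
        results[image.name] = proc.stdout
    return results
```

For example, `ocr_directory("scans/")` (a hypothetical folder) returns one entry per PNG. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.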

License

Apache-2.0 (ibm-granite/granite-vision-3.3-2b)

GGUF metadata: model size 3B params; architecture granite_vision