InternVL2-1B - CrispEmbed GGUF

GGUF conversions of OpenGVLab/InternVL2-1B for use with CrispEmbed.

Smallest competitive VLM for OCR, ideal for edge, mobile, and WASM deployment.

Model Details

| Property | Value |
|---|---|
| Architecture | InternVL2 (InternViT-300M + Qwen2-0.5B) |
| Total Parameters | ~0.9B |
| Vision Encoder | InternViT-300M-448px (24L, 1024d, identical to InternVL2.5-2B) |
| Projector | Pixel unshuffle (4:1) + LayerNorm + Linear + GELU + Linear |
| LLM Decoder | Qwen2-0.5B-Instruct (24L, 896d, GQA 14/2, SwiGLU, RMSNorm) |
| Input Resolution | 448x448 per tile, dynamic tiling (1-12 tiles) |
| License | MIT |
| OCRBench | 779 |
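
The figures above determine how much decoder context and KV cache an image consumes. The following is a rough sketch of that arithmetic, not CrispEmbed's actual code. It assumes a ViT patch size of 14 pixels (not stated in the table), a head dimension equal to hidden size divided by query heads, and 16-bit KV cache entries. It also ignores any extra thumbnail tile or special tokens the runtime may add. The function names are hypothetical.

```python
# Figures taken from the Model Details table.
TILE_SIZE = 448        # pixels per tile side
MAX_TILES = 12         # upper bound of dynamic tiling
UNSHUFFLE_RATIO = 4    # pixel unshuffle merges 4 patch features into 1 token
LLM_LAYERS = 24
LLM_HIDDEN = 896
LLM_Q_HEADS = 14       # "GQA 14/2": 14 query heads
LLM_KV_HEADS = 2       # "GQA 14/2": 2 key/value heads

# ASSUMPTION (not in the table): the vision encoder uses 14x14 pixel patches.
PATCH_SIZE = 14


def visual_tokens(num_tiles: int) -> int:
    """Visual tokens handed to the decoder for `num_tiles` tiles."""
    if not 1 <= num_tiles <= MAX_TILES:
        raise ValueError(f"num_tiles must be in 1..{MAX_TILES}")
    patches_per_side = TILE_SIZE // PATCH_SIZE
    patches_per_tile = patches_per_side ** 2
    return num_tiles * patches_per_tile // UNSHUFFLE_RATIO


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for `num_tokens` decoder tokens.

    ASSUMPTION: head_dim = hidden / query heads, and 2-byte cache entries.
    """
    head_dim = LLM_HIDDEN // LLM_Q_HEADS
    # Factor 2 covers keys and values; only KV heads are cached under GQA.
    per_token = 2 * LLM_LAYERS * LLM_KV_HEADS * head_dim * bytes_per_value
    return num_tokens * per_token


if __name__ == "__main__":
    for tiles in (1, 6, 12):
        tokens = visual_tokens(tiles)
        mib = kv_cache_bytes(tokens) / 2**20
        print(f"{tiles:2d} tiles -> {tokens} visual tokens, {mib:.1f} MiB KV cache")
```

Under these assumptions the cost grows linearly with the tile count, so capping the number of tiles is the most direct way to bound memory on small devices.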

Available Quantizations

| File | Size | Compression | Notes |
|---|---|---|---|
| internvl2-1b-f16.gguf | 2.3 GB | 1x | Full precision |
| internvl2-1b-q8_0.gguf | 955 MB | 2.4x | Good quality |
| internvl2-1b-q4_k.gguf | ~600 MB | ~4x | Smallest, vision Q8_0 floor |
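
The compression column can be reproduced from the listed sizes, and the same numbers can drive a simple choice of file for a given storage budget. This is a minimal sketch: it takes 1 GB as 1000 MB, treats the approximate Q4_K size as exact, and uses hypothetical helper names. File size is only a lower bound on runtime memory, which also includes the KV cache and activations.

```python
# Sizes in MB as listed above (1 GB taken as 1000 MB).
SIZES_MB = {
    "internvl2-1b-f16.gguf": 2300,
    "internvl2-1b-q8_0.gguf": 955,
    "internvl2-1b-q4_k.gguf": 600,  # listed as approximate (~600 MB)
}


def compression_ratios(sizes_mb, baseline="internvl2-1b-f16.gguf"):
    """Size of the baseline file divided by the size of each file."""
    base = sizes_mb[baseline]
    return {name: base / size for name, size in sizes_mb.items()}


def largest_fitting(sizes_mb, budget_mb):
    """Name of the largest file that fits in `budget_mb`, or None."""
    fitting = {name: size for name, size in sizes_mb.items() if size <= budget_mb}
    return max(fitting, key=fitting.get) if fitting else None


if __name__ == "__main__":
    for name, ratio in compression_ratios(SIZES_MB).items():
        print(f"{name}: {ratio:.1f}x")
    print("Best fit for 1000 MB:", largest_fitting(SIZES_MB, 1000))
```

Picking the largest file that fits is only a heuristic: it assumes that a larger file means higher fidelity, which matches the notes in the table but is not measured here.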

Parity Verification

All components were verified against the Python reference implementation (cosine similarity = 1.000000):

  • Vision encoder: 4/4 layers PASS
  • Projector: PASS
  • LLM decoder (Qwen2): 2/2 layers PASS
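
A parity check of this kind compares the activations from the converted model with those from the reference at the same layer. The sketch below illustrates the idea in plain Python. It is not the verification script used for this model, and the function names and the 0.999999 threshold are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def parity_pass(reference, candidate, threshold=0.999999):
    """True if the candidate output matches the reference direction closely.

    ASSUMPTION: the threshold is illustrative, not the one used for this model.
    """
    return cosine_similarity(reference, candidate) >= threshold


if __name__ == "__main__":
    reference = [0.12, -0.50, 0.33, 0.91]        # e.g. flattened layer output
    candidate = [0.12, -0.50, 0.33, 0.91]        # same layer, converted model
    print("PASS" if parity_pass(reference, candidate) else "FAIL")
```

Cosine similarity ignores overall scale, so a stricter check would also compare magnitudes, for example with a maximum absolute difference.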

Credits
