InternVL2-1B — CrispEmbed GGUF

GGUF conversions of OpenGVLab/InternVL2-1B for use with CrispEmbed.

One of the smallest VLMs with competitive OCR accuracy (OCRBench 779), which makes it a good fit for edge, mobile, and WASM deployment.

Model Details

| Property | Value |
|---|---|
| Architecture | InternVL2 (InternViT-300M + Qwen2-0.5B) |
| Total Parameters | ~0.9B |
| Vision Encoder | InternViT-300M-448px (24 layers, 1024d; same architecture as the InternVL2.5-2B encoder) |
| Projector | Pixel unshuffle (4:1 token reduction) + LayerNorm + Linear + GELU + Linear |
| LLM Decoder | Qwen2-0.5B-Instruct (24 layers, 896d, GQA with 14 query / 2 KV heads, SwiGLU, RMSNorm) |
| Input Resolution | 448x448 per tile, dynamic tiling (1-12 tiles) |
| License | MIT |
| OCRBench | 779 |
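The tiling and projector rows together determine how many image tokens the decoder has to process. The sketch below estimates that count. It is modeled on the upstream InternVL preprocessing code and is not taken from CrispEmbed; the ViT patch size of 14, the extra thumbnail tile when more than one tile is used, and the tie-breaking rule are assumptions not stated in this card.

```python
# Hypothetical sketch of InternVL-style dynamic tiling and visual-token counting.
# From the card: 448x448 tiles, 1-12 tiles, 4:1 pixel unshuffle.
# Assumed (not stated in the card): ViT patch size 14, a thumbnail tile added
# whenever more than one tile is used, and grid selection modeled on the
# upstream InternVL preprocessing code.

TILE = 448                  # pixels per tile side
PATCH = 14                  # assumed ViT patch size
UNSHUFFLE = 4               # pixel unshuffle merges 2x2 patches -> 4x fewer tokens
MIN_TILES, MAX_TILES = 1, 12


def candidate_grids(min_n=MIN_TILES, max_n=MAX_TILES):
    """All (cols, rows) grids with min_n <= cols*rows <= max_n, fewest tiles first."""
    grids = {(c, r) for c in range(1, max_n + 1) for r in range(1, max_n + 1)
             if min_n <= c * r <= max_n}
    return sorted(grids, key=lambda g: (g[0] * g[1], g[0], g[1]))


def choose_grid(width, height):
    """Pick the grid whose aspect ratio is closest to the image's."""
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids():
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * TILE * TILE * cols * rows:
            # On an exact tie, move to the larger grid only if the image has enough pixels.
            best = (cols, rows)
    return best


def visual_tokens(width, height):
    """Number of image tokens passed to the LLM for one image."""
    cols, rows = choose_grid(width, height)
    tiles = cols * rows
    if tiles > 1:
        tiles += 1          # assumed global thumbnail tile
    per_tile = (TILE // PATCH) ** 2 // UNSHUFFLE   # 32*32 patches / 4 = 256 tokens
    return tiles * per_tile


if __name__ == "__main__":
    for w, h in [(448, 448), (896, 448), (1920, 1080)]:
        print(f"{w}x{h}: grid={choose_grid(w, h)}, tokens={visual_tokens(w, h)}")
```

Under these assumptions a single 448x448 image costs 256 tokens, while a 1920x1080 screenshot maps to a 4x2 grid plus a thumbnail (2,304 tokens), which is worth budgeting for when the context window is small.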
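The GQA figures for the decoder translate directly into KV-cache memory, which matters on the edge devices this model targets. A back-of-the-envelope sketch, assuming head_dim = 896 / 14 = 64, an f16 cache, and one K and one V vector per KV head per layer; the runtime's actual cache layout and precision are not documented here.

```python
# Back-of-the-envelope KV-cache size for the Qwen2-0.5B decoder, using the
# table values: 24 layers, hidden size 896, 14 query heads, 2 KV heads (GQA).
# Assumptions: head_dim = hidden / query heads, f16 (2-byte) cache entries.

LAYERS = 24
HIDDEN = 896
Q_HEADS = 14
KV_HEADS = 2
HEAD_DIM = HIDDEN // Q_HEADS  # 64


def kv_cache_bytes(tokens: int, bytes_per_elem: int = 2, kv_heads: int = KV_HEADS) -> int:
    """Bytes needed to cache K and V for `tokens` positions across all layers."""
    return 2 * LAYERS * kv_heads * HEAD_DIM * bytes_per_elem * tokens


if __name__ == "__main__":
    per_token = kv_cache_bytes(1)
    gqa = kv_cache_bytes(4096)
    mha = kv_cache_bytes(4096, kv_heads=Q_HEADS)  # if every query head had its own K/V
    print(f"per token (f16): {per_token} bytes")
    print(f"4096 tokens, GQA 14/2: {gqa / 2**20:.0f} MiB")
    print(f"4096 tokens, 14 KV heads: {mha / 2**20:.0f} MiB")
```

With these assumptions the cache costs 12,288 bytes per token, so a 4,096-token context needs 48 MiB, versus 336 MiB if all 14 heads kept their own keys and values.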

| File | Size | Compression (vs F16) | Notes |
|---|---|---|---|
| `internvl2-1b-f16.gguf` | 2.3 GB | 1x | Unquantized (F16) |
| `internvl2-1b-q8_0.gguf` | 955 MB | 2.4x | Good quality |
| `internvl2-1b-q4_k.gguf` | ~600 MB | ~4x | Smallest; vision encoder kept at Q8_0 |
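The "vision Q8_0 floor" note on the Q4_K file means the vision weights are not quantized below Q8_0 even when the rest of the model is. A hypothetical illustration of such a per-tensor policy follows; the `vision.` tensor-name prefix and the `pick_type` function are assumptions for illustration, not CrispEmbed's actual converter code.

```python
# Hypothetical illustration of a "vision Q8_0 floor": LLM tensors get the
# requested quant type, while vision-encoder tensors are never stored below
# Q8_0. The tensor-name prefix and this function are assumptions.

# Quant types ordered from lowest to highest precision.
PRECISION = ["Q4_K", "Q8_0", "F16"]

VISION_PREFIXES = ("vision.",)   # hypothetical tensor-name prefix


def pick_type(tensor_name: str, requested: str) -> str:
    """Return the storage type for one tensor under the floor policy."""
    if tensor_name.startswith(VISION_PREFIXES):
        # Raise precision to Q8_0 if the request is lower; keep it if higher.
        return max(requested, "Q8_0", key=PRECISION.index)
    return requested


if __name__ == "__main__":
    for name in ("vision.blk.0.attn_qkv.weight", "llm.blk.0.ffn_up.weight"):
        print(name, "->", pick_type(name, "Q4_K"))
```

A policy like this is one plausible reason the Q4_K file lands near 4x compression rather than lower: part of the model stays at 8-bit.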

All components were verified against the Python reference implementation (cosine similarity = 1.000000).
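A parity check of this kind flattens a tensor produced by the GGUF pipeline and the matching tensor from the Python reference, then compares them by cosine similarity. A minimal sketch in pure Python; the threshold (chosen so the value prints as 1.000000 at six decimals) and the helper names are hypothetical.

```python
# Minimal sketch of a cosine-similarity parity check between a GGUF output
# tensor and the Python reference, both flattened to lists of floats.
# The threshold and function names are hypothetical.
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def matches_reference(candidate, reference, min_cos=0.9999995):
    """True if the similarity would print as 1.000000 with six decimals."""
    return cosine_similarity(candidate, reference) >= min_cos


if __name__ == "__main__":
    reference = [0.12, -0.5, 0.33, 0.9]
    candidate = [x + 1e-9 for x in reference]   # tiny numerical drift
    print(f"cos={cosine_similarity(candidate, reference):.6f}")
```

Cosine similarity ignores overall scale, so a uniformly mis-scaled tensor still scores 1.0; pairing it with a maximum absolute difference check catches that class of bug.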
