InternVL2-1B - CrispEmbed GGUF
GGUF conversions of OpenGVLab/InternVL2-1B for use with CrispEmbed.
Smallest competitive VLM for OCR, ideal for edge, mobile, and WASM deployment.
Model Details
| Property | Value |
|---|---|
| Architecture | InternVL2 (InternViT-300M + Qwen2-0.5B) |
| Total Parameters | ~0.9B |
| Vision Encoder | InternViT-300M-448px (24L, 1024d, identical to InternVL2.5-2B) |
| Projector | Pixel unshuffle (4:1) + LayerNorm + Linear + GELU + Linear |
| LLM Decoder | Qwen2-0.5B-Instruct (24L, 896d, GQA 14/2, SwiGLU, RMSNorm) |
| Input Resolution | 448x448 per tile, dynamic tiling (1-12 tiles) |
| License | MIT |
| OCRBench | 779 |
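The tile size and the 4:1 pixel unshuffle in the table determine how many visual tokens the decoder receives per image. The following is a minimal sketch of that arithmetic, not CrispEmbed code. It assumes a ViT patch size of 14 (not stated in this card) and ignores any extra thumbnail tile the preprocessing may add; the function names are hypothetical.

```python
# Sketch: visual token count per image for 448x448 tiles with a 4:1 pixel unshuffle.
TILE_SIZE = 448       # from the "Input Resolution" row
PATCH_SIZE = 14       # ASSUMPTION: ViT patch size, not stated in this card
UNSHUFFLE_RATIO = 4   # from the "Projector" row (4:1 token reduction)
MAX_TILES = 12        # from the "Input Resolution" row


def tokens_per_tile(tile_size=TILE_SIZE, patch_size=PATCH_SIZE, ratio=UNSHUFFLE_RATIO):
    """Number of tokens one tile contributes after pixel unshuffle."""
    patches_per_side = tile_size // patch_size  # 448 // 14 = 32
    patches = patches_per_side ** 2             # 32 * 32 = 1024
    return patches // ratio                     # 1024 // 4 = 256


def visual_tokens(num_tiles):
    """Total visual tokens for an image split into num_tiles tiles (thumbnail excluded)."""
    if not 1 <= num_tiles <= MAX_TILES:
        raise ValueError(f"num_tiles must be between 1 and {MAX_TILES}")
    return num_tiles * tokens_per_tile()


if __name__ == "__main__":
    for n in (1, 6, 12):
        print(f"{n} tile(s): {visual_tokens(n)} visual tokens")
```

Under these assumptions a single tile yields 256 tokens and a 12-tile image yields 3072, which is a useful figure when sizing the context window on constrained devices.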
Available Quantizations
| File | Size | Compression | Notes |
|---|---|---|---|
| internvl2-1b-f16.gguf | 2.3 GB | 1x | Full precision |
| internvl2-1b-q8_0.gguf | 955 MB | 2.4x | Good quality |
| internvl2-1b-q4_k.gguf | ~600 MB | ~4x | Smallest, vision Q8_0 floor |
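The compression column follows from dividing the F16 file size by each quantized file size. A small sketch of that calculation, assuming 1 GB = 1000 MB and taking the approximate Q4_K size at face value (the helper name is hypothetical):

```python
# Sketch: reproduce the compression ratios from the listed file sizes.
# ASSUMPTION: 1 GB = 1000 MB; the q4_k size is approximate in the table.
SIZES_MB = {
    "internvl2-1b-f16.gguf": 2300.0,
    "internvl2-1b-q8_0.gguf": 955.0,
    "internvl2-1b-q4_k.gguf": 600.0,
}


def compression_ratios(sizes_mb, baseline="internvl2-1b-f16.gguf"):
    """Return each file's size reduction relative to the baseline file."""
    base = sizes_mb[baseline]
    return {name: round(base / size, 1) for name, size in sizes_mb.items()}


if __name__ == "__main__":
    for name, ratio in compression_ratios(SIZES_MB).items():
        print(f"{name}: {ratio}x")
```

The Q4_K ratio computes to about 3.8x, in line with the "~4x" in the table; it stays below a pure 4-bit ratio because the vision weights are kept at Q8_0.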
Parity Verification
All components verified against the Python reference (cosine similarity = 1.000000):
- Vision encoder: 4/4 layers PASS
- Projector: PASS
- LLM decoder (Qwen2): 2/2 layers PASS
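A parity check of this kind typically compares a flattened output tensor from the GGUF runtime with the same tensor from the reference implementation. The sketch below shows the cosine-similarity comparison in plain Python. It is an illustration, not the script used for this card; `parity_pass` and its threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def parity_pass(reference, candidate, threshold=0.999999):
    """Hypothetical pass criterion: similarity at or above the threshold."""
    return cosine_similarity(reference, candidate) >= threshold


if __name__ == "__main__":
    reference = [0.12, -0.50, 0.33, 0.90]   # stand-in for a reference layer output
    candidate = [0.12, -0.50, 0.33, 0.90]   # stand-in for the GGUF layer output
    print("PASS" if parity_pass(reference, candidate) else "FAIL")
```

Cosine similarity ignores uniform scaling, so a stricter check might also compare the maximum absolute difference between the two tensors.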
Credits
- Original model: OpenGVLab/InternVL2-1B (MIT)
- GGUF conversion: CrispEmbed