DeepSeek-OCR-2 CrispEmbed GGUF
DeepSeek-OCR-2 (3.4B MoE) converted to GGUF for OCR with CrispEmbed.
Models
| File | Quant | Size |
|------|-------|------|
| deepseek-ocr2-f16.gguf | F16 | ~6.5 GB |
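
A quick way to sanity-check a downloaded file is to read its GGUF header. The sketch below is not part of CrispEmbed; it assumes a little-endian GGUF file of version 2 or later, where the header is a 4-byte magic, a 32-bit version, and two 64-bit counts. The helper names `parse_gguf_header` and `read_gguf_header` are hypothetical.

```python
import struct

# Fixed-size GGUF header (assumed v2+, little-endian):
# magic, version, tensor count, metadata key/value count.
GGUF_HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(data: bytes) -> dict:
    """Decode the GGUF header from the first 24 bytes of a file."""
    if len(data) < GGUF_HEADER.size:
        raise ValueError(f"need at least {GGUF_HEADER.size} bytes")
    magic, version, n_tensors, n_kv = GGUF_HEADER.unpack_from(data)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file, magic was {magic!r}")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }


def read_gguf_header(path: str) -> dict:
    """Read only the header bytes from disk, not the whole multi-GB file."""
    with open(path, "rb") as fh:
        return parse_gguf_header(fh.read(GGUF_HEADER.size))
```

Calling `read_gguf_header("deepseek-ocr2-f16.gguf")` on a complete download should return a small dictionary; a `ValueError` usually points to a truncated or mislabeled file.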
Architecture
- Vision Encoder: SAM-ViT-B (12 layers, 768d, windowed + global attention)
- Visual Encoder: Qwen2-0.5B used bidirectionally (24 layers, 896d)
- Projector: Linear(896, 1280)
- LLM Decoder: DeepSeek-V2 MoE (12 layers, 1280d)
  - Layer 0: Dense SwiGLU FFN (intermediate=6848)
  - Layers 1-11: 64 routed experts (top-6) + 2 shared experts (intermediate=896 each)
- Parameters: 3.4B total
- License: Apache-2.0
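
The decoder dimensions above are enough for a rough feed-forward parameter count. The following is a back-of-the-envelope sketch, assuming each SwiGLU block has three bias-free weight matrices (gate, up, down) and ignoring router weights, attention, embeddings, and both encoders. It illustrates the gap between stored and per-token active FFN weights in a top-6 MoE.

```python
HIDDEN = 1280               # decoder width
DENSE_INTERMEDIATE = 6848   # layer 0 dense FFN
EXPERT_INTERMEDIATE = 896   # each routed or shared expert
ROUTED, TOP_K, SHARED = 64, 6, 2
MOE_LAYERS = 11             # layers 1-11


def swiglu_params(hidden: int, intermediate: int) -> int:
    """Weights in one SwiGLU FFN: gate, up, and down projections (no biases assumed)."""
    return 3 * hidden * intermediate


expert = swiglu_params(HIDDEN, EXPERT_INTERMEDIATE)
dense = swiglu_params(HIDDEN, DENSE_INTERMEDIATE)

# Stored weights: every routed and shared expert in every MoE layer.
ffn_total = dense + MOE_LAYERS * (ROUTED + SHARED) * expert

# Weights touched per token: only the top-k routed experts plus the shared ones.
ffn_active = dense + MOE_LAYERS * (TOP_K + SHARED) * expert

print(f"FFN weights stored: {ffn_total:,}")
print(f"FFN weights active per token: {ffn_active:,}")
```

Under these assumptions the decoder FFNs hold about 2.5 billion weights, which accounts for most of the 3.4B total, while each token passes through only about 0.33 billion of them.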
Features
- Dynamic resolution: 0 to 6 local patches of 768x768 plus one 1024x1024 global view
- Document OCR with grounding support
- Markdown conversion from documents
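
To make the dynamic-resolution budget concrete, here is a minimal sketch of how an image might map to views. The patch sizes and the limit of 6 come from the list above; the grid-selection rule (closest aspect ratio, no local patches for small images) is an assumption for illustration and may differ from what the model's preprocessor actually does. `pick_grid` and `view_shapes` are hypothetical names.

```python
from typing import List, Tuple

LOCAL_SIZE = 768     # side of each local patch
GLOBAL_SIZE = 1024   # side of the single global view
MAX_LOCAL = 6        # upper bound on local patches


def pick_grid(width: int, height: int) -> Tuple[int, int]:
    """Return (cols, rows) of local patches under an assumed selection rule."""
    # Assumption: an image that fits inside one local patch gets no local patches.
    if width <= LOCAL_SIZE and height <= LOCAL_SIZE:
        return (0, 0)
    target = width / height
    candidates = [
        (c, r)
        for c in range(1, MAX_LOCAL + 1)
        for r in range(1, MAX_LOCAL + 1)
        if 2 <= c * r <= MAX_LOCAL
    ]
    # Closest aspect ratio wins; ties go to the grid with more patches.
    return min(candidates, key=lambda g: (abs(g[0] / g[1] - target), -g[0] * g[1]))


def view_shapes(width: int, height: int) -> List[Tuple[int, int]]:
    """List the (height, width) of every view: local patches first, global view last."""
    cols, rows = pick_grid(width, height)
    local = [(LOCAL_SIZE, LOCAL_SIZE)] * (cols * rows)
    return local + [(GLOBAL_SIZE, GLOBAL_SIZE)]
```

For example, under this rule a 2000x1000 scan yields a 2x1 grid of local patches plus the global view, while a 600x600 thumbnail uses the global view alone. In every case the encoder sees at most seven views.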
Usage
Via CrispEmbed orchestrator pipeline:
from crispembed import CrispOcrOrchestrator
Source