DeepSeek-OCR-2 CrispEmbed GGUF

DeepSeek-OCR-2 (3.4B MoE) converted to GGUF for OCR with CrispEmbed.

Models

File                      Quant   Size
deepseek-ocr2-f16.gguf    F16     ~6.5 GB
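
After downloading, it can be useful to confirm that the file really is a GGUF container before handing it to an OCR pipeline. The sketch below reads only the fixed-size GGUF header (magic bytes, format version, tensor count, metadata key/value count). It assumes the GGUF v2/v3 layout with little-endian 64-bit counts; `read_gguf_header` is a hypothetical helper name and is not part of CrispEmbed.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes the GGUF v2/v3 header layout: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count, little-endian.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("deepseek-ocr2-f16.gguf")` should return a small version number and non-zero counts if the download is intact; the exact values depend on the file.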

Architecture

  • Vision Encoder: SAM-ViT-B (12 layers, 768d, windowed + global attention)
  • Visual Encoder: Qwen2-0.5B used bidirectionally (24 layers, 896d)
  • Projector: Linear(896, 1280)
  • LLM Decoder: DeepSeek-V2 MoE (12 layers, 1280d)
    • Layer 0: Dense SwiGLU FFN (intermediate=6848)
    • Layers 1-11: 64 routed experts (top-6) + 2 shared experts (intermediate=896 each)
  • Parameters: 3.4B total
  • License: Apache-2.0
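
As a rough sanity check on the figures above, the decoder's feed-forward weights can be counted by hand. The sketch assumes each SwiGLU block consists of three bias-free matrices (gate, up, down) and ignores router, attention, embedding, and encoder weights, so it covers only part of the 3.4B total. It also estimates the F16 file size at 2 bytes per parameter.

```python
# Values taken from the architecture list above.
HIDDEN = 1280
DENSE_INTERMEDIATE = 6848
EXPERT_INTERMEDIATE = 896
ROUTED, SHARED, TOP_K = 64, 2, 6
MOE_LAYERS = 11  # layers 1-11


def swiglu_params(hidden, intermediate):
    # Assumption: gate, up, and down projections, no biases.
    return 3 * hidden * intermediate


dense_ffn = swiglu_params(HIDDEN, DENSE_INTERMEDIATE)
per_expert = swiglu_params(HIDDEN, EXPERT_INTERMEDIATE)

# Weights stored on disk: every routed and shared expert in every MoE layer.
stored_ffn = dense_ffn + MOE_LAYERS * (ROUTED + SHARED) * per_expert
# Weights touched per token: top-6 routed experts plus both shared experts.
active_ffn = dense_ffn + MOE_LAYERS * (TOP_K + SHARED) * per_expert

# F16 stores 2 bytes per parameter; 3.4e9 is the stated total.
f16_gib = 3.4e9 * 2 / 2**30

print(f"stored FFN params: {stored_ffn:,}")
print(f"active FFN params per token: {active_ffn:,}")
print(f"estimated F16 size: {f16_gib:.2f} GiB")
```

Under these assumptions the expert weights account for about 2.5B of the stored parameters, while only about 0.33B FFN parameters are used per token. The F16 estimate of about 6.3 GiB (6.8 GB decimal) is in the same ballpark as the listed file size.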

Features

  • Dynamic resolution: 0 to 6 local patches of 768x768 plus one 1024x1024 global view
  • Document OCR with grounding support
  • Markdown conversion from documents
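
To illustrate how a dynamic-resolution scheme of this shape might pick the number of local patches, here is a minimal sketch. It is an assumption for illustration only, not the selection rule used by DeepSeek-OCR-2 or CrispEmbed: images that fit inside one 768x768 patch get the global view only, and larger images get the grid of at most 6 patches whose aspect ratio is closest to the image's. `choose_grid` and `view_count` are hypothetical names.

```python
import math

TILE = 768      # local patch side, from the feature list above
MAX_TILES = 6   # upper bound on local patches


def choose_grid(width, height, tile=TILE, max_tiles=MAX_TILES):
    """Return (cols, rows) of local patches; (0, 0) means global view only."""
    if width <= tile and height <= tile:
        return (0, 0)
    aspect = math.log(width / height)
    grids = [
        (c, r)
        for c in range(1, max_tiles + 1)
        for r in range(1, max_tiles + 1)
        if c * r <= max_tiles
    ]
    # Closest aspect ratio in log space; ties go to the grid with more patches.
    return min(
        grids,
        key=lambda g: (abs(aspect - math.log(g[0] / g[1])), -(g[0] * g[1])),
    )


def view_count(width, height):
    """Local patches plus the single 1024x1024 global view."""
    cols, rows = choose_grid(width, height)
    return cols * rows + 1
```

With this rule, a 1536x768 page maps to a 2x1 grid (three views in total), while a 500x500 crop uses only the global view.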

Usage

Via CrispEmbed orchestrator pipeline:

from crispembed import CrispOcrOrchestrator
# Configure orchestrator with deepseek_ocr2 engine

Source
