surya-det-GGUF — Text Line Detection

GGUF conversion of the surya-ocr-2 text detector for use with CrispEmbed.

Architecture

EfficientViT-Large segformer (38M params):

Encoder: Stem + 4 stages (FusedMBConv, MBConv, EfficientVitBlock with LiteMLA linear attention)
Decoder: SegFormer-style FPN decode head (multi-scale projection + fuse + classify)
Input: RGB image resized to 1200×1200 (ImageNet-normalized)
Output: 2-channel heatmap [300×300] (text line, separator) → bounding boxes
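
The input and output conventions above can be sketched in a few lines of Python. This is a minimal illustration, not the CrispEmbed preprocessing code: the helper names are hypothetical, the mean and standard deviation are the standard ImageNet statistics, and the coordinate mapping assumes the image is stretched to 1200×1200 without aspect-preserving padding (the card does not say which resize policy is used).

```python
# Standard ImageNet channel statistics in RGB order.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

MODEL_SIZE = 1200    # input side length, from the description above
HEATMAP_SIZE = 300   # heatmap side length, from the description above


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to ImageNet-normalized floats (hypothetical helper)."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def heatmap_to_image(hx, hy, img_w, img_h):
    """Map a heatmap cell to original-image pixel coordinates.

    Assumption: the image was stretched to MODEL_SIZE x MODEL_SIZE,
    so each axis scales independently.
    """
    return hx * img_w / HEATMAP_SIZE, hy * img_h / HEATMAP_SIZE
```

Each heatmap cell covers a 4×4 patch of the 1200×1200 input, which is why boxes recovered from the heatmap have to be scaled back to the source resolution.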

Files

File	Type	Size	Notes
`surya-det-f32.gguf`	F32	147 MB	Full precision
`surya-det-f16.gguf`	F16	74 MB	Recommended
`surya-det-q8_0.gguf`	Q8_0	41 MB	3.6x compression
`surya-det-q4_k.gguf`	Q4_K	23 MB	6.5x compression

All BatchNorm layers are folded into preceding Conv2d weights.
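
For readers unfamiliar with BatchNorm folding, the arithmetic can be sketched as follows. This is a generic illustration of the standard technique, not the conversion script used for these files; the function name and the list-based tensor layout are assumptions chosen to keep the example dependency-free.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into conv weights and biases.

    weight: one flat list of kernel values per output channel
    bias:   one value per output channel (0.0 when the conv has no bias)
    gamma, beta, mean, var: BatchNorm parameters, one value per channel
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b
```

After folding, a single convolution produces the same output as the original convolution followed by BatchNorm in inference mode, so the BatchNorm tensors do not need to be stored or evaluated.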

Language Support

91 languages including English, German (89.7%), Chinese, Arabic, Japanese, Korean. See surya-ocr-2 for the full list.

Usage with CrispEmbed

#include <stdio.h>
#include "surya_det.h"

surya_det_context* ctx = surya_det_init("surya-det-f16.gguf", 4);
int hm_h, hm_w;
const float* heatmap = surya_det_detect(ctx, pixels, width, height, 3, &hm_h, &hm_w);

int n_boxes;
const surya_det_bbox* boxes = surya_det_get_boxes(ctx, width, height, 0.6f, 0.35f, &n_boxes);
for (int i = 0; i < n_boxes; i++) {
    printf("Text at (%.0f,%.0f)-(%.0f,%.0f) conf=%.2f\n",
           boxes[i].x0, boxes[i].y0, boxes[i].x1, boxes[i].y1, boxes[i].confidence);
}
surya_det_free(ctx);
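
To give a feel for how a heatmap can be turned into boxes, here is a small Python sketch that thresholds the heatmap and takes the bounding box of each connected region. It is an illustrative stand-in, not the algorithm behind `surya_det_get_boxes`: the function name, the single threshold, and the 4-connectivity are all assumptions, and no meaning is assigned to the two float arguments in the call above.

```python
from collections import deque


def boxes_from_heatmap(heatmap, threshold=0.5):
    """Return (x0, y0, x1, y1, mean_score) per 4-connected region >= threshold.

    heatmap is a non-empty list of rows of floats; coordinates are heatmap
    cells and x1/y1 are inclusive.
    """
    h, w = len(heatmap), len(heatmap[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or heatmap[sy][sx] < threshold:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            total, count = 0.0, 0
            while queue:
                x, y = queue.popleft()
                total += heatmap[y][x]
                count += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx]
                            and heatmap[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1, total / count))
    return boxes
```

Boxes produced this way are in heatmap cells and would still need scaling to the source image size.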

Python

from crispembed import CrispTextDetect

det = CrispTextDetect("surya-det-f16.gguf")
boxes = det.detect("document.png")
for b in boxes:
    print(f"({b['x0']:.0f},{b['y0']:.0f})-({b['x1']:.0f},{b['y1']:.0f}) conf={b['confidence']:.3f}")
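
A common follow-up is to drop low-confidence boxes and put the rest in a rough reading order. The sketch below works on the same dictionary keys used in the loop above (`x0`, `y0`, `x1`, `y1`, `confidence`); the function name and the default cut-off are hypothetical and not part of CrispEmbed.

```python
def filter_and_sort(boxes, min_conf=0.5):
    """Keep boxes with confidence >= min_conf, ordered top-to-bottom, then left-to-right."""
    kept = [b for b in boxes if b["confidence"] >= min_conf]
    return sorted(kept, key=lambda b: (b["y0"], b["x0"]))


# Hand-written sample data in the same shape as the detector output above.
sample = [
    {"x0": 10, "y0": 50, "x1": 100, "y1": 60, "confidence": 0.9},
    {"x0": 10, "y0": 5, "x1": 100, "y1": 15, "confidence": 0.8},
    {"x0": 5, "y0": 50, "x1": 8, "y1": 60, "confidence": 0.2},
]
ordered = filter_and_sort(sample)
```

Sorting by the top-left corner is only a rough approximation of reading order; multi-column layouts need a more careful grouping step.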

Parity

Verified against Python reference (identical preprocessed input):

Heatmap max: 0.9649 (exact match)
Heatmap mean: 0.0113 (exact match)
Per-stage activation means match to 4 decimal places
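
A parity check of this kind can be reproduced with a short script along the following lines. This is a sketch under assumptions: it takes both heatmaps as nested Python lists of equal shape, and the function names and the 4-decimal comparison are chosen to mirror the figures above rather than taken from the actual verification code.

```python
def heatmap_summary(heatmap):
    """Return (max, mean) over all cells of a 2-D list of floats."""
    flat = [v for row in heatmap for v in row]
    return max(flat), sum(flat) / len(flat)


def parity_report(reference, candidate, decimals=4):
    """Compare two equally shaped heatmaps at a fixed number of decimals."""
    ref_max, ref_mean = heatmap_summary(reference)
    cand_max, cand_mean = heatmap_summary(candidate)
    max_abs_diff = max(
        abs(r - c)
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    )
    return {
        "max_match": round(ref_max, decimals) == round(cand_max, decimals),
        "mean_match": round(ref_mean, decimals) == round(cand_mean, decimals),
        "max_abs_diff": max_abs_diff,
    }
```

Matching summary statistics are a useful smoke test; the per-cell maximum absolute difference is the stricter signal, since two different heatmaps can share the same max and mean.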

License

OpenRail-M: free for organizations under $5M in revenue or funding. See LICENSE.

Source

Converted from datalab-to/surya-ocr-2 (surya.detection.model.encoderdecoder.EfficientViTForSemanticSegmentation).
