mini-text-detection: Khmer & English Text Detection
A YOLO11n-based text detection model fine-tuned to locate and classify text regions in images containing Khmer and English content.
It detects 3 types of text blocks and can be used as the first stage before passing crops to an OCR model (e.g. phonsobon/mini-ocr).
Model Details
| Property | Value |
|---|---|
| Architecture | YOLO11n (nano) |
| Task | Object detection (3 classes) |
| Weights file | khmer-text-detection-mini.pt |
| Framework | Ultralytics / PyTorch |
| Input | RGB image, any size (auto-resized internally) |
Classes
| ID | Name | Khmer | Description |
|---|---|---|---|
| 0 | subject | កម្មវត្ថុ | Title or subject heading |
| 1 | reference | យោង | Reference or citation |
| 2 | content | អត្ថបទ | Main body / paragraph text |
Files
| File | Description |
|---|---|
| khmer-text-detection-mini.pt | Full Ultralytics YOLO model (weights + config) |
Quick Start
Install dependencies
pip install ultralytics huggingface_hub
Run inference
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
# ── Download model ──────────────────────────────────────────────
model_path = hf_hub_download(
    repo_id="phonsobon/mini-text-detection",
    filename="khmer-text-detection-mini.pt",
)
# ── Class names ─────────────────────────────────────────────────
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
# ── Load & predict ──────────────────────────────────────────────
model = YOLO(model_path)
results = model.predict(
    source="your_image.jpg",  # path, URL, PIL image, or numpy array (HWC, BGR)
    conf=0.25,                # confidence threshold
    iou=0.45,                 # NMS IoU threshold
    imgsz=640,                # inference size in pixels
)
# ── Print results ───────────────────────────────────────────────
for r in results:
    r.show()  # display the image with bounding boxes
    for box in r.boxes:
        cls_id = int(box.cls)
        label = CLASS_NAMES[cls_id]
        conf = float(box.conf)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"[{label}] conf={conf:.2f} box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
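The `conf` and `iou` arguments above control post-processing. The sketch below illustrates what they do. It is a simplified, pure-Python illustration and not Ultralytics' actual implementation, which runs vectorised on tensors. Detections at or below `conf` are dropped first. Greedy non-maximum suppression then keeps the highest-scoring box and discards any later box of the same class whose intersection-over-union (IoU) with it exceeds `iou`. The helper names `iou` and `nms` and the sample detections are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[int, float, Box]       # (cls_id, conf, box)


def iou(a: Box, b: Box) -> float:
    """Hypothetical helper: intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], conf_thres: float = 0.25,
        iou_thres: float = 0.45) -> List[Detection]:
    """Hypothetical helper: greedy, class-aware non-maximum suppression."""
    # Drop low-confidence detections, then visit the rest best-first.
    candidates = sorted((d for d in dets if d[1] > conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Keep det unless an already-kept box of the SAME class overlaps it by more than iou_thres.
        if all(k[0] != det[0] or iou(k[2], det[2]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


dets = [
    (2, 0.90, (10, 10, 110, 60)),
    (2, 0.80, (12, 12, 112, 62)),     # near-duplicate of the box above (IoU about 0.89)
    (0, 0.70, (12, 12, 112, 62)),     # same area, different class
    (1, 0.10, (200, 200, 250, 220)),  # below conf_thres
]
kept = nms(dets)
print(kept)
```

Here the second class-2 box is suppressed by the first. The class-0 box survives because suppression only compares boxes of the same class. The low-confidence class-1 box never reaches the NMS step.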
Filter by class
# Get only subject (heading) boxes
subject_boxes = [b for b in results[0].boxes if int(b.cls) == 0]
# Get only content (body) boxes
content_boxes = [b for b in results[0].boxes if int(b.cls) == 2]
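The comprehensions above scan the boxes once per class. To split every detection into per-class lists in a single pass, you could group plain tuples with `collections.defaultdict`. This is a sketch under assumptions. `group_by_class` is a hypothetical helper, and it expects each box to be converted first to `(int(b.cls), float(b.conf), b.xyxy[0].tolist())`, as in the inference loop.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}

Detection = Tuple[int, float, Sequence[float]]  # (cls_id, conf, [x1, y1, x2, y2])


def group_by_class(dets: List[Detection]) -> Dict[str, List[Detection]]:
    """Hypothetical helper: {class_name: detections}, each list sorted by confidence, highest first."""
    groups: Dict[str, List[Detection]] = defaultdict(list)
    for det in dets:
        groups[CLASS_NAMES[det[0]]].append(det)
    # Classes with no detections simply do not appear in the result.
    return {name: sorted(items, key=lambda d: d[1], reverse=True)
            for name, items in groups.items()}


# With real results (hypothetical glue code):
# dets = [(int(b.cls), float(b.conf), b.xyxy[0].tolist()) for b in results[0].boxes]
dets = [(2, 0.61, [0, 80, 400, 300]), (0, 0.92, [0, 0, 400, 40]), (2, 0.88, [0, 320, 400, 500])]
groups = group_by_class(dets)
print({name: len(items) for name, items in groups.items()})
```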
Save annotated images
results = model.predict(source="your_image.jpg", save=True, project="runs/detect")
# Saved to runs/detect/predict/ (predict2/, predict3/, ... on later runs unless exist_ok=True)
Batch inference on a folder
# stream=True yields one Results object per image instead of holding them all in memory
results = model.predict(source="path/to/images/", conf=0.25, imgsz=640, stream=True)
for r in results:
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for box in r.boxes:
        counts[CLASS_NAMES[int(box.cls)]] += 1
    print(r.path, "→", counts)
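To keep the per-image counts for later analysis instead of only printing them, you could write one CSV row per image with the standard-library `csv` module. This is a sketch under assumptions. `write_counts_csv` and the `counts.csv` file name are hypothetical. `rows` stands in for the `(r.path, counts)` pairs produced by the loop above.

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Tuple

CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}


def write_counts_csv(rows: Iterable[Tuple[str, Dict[str, int]]], out: TextIO) -> None:
    """Hypothetical helper: write a header, then one row per image (path + one column per class)."""
    writer = csv.DictWriter(out, fieldnames=["path", *CLASS_NAMES.values()])
    writer.writeheader()
    for path, counts in rows:
        # Classes missing from `counts` default to 0 so every row has the same columns.
        writer.writerow({"path": path, **{n: counts.get(n, 0) for n in CLASS_NAMES.values()}})


# In-memory example; for a real file use
# open("counts.csv", "w", newline="", encoding="utf-8") instead of StringIO.
rows = [("page1.jpg", {"subject": 1, "reference": 0, "content": 3}),
        ("page2.jpg", {"content": 2})]
buf = io.StringIO()
write_counts_csv(rows, buf)
print(buf.getvalue())
```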
Crop + OCR Pipeline
Combine this model with phonsobon/mini-ocr for full end-to-end document reading, with each region labelled by type:
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
from PIL import Image
CLASS_NAMES = {0: "subject", 1: "reference", 2: "content"}
# ── Load detection model ────────────────────────────────────────
det_path = hf_hub_download("phonsobon/mini-text-detection", "khmer-text-detection-mini.pt")
detector = YOLO(det_path)
# ── Detect text regions ─────────────────────────────────────────
image_path = "your_image.jpg"
results = detector.predict(source=image_path, conf=0.25, imgsz=640)
img = Image.open(image_path).convert("RGB")
# ── Crop each detected region and tag it with its class ─────────
for i, box in enumerate(results[0].boxes):
    cls_id = int(box.cls)
    label = CLASS_NAMES[cls_id]
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    crop = img.crop((x1, y1, x2, y2))
    crop.save(f"crop_{i}_{label}.png")
    print(f"Saved crop {i} → class: {label}")
    # → Feed each crop to phonsobon/mini-ocr for text recognition
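The loop above saves crops in the order the detector returns them, which is typically descending confidence, not reading order. Before OCR you may want to sort regions top-to-bottom and then left-to-right, and add a small margin so characters at a box edge are not clipped. The sketch below is hypothetical. The helper names, the 4 px margin and the 10 px same-line tolerance are assumptions, not values from this model card. It works on plain `(x1, y1, x2, y2)` integer tuples, so each returned box can be passed directly to `img.crop(...)`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def pad_and_clamp(box: Box, width: int, height: int, margin: int = 4) -> Box:
    """Hypothetical helper: grow a box by `margin` px per side without leaving the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def reading_order(boxes: List[Box], line_tol: int = 10) -> List[Box]:
    """Hypothetical helper: sort top-to-bottom, then left-to-right within a line.

    Boxes whose top edges lie within `line_tol` px of a line's first box are treated as one line.
    """
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    lines: List[List[Box]] = []
    for box in ordered:
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


# e.g. boxes = [tuple(map(int, b.xyxy[0].tolist())) for b in results[0].boxes]
boxes = [(300, 105, 500, 140), (20, 100, 280, 140), (20, 10, 500, 60)]
ordered = reading_order(boxes)
crops = [pad_and_clamp(b, width=512, height=512) for b in ordered]
print(crops)
```

To keep class labels attached, sort `(box, label)` pairs with the same key on the box.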
Input Tips
- Works on any image size: by default YOLO scales the longer side to 640 px (imgsz=640) before inference.
- Best results on document photos, screenshots, and scanned pages.
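- Adjust `conf` (roughly 0.1 to 0.5) to trade recall against precision. Lower values return more regions, including more false positives. Higher values return fewer, more confident boxes.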
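The internal resize preserves aspect ratio. As we understand Ultralytics' letterbox preprocessing, the image is scaled by a single factor so that it fits inside `imgsz` x `imgsz`, and the remainder is padded. The exact padding depends on the Ultralytics version and settings. The hypothetical helper below computes only the scale and the resized size, as a rough sketch of the arithmetic. Because text height shrinks by the same factor, this is useful for judging how large text will be at inference time.

```python
from typing import Tuple


def letterbox_resize(width: int, height: int, imgsz: int = 640) -> Tuple[float, Tuple[int, int]]:
    """Hypothetical helper: (scale, (new_w, new_h)) for an aspect-preserving fit into imgsz x imgsz."""
    # One factor for both axes, so text is neither stretched nor squashed.
    scale = min(imgsz / width, imgsz / height)
    return scale, (round(width * scale), round(height * scale))


# Example: an A4 page scanned at 300 dpi (2480 x 3508 px).
scale, (new_w, new_h) = letterbox_resize(2480, 3508)
print(f"scale={scale:.3f}, resized to {new_w} x {new_h}")
# Every length shrinks by the same factor, including the height of a text line:
print(f"a 30 px tall line becomes about {30 * scale:.1f} px")
```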
Limitations
- May miss very small text (< ~8 px height in the original image).
- Not designed for handwritten or heavily stylised/artistic fonts.
- Performance is best on document-style layouts similar to the training data.
Related Model
| Model | Task |
|---|---|
| phonsobon/mini-ocr | Text recognition (CRNN + CTC) for Khmer & English |
License
MIT