legaldocuman-rfdetr

RF-DETR-Base fine-tuned for handwritten signature detection in legal documents.
This model is the computer vision component of LegalDocuMan, a commercial document intelligence API for contract processing.

Model Details

Field	Value
Base architecture	RF-DETR-Base
Parameters	31.9M
Base checkpoint	Roboflow RF-DETR-Base (COCO pretrained)
Task	Single-class object detection
Class	`signature`
Input resolution	560px (square)
License	Apache 2.0

Training Data

Three open-source datasets were merged, deduplicated, and cleaned before training.
The train, validation, and test splits were checked for cross-split contamination using perceptual hashing, and no overlapping images were found.
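
A cross-split overlap check of this kind can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes a simple 64-bit average hash computed over an 8×8 grayscale thumbnail (loading and downscaling each image, for example with Pillow, is left out), and the function names and the `max_distance` tolerance are hypothetical.

```python
from itertools import product


def average_hash(gray_8x8):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255)."""
    pixels = [value for row in gray_8x8 for value in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        # Each bit records whether a pixel is brighter than the thumbnail mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def find_cross_split_overlap(split_a, split_b, max_distance=0):
    """Return (name_a, name_b) pairs whose hashes differ by at most max_distance bits.

    split_a and split_b map image names to hashes. The pairwise comparison is
    quadratic, which is acceptable for a few thousand images.
    """
    return [
        (name_a, name_b)
        for (name_a, hash_a), (name_b, hash_b) in product(split_a.items(), split_b.items())
        if hamming_distance(hash_a, hash_b) <= max_distance
    ]
```

An empty result for every pair of splits (train/val, train/test, val/test) corresponds to the zero-overlap claim. Raising `max_distance` above 0 would also flag near-duplicates such as re-compressed copies of the same page.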

Dataset	Images	License
signatures-xc8up (Roboflow 100)	~2,800	CC BY 4.0
Signature Detector (TrueSign)	772	MIT
Signature Detection v3 (home)	2,145	CC BY 4.0

Final dataset size after cleaning: 3,996 images

Training Configuration

Parameter	Value
Augmentation preset	Document (9 transforms)
Augmentations	Perspective distortion, grid distortion, JPEG compression artifacts, Gaussian blur, Gaussian noise, random brightness/contrast, CLAHE, sharpening, horizontal flip
Input resolution	560px
Early stopping patience	20 epochs
Stopped at	Epoch 16
Best val mAP@50	0.9123 (epoch 11)
Framework	rfdetr · PyTorch · PyTorch Lightning
Hardware	NVIDIA RTX 4070 Ti Super 16GB

Evaluation

Evaluated on 168 held-out test images with zero contamination from training or validation sets.

Overall

Metric	Score
mAP@50	0.7924
mAP@50:95	0.820
Precision@0.50	0.9490
Recall@0.50	0.8020
F1@0.50	0.8696
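
For reference, F1 is the harmonic mean of precision and recall. The snippet below is only an arithmetic illustration using the rounded values from the table; `f1_score` is a hypothetical helper, not part of the evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded Precision@0.50 and Recall@0.50 from the table above.
f1 = f1_score(0.9490, 0.8020)
print(f"{f1:.4f}")
```

With the rounded inputs this evaluates to about 0.8693, which agrees with the reported F1@0.50 of 0.8696 to within roughly 0.0003.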

By Signature Size

Size	AP	AR
Small	0.252	0.500
Medium	0.834	0.855
Large	0.828	0.845

Note on small signatures: Signatures smaller than ~32×32px after resizing have significantly lower recall.
For documents with initials or compact date-field signatures, human review is recommended.
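
One way to route small detections to human review is sketched below. It is an assumption-laden illustration: `needs_human_review` is a hypothetical helper, the page is assumed to be stretched to a square 560×560 input (so each axis scales independently), and the threshold compares areas in the style of the COCO "small object" convention.

```python
def needs_human_review(box_xyxy, image_size, input_res=560, min_side=32):
    """Return True if a box would be smaller than min_side x min_side after resizing.

    box_xyxy is (x_min, y_min, x_max, y_max) in original-image pixels and
    image_size is (width, height). Assumes the page is stretched to a square
    input_res x input_res input, so each axis is scaled independently.
    """
    x_min, y_min, x_max, y_max = box_xyxy
    width, height = image_size
    resized_w = (x_max - x_min) * input_res / width
    resized_h = (y_max - y_min) * input_res / height
    # Compare areas, mirroring the COCO-style "small object" convention.
    return resized_w * resized_h < min_side * min_side
```

This only flags small signatures that the model did detect. Missed initials or date-field signatures still need a separate review policy, for example reviewing every page of document types that are known to carry initials.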

Usage

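from rfdetr import RFDETRBase
from PIL import Image

# Load the fine-tuned signature-detection weights.
model = RFDETRBase(pretrain_weights="Mo-Awadalla/legaldocuman-rfdetr")

# Run detection on a single page image; detections below 0.45 confidence are dropped.
image = Image.open("contract_page.jpg")
detections = model.predict(image, threshold=0.45)
print(detections)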

For PDF inputs, convert pages to images first using pdf2image:

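from pdf2image import convert_from_path

# Render each PDF page to a PIL image at 200 DPI, then run detection page by page.
# Reuses the `model` object loaded in the previous example.
pages = convert_from_path("contract.pdf", dpi=200)
for i, page in enumerate(pages):
    detections = model.predict(page, threshold=0.45)
    print(f"Page {i+1}: {detections}")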

Intended Use

Designed for:

Detecting the presence and location of handwritten ink signatures in legal contract pages
Document intake pipelines processing PDF, DOCX, and scanned image inputs
Execution status classification (executed vs. draft) as part of a broader document intelligence pipeline

Out of scope:

Signature verification or authenticity determination
Forgery detection
Digital or electronic signature detection
Handwriting recognition or transcription

Limitations

Small signatures (initials, compact date-field signatures) are detected far less reliably (AP 0.252, AR 0.500)
Performance may degrade on scans below 150 DPI
Not trained on non-Latin document layouts
Should not be used as the sole decision-maker for high-stakes legal determinations without human review in the loop
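
The 150 DPI limitation above can be screened for before inference. The sketch below is a rough heuristic, not part of the released code: `estimate_dpi` and `is_low_resolution` are hypothetical helpers, and the physical page size (US Letter by default) is an assumption that must match the documents being processed.

```python
def estimate_dpi(width_px, height_px, page_inches=(8.5, 11.0)):
    """Estimate scan resolution from pixel dimensions and an assumed physical page size."""
    page_w_in, page_h_in = page_inches
    # Use the smaller of the two axis estimates as a conservative value.
    return min(width_px / page_w_in, height_px / page_h_in)


def is_low_resolution(width_px, height_px, min_dpi=150, page_inches=(8.5, 11.0)):
    """Return True when the estimated DPI falls below min_dpi."""
    return estimate_dpi(width_px, height_px, page_inches) < min_dpi
```

Pages flagged this way could be rescanned or sent to human review rather than trusted to the detector alone.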

Part of LegalDocuMan

This model is one component of the LegalDocuMan pipeline:

Upload PDF/DOCX/Image
        ↓
Text extraction (pdfplumber · python-docx · Tesseract OCR)
        ↓
Document type classification (MSA · SOW · NDA · PO · Amendment · License · Contract)
        ↓
Execution status (Regex NLP + RF-DETR visual signature detection)
        ↓
Vendor · Dates · Retention extraction
        ↓
Structured filename + PostgreSQL persistence
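
The execution-status step above combines text cues with visual signature detections. A minimal sketch of such a combination is shown below. The regex patterns, the function name, and the decision rules are all hypothetical and are not taken from the LegalDocuMan codebase.

```python
import re

# Hypothetical text cues; a real pipeline would likely use a broader, tuned set.
EXECUTED_PATTERNS = [
    re.compile(r"/s/\s*\S+"),  # conformed signature marker
    re.compile(r"\b(fully\s+)?executed\b", re.IGNORECASE),
]
DRAFT_PATTERNS = [
    re.compile(r"\bdraft\b", re.IGNORECASE),
    re.compile(r"\bfor\s+discussion\s+purposes\b", re.IGNORECASE),
]


def classify_execution_status(text, signature_count):
    """Combine text cues with the number of detected signatures.

    Returns "executed", "draft", or "needs_review" when the signals conflict
    or are both absent.
    """
    text_executed = any(p.search(text) for p in EXECUTED_PATTERNS)
    text_draft = any(p.search(text) for p in DRAFT_PATTERNS)

    if signature_count > 0 and not text_draft:
        return "executed"
    if signature_count == 0 and text_draft and not text_executed:
        return "draft"
    return "needs_review"
```

Returning an explicit `needs_review` state keeps conflicting or missing signals out of the automated path, in line with the human-in-the-loop recommendation under Limitations.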

→ GitHub: Mo-Awadalla/LegalDocuMan

Attribution

Training data includes datasets licensed under CC BY 4.0.
Per license requirements:

signatures-xc8up by Roboflow 100 — Roboflow Universe
Signature Detection v3 by home — Roboflow Universe
