legaldocuman-rfdetr
RF-DETR-Base fine-tuned for handwritten signature detection in legal documents.
This model is the computer vision component of LegalDocuMan, a commercial document intelligence API for contract processing.
Model Details
| Field | Value |
|---|---|
| Base architecture | RF-DETR-Base |
| Parameters | 31.9M |
| Base checkpoint | Roboflow RF-DETR-Base (COCO pretrained) |
| Task | Single-class object detection |
| Class | signature |
| Input resolution | 560px (square) |
| License | Apache 2.0 |
Training Data
Three open-source datasets were merged, deduplicated, and cleaned before training.
The splits were checked for cross-split contamination using perceptual hashing, which confirmed zero overlap between train, val, and test.
| Dataset | Images | License |
|---|---|---|
| signatures-xc8up (Roboflow 100) | ~2,800 | CC BY 4.0 |
| Signature Detector (TrueSign) | 772 | MIT |
| Signature Detection v3 (home) | 2,145 | CC BY 4.0 |
Final dataset size after cleaning: 3,996 images
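As a rough illustration of the cross-split check described above, the sketch below computes a simple average hash over small grayscale grids and reports any pair of images from different splits whose hashes fall within a Hamming-distance threshold. The hash variant, the `max_distance` threshold, and the function names are assumptions for illustration; this card does not state which perceptual hash or tooling was actually used. In practice the grids would come from downscaled grayscale images (for example, produced with Pillow).

```python
from itertools import product


def average_hash(pixels):
    """Hash a 2D grid of grayscale values: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def find_cross_split_overlap(splits, max_distance=0):
    """Return (split_a, id_a, split_b, id_b) for near-duplicate images across splits.

    `splits` maps a split name to a dict of image id -> hash.
    `max_distance` is an assumed tolerance, not a documented setting.
    """
    overlaps = []
    names = list(splits)
    for i, split_a in enumerate(names):
        for split_b in names[i + 1:]:
            pairs = product(splits[split_a].items(), splits[split_b].items())
            for (id_a, hash_a), (id_b, hash_b) in pairs:
                if hamming(hash_a, hash_b) <= max_distance:
                    overlaps.append((split_a, id_a, split_b, id_b))
    return overlaps
```

An empty result from `find_cross_split_overlap` corresponds to the "zero overlap" outcome reported above. The pairwise comparison is quadratic, which is workable at a few thousand images but would need bucketing by hash for much larger datasets.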
Training Configuration
| Parameter | Value |
|---|---|
| Augmentation preset | Document (9 transforms) |
| Augmentations | Perspective distortion, grid distortion, JPEG compression artifacts, Gaussian blur, Gaussian noise, random brightness/contrast, CLAHE, sharpening, horizontal flip |
| Input resolution | 560px |
| Early stopping patience | 20 epochs |
| Stopped at | Epoch 16 |
| Best val mAP@50 | 0.9123 (epoch 11) |
| Framework | rfdetr · PyTorch · PyTorch Lightning |
| Hardware | NVIDIA RTX 4070 Ti Super 16GB |
Evaluation
Evaluated on 168 held-out test images with zero contamination from training or validation sets.
Overall
| Metric | Score |
|---|---|
| mAP@50 | 0.7924 |
| mAP@50:95 | 0.820 |
| Precision@0.50 | 0.9490 |
| Recall@0.50 | 0.8020 |
| F1@0.50 | 0.8696 |
By Signature Size
| Size | AP | AR |
|---|---|---|
| Small | 0.252 | 0.500 |
| Medium | 0.834 | 0.855 |
| Large | 0.828 | 0.845 |
Note on small signatures: Signatures smaller than ~32×32px after resizing have markedly lower recall (AR 0.500, versus 0.855 for medium signatures).
For documents with initials or compact date-field signatures, human review is recommended.
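One way to act on this recommendation is to route a page to a reviewer whenever it contains a detection that would fall into the small bucket. The sketch below assumes the COCO-style area thresholds (32² and 96² pixels) and assumes that the page is resized directly to 560×560 before inference; neither detail is confirmed by this card, and `size_bucket` and `needs_human_review` are hypothetical helper names.

```python
def size_bucket(box_xyxy, image_size, input_res=560):
    """Classify a box as small/medium/large after scaling to the model input size.

    box_xyxy: (x1, y1, x2, y2) in original image pixels.
    image_size: (width, height) of the original image in pixels.
    Assumes both axes are resized independently to `input_res`.
    """
    x1, y1, x2, y2 = box_xyxy
    width, height = image_size
    scaled_w = (x2 - x1) * input_res / width
    scaled_h = (y2 - y1) * input_res / height
    area = scaled_w * scaled_h
    if area < 32 ** 2:   # assumed COCO "small" threshold
        return "small"
    if area < 96 ** 2:   # assumed COCO "medium" threshold
        return "medium"
    return "large"


def needs_human_review(boxes_xyxy, image_size):
    """Flag a page if any detected box falls in the small bucket."""
    return any(size_bucket(box, image_size) == "small" for box in boxes_xyxy)
```

This only catches small signatures that were detected. Small signatures the model misses entirely produce no box, so pages expected to carry initials still need review regardless of this flag.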
Usage
```python
from rfdetr import RFDETRBase
from PIL import Image

# Load the fine-tuned signature detector
model = RFDETRBase(pretrain_weights="Mo-Awadalla/legaldocuman-rfdetr")

# Run detection on a single contract page
image = Image.open("contract_page.jpg")
detections = model.predict(image, threshold=0.45)
print(detections)
```
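In the `rfdetr` package, `predict` returns a `supervision.Detections` object, which exposes the boxes as `xyxy` and the scores as `confidence`. The helper below is a small sketch of turning those two arrays into plain records for logging or storage; the function name and record layout are illustrative assumptions rather than part of the LegalDocuMan API.

```python
def detections_to_records(xyxy, confidence, page=1):
    """Convert parallel box/score sequences into JSON-friendly dicts.

    xyxy: iterable of (x1, y1, x2, y2) boxes in pixels.
    confidence: iterable of scores, one per box.
    page: page number to attach to each record (hypothetical field).
    """
    records = []
    for box, score in zip(xyxy, confidence):
        x1, y1, x2, y2 = (float(value) for value in box)
        records.append({"page": page, "box": [x1, y1, x2, y2], "score": float(score)})
    return records
```

With the objects from the snippet above, a call would look like `detections_to_records(detections.xyxy, detections.confidence)`.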
For PDF inputs, convert pages to images first using pdf2image:
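```python
from pdf2image import convert_from_path

# Render each PDF page to a PIL image at 200 DPI
pages = convert_from_path("contract.pdf", dpi=200)

# `model` is the RFDETRBase instance loaded above
for i, page in enumerate(pages):
    detections = model.predict(page, threshold=0.45)
    print(f"Page {i+1}: {detections}")
```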
Intended Use
Designed for:
- Detecting the presence and location of handwritten ink signatures in legal contract pages
- Document intake pipelines processing PDF, DOCX, and scanned image inputs
- Execution status classification (executed vs. draft) as part of a broader document intelligence pipeline
Out of scope:
- Signature verification or authenticity determination
- Forgery detection
- Digital or electronic signature detection
- Handwriting recognition or transcription
Limitations
- Small signatures (initials, compact date-field signatures) are detected far less reliably (AP 0.252, AR 0.500)
- Performance may degrade on scans below 150 DPI
- Not trained on non-Latin-script documents or their page layouts
- Should not be used as the sole decision-maker for high-stakes legal determinations without human review in the loop
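For the scan-resolution limitation above, a pipeline can estimate the effective DPI of a page image before running detection. The sketch below assumes a US Letter page in portrait orientation (8.5 × 11 inches); other page sizes or orientations would need different dimensions, and the helper names are hypothetical.

```python
def estimate_dpi(width_px, height_px, page_inches=(8.5, 11.0)):
    """Estimate scan DPI from pixel dimensions and an assumed physical page size."""
    page_w, page_h = page_inches
    # Use the lower of the two axis estimates to stay conservative
    return min(width_px / page_w, height_px / page_h)


def below_recommended_dpi(width_px, height_px, min_dpi=150):
    """True if the estimated DPI is under the 150 DPI mark noted in the limitations."""
    return estimate_dpi(width_px, height_px) < min_dpi
```

A 1700×2200 pixel scan comes out at about 200 DPI under this assumption, while an 850×1100 scan comes out at about 100 DPI and would be flagged.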
Part of LegalDocuMan
This model is one component of the LegalDocuMan pipeline:
Upload PDF/DOCX/Image
↓
Text extraction (pdfplumber · python-docx · Tesseract OCR)
↓
Document type classification (MSA · SOW · NDA · PO · Amendment · License · Contract)
↓
Execution status (Regex NLP + RF-DETR visual signature detection)
↓
Vendor · Dates · Retention extraction
↓
Structured filename + PostgreSQL persistence
→ GitHub: Mo-Awadalla/LegalDocuMan
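The execution-status step combines a textual signal with the visual detector. The sketch below shows one plausible way to merge the two. The regular expression, the three status labels, and the precedence rule are all assumptions made for illustration; the actual LegalDocuMan logic is not documented in this card.

```python
import re

# Hypothetical textual cues; the real pipeline's patterns are not documented here.
EXECUTION_CUES = re.compile(r"/s/|\bsigned by\b|\bduly executed\b", re.IGNORECASE)


def execution_status(page_texts, signatures_per_page):
    """Combine text cues and per-page signature counts into a coarse status.

    page_texts: extracted text, one string per page.
    signatures_per_page: number of detected signatures, one int per page.
    """
    has_text_cue = any(EXECUTION_CUES.search(text) for text in page_texts)
    has_signature = any(count > 0 for count in signatures_per_page)
    if has_signature:
        return "executed"
    if has_text_cue:
        # Text suggests signing but no ink signature was detected
        return "review"
    return "draft"
```

Sending the text-only case to review rather than treating it as a draft reflects the limitations listed above: a missed small signature should not silently downgrade a signed contract.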
Attribution
Training data includes datasets licensed under CC BY 4.0.
Per license requirements:
- signatures-xc8up by Roboflow 100 — Roboflow Universe
- Signature Detection v3 by home — Roboflow Universe