ZFW Captcha Recognition Model

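A recognition model for the 4-digit numeric captchas used by the self-service platform of the Zhengfang educational administration system (ZFW). It is a pure CNN with no RNN/CTC component, which keeps it lightweight and efficient.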

Model Variants

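File	Variant	Parameters	File size	Validation accuracy	Recommended use
`small/final_model.pth`	small	~96K	390 KB	99.96%	General deployment (recommended)
`full/final_model.pth`	full	~196K	780 KB	99.97%	Maximum accuracy
`nano/final_model.pth`	nano	~21K	94 KB	95.49%	Extreme compression / embedded
`distill-nano/final_model.pth`	nano (distilled)	~21K	94 KB	not reported	Distillation experiment artifact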

**Recommended: small.** At 390 KB it reaches 99.96% accuracy, the best trade-off between size and accuracy.
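As a rough sanity check on the sizes above, the parameter counts can be converted to bytes. This is a minimal sketch that assumes the checkpoints store float32 weights (4 bytes each) and treats 1 KB as 1024 bytes; the helper name `estimated_size_kb` is made up for illustration. Checkpoint metadata and BatchNorm buffers are not counted, so the reported files are expected to be somewhat larger than the estimate.

```python
BYTES_PER_FLOAT32 = 4

# Approximate parameter counts and reported file sizes (KB) from the variants table.
variants = {
    "small": (96_000, 390),
    "full": (196_000, 780),
    "nano": (21_000, 94),
}


def estimated_size_kb(num_params: int) -> float:
    """Raw float32 weight size in KB (1 KB = 1024 bytes), ignoring file overhead."""
    return num_params * BYTES_PER_FLOAT32 / 1024


for name, (params, reported_kb) in variants.items():
    estimate = estimated_size_kb(params)
    print(f"{name}: ~{estimate:.0f} KB of weights, {reported_kb} KB reported")
```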

Task Description

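Captcha type: 4 numeric digits (0-9), fixed length
Source platform: self-service platform of the Zhengfang educational administration system (ZFW)
Distortions: rotation, noise dots, interference lines
Input size: 90 × 34 pixels, RGB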

Samples

Sample	Label
	`9800`
	`9350`

Architecture

Input (3, 34, 90)
    → [Conv3×3 + BN + ReLU + MaxPool] × 3   (spatial downsampling)
    → [Conv3×3 + BN + ReLU] × N             (feature extraction)
    → AdaptiveAvgPool2d((1, 4))              (pools to 4 columns, one per digit position)
    → 4 × Linear(C, 10)                     (independent 10-way classifier per position)
Output: (B, 4, 10) logits

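Design rationale: the captcha always contains exactly 4 digits at fixed positions, so there is no variable-length alignment problem. Spatial pooling plus multi-head classification therefore replaces RNN/CTC, which keeps the model simple and efficient.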
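The layout above could be sketched in PyTorch roughly as follows. This is an illustration, not the repository's `build_model`: the class name `CaptchaCNNSketch`, the channel widths `(16, 32, 64)` and the single extra conv block are assumptions chosen only to make the shapes concrete.

```python
import torch
import torch.nn as nn


class CaptchaCNNSketch(nn.Module):
    """Illustrative only: channel widths and depth are assumptions."""

    def __init__(self, channels=(16, 32, 64), extra_blocks=1,
                 num_digits=4, num_classes=10):
        super().__init__()
        layers = []
        in_ch = 3
        # Three downsampling stages: Conv3x3 + BN + ReLU + MaxPool.
        for out_ch in channels:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        # N feature-extraction blocks without pooling.
        for _ in range(extra_blocks):
            layers += [
                nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(in_ch),
                nn.ReLU(inplace=True),
            ]
        self.features = nn.Sequential(*layers)
        # Collapse height to 1 and width to one column per digit.
        self.pool = nn.AdaptiveAvgPool2d((1, num_digits))
        # One independent linear classifier per digit position.
        self.heads = nn.ModuleList(
            [nn.Linear(in_ch, num_classes) for _ in range(num_digits)]
        )

    def forward(self, x):
        f = self.pool(self.features(x)).squeeze(2)  # (B, C, 4)
        logits = [head(f[:, :, i]) for i, head in enumerate(self.heads)]
        return torch.stack(logits, dim=1)           # (B, 4, 10)


sketch = CaptchaCNNSketch().eval()
with torch.no_grad():
    out = sketch(torch.zeros(2, 3, 34, 90))
print(tuple(out.shape))
```

With a 34 × 90 input, the three pooling stages reduce the feature map to 4 × 11 before the adaptive pooling averages it down to 1 × 4, so each head sees roughly one quarter of the image width.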

Quick Start

import torch
from torchvision import transforms
from PIL import Image

# 1. Define model (copy from src/model.py or install the package)
from model import build_model

# 2. Load
model = build_model('small')
model.load_state_dict(torch.load('small/final_model.pth', map_location='cpu'))
model.eval()

# 3. Preprocess
transform = transforms.Compose([
    transforms.Resize((34, 90)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('captcha.png').convert('RGB')
x = transform(img).unsqueeze(0)  # (1, 3, 34, 90)

# 4. Predict
with torch.no_grad():
    logits = model(x)              # (1, 4, 10)
    digits = logits.argmax(dim=2)  # (1, 4)
    result = ''.join(str(d.item()) for d in digits[0])

print(result)  # e.g. "3807"
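If a confidence score is useful, for example to decide when to re-request a captcha, the logits can be turned into per-digit probabilities. The sketch below is an assumption about how one might do that, not part of the published code: `decode_with_confidence` is a hypothetical helper, and multiplying the four top probabilities treats the digit positions as independent. Stand-in logits are used so the snippet runs without a checkpoint.

```python
import torch


def decode_with_confidence(logits: torch.Tensor):
    """Turn (B, 4, 10) logits into digit strings plus one confidence per image."""
    probs = logits.softmax(dim=2)          # (B, 4, 10)
    top_probs, digits = probs.max(dim=2)   # both (B, 4)
    confidence = top_probs.prod(dim=1)     # product of the four per-digit probabilities
    texts = ["".join(str(d) for d in row) for row in digits.tolist()]
    return texts, confidence.tolist()


# Stand-in logits that strongly favour the digits 3, 8, 0, 7.
fake_logits = torch.full((1, 4, 10), -10.0)
for pos, digit in enumerate([3, 8, 0, 7]):
    fake_logits[0, pos, digit] = 10.0

texts, confidence = decode_with_confidence(fake_logits)
print(texts[0], confidence[0])
```

In real use, `fake_logits` would be replaced by `model(x)` from the Quick Start, and any rejection threshold would need to be tuned on held-out data.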

Training

Framework: PyTorch
Loss: CrossEntropyLoss × 4 (one per digit position)
Optimizer: Adam (lr=0.001, fused)
LR schedule: StepLR (step=10, gamma=0.5)
Early stopping: patience=8
Data augmentation: none (Normalize only)
Training monitoring: SwanLab
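The settings above might translate into a training step along these lines. This is a sketch under assumptions: the four cross-entropy terms are summed here (the original code may average or weight them differently), the tiny linear model is only a stand-in so the snippet runs on its own, and `fused=True` is omitted so it works on CPU. Early stopping and SwanLab logging are left out.

```python
import torch
import torch.nn as nn

NUM_DIGITS = 4


def captcha_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sum of one cross-entropy term per digit position.

    logits: (B, 4, 10) float; labels: (B, 4) int64 with values 0-9.
    """
    criterion = nn.CrossEntropyLoss()
    return sum(criterion(logits[:, i], labels[:, i]) for i in range(NUM_DIGITS))


torch.manual_seed(0)

# Stand-in model so the sketch is self-contained; use the real network instead.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 34 * 90, NUM_DIGITS * 10),
    nn.Unflatten(1, (NUM_DIGITS, 10)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Random stand-in batch: 8 images with 4 digit labels each.
images = torch.randn(8, 3, 34, 90)
labels = torch.randint(0, 10, (8, NUM_DIGITS))

optimizer.zero_grad()
loss = captcha_loss(model(images), labels)
loss.backward()
optimizer.step()
scheduler.step()  # in a real loop this runs once per epoch, halving lr every 10 epochs
print(loss.item())
```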

Training Curves

For the full training history (loss, accuracy, and learning-rate curves), see:

SwanLab Dashboard

Source Code

The training code is open source: GitHub - zfw_captcha_train

Limitations

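Supports only the specific captcha style used by the Zhengfang educational administration system
Recognizes only 4-digit numeric strings (0-9); letters and other characters are not supported
Input images should be 90×34 or have the same aspect ratio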
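Because the Quick Start resizes every input to 34 × 90, an image with a different aspect ratio would be stretched before it reaches the model. One way to guard against that is a small pre-check like the one below; the helper name `has_expected_aspect` and the 5% tolerance are assumptions for illustration, not values from the original project.

```python
EXPECTED_WIDTH, EXPECTED_HEIGHT = 90, 34


def has_expected_aspect(width: int, height: int, tolerance: float = 0.05) -> bool:
    """Return True if width/height is within `tolerance` (relative) of 90/34."""
    expected = EXPECTED_WIDTH / EXPECTED_HEIGHT
    return abs(width / height - expected) / expected <= tolerance


print(has_expected_aspect(90, 34))     # native size
print(has_expected_aspect(180, 68))    # 2x scale, same ratio
print(has_expected_aspect(100, 100))   # square image, would be distorted by Resize
```

With Pillow, the arguments would come from `img.size`, which returns `(width, height)`.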

License

MIT

Evaluation results (self-reported)

Whole-image Accuracy (small): 99.96%
Whole-image Accuracy (full): 99.97%
Whole-image Accuracy (nano): 95.49%
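The card does not define "whole-image accuracy"; a common reading, assumed here, is that a prediction counts as correct only when all four digits match the label. The sketch below contrasts that with per-digit accuracy on a few made-up predictions; both function names and the sample data are illustrative.

```python
def whole_image_accuracy(predictions, labels):
    """Fraction of captchas where the full 4-digit string matches."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def per_digit_accuracy(predictions, labels):
    """Fraction of individual digit positions predicted correctly."""
    pairs = [(pc, tc) for p, t in zip(predictions, labels) for pc, tc in zip(p, t)]
    return sum(pc == tc for pc, tc in pairs) / len(pairs)


# Made-up predictions: one captcha has a single wrong digit.
preds = ["9800", "9350", "1234", "0007"]
truth = ["9800", "9351", "1234", "0007"]

print(whole_image_accuracy(preds, truth))  # 0.75
print(per_digit_accuracy(preds, truth))    # 0.9375
```

A single wrong digit costs the whole image, so whole-image accuracy is always at most the per-digit accuracy.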