EvalGuide IELTS AES v2.5
DeBERTa-v3-base ordinal regression model for IELTS Writing Task 2 scoring across four criteria:
- Task Response
- Coherence and Cohesion
- Lexical Resource
- Grammatical Range and Accuracy
Production checkpoint (current)
| Field | Value |
|---|---|
| Variant | Augmented + calibrated |
| Weights | ielts_v2.5_base_en_10ep.weights.h5 |
| Calibration | ielts_v2.5_base_en_10ep_calibration.pkl |
| Backbone | deberta_v3_base_en |
| Input format | Essay body only (full_text), no question prefix |
| Gold harness QWK | 0.7989 calibrated / 0.8505 raw (1,952-essay holdout) |
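For readers who want to reproduce the QWK figure on their own holdout, the metric can be sketched in plain Python as below. This is a minimal illustration, not the EvalGuide gold harness itself: the function names are hypothetical, and the mapping assumes band scores on a 0 to 9 scale in 0.5 steps (19 categories).

```python
from collections import Counter


def band_to_index(band, min_band=0.0, step=0.5):
    """Map a half-band score such as 6.5 to an integer category index."""
    return int(round((band - min_band) / step))


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights over labels 0..n_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    observed = Counter(zip(y_true, y_pred))   # joint counts O[i, j]
    hist_true = Counter(y_true)               # marginal counts of gold labels
    hist_pred = Counter(y_pred)               # marginal counts of predictions
    num = 0.0
    den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = ((i - j) ** 2) / ((n_classes - 1) ** 2)
            num += w * observed.get((i, j), 0)
            # Expected count under independence of the two marginals.
            den += w * hist_true.get(i, 0) * hist_pred.get(j, 0) / n
    # Convention chosen for this sketch: constant, identical ratings give 1.0.
    return 1.0 if den == 0 else 1.0 - num / den
```

A typical call converts gold and predicted bands with `band_to_index` and passes `n_classes=19`. Whether the harness scores per criterion or on an averaged band is not stated here, so treat that choice as open.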
Why this checkpoint is served
- Calibrated serving: Isotonic calibration plus bias correction improves mean-score alignment (SMD −0.07 vs v2.4 +0.08) and lowers RMSE, which matters more for production UX than the higher raw QWK ablation.
- Augmented training: Synonym augmentation (10% of train essays) is part of the documented v2.5 strategy and was verified active in the final run. The no-aug ablation checkpoint is preserved in repo history (first commit).
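To make the calibration step concrete, here is a rough sketch of isotonic calibration followed by a constant bias correction. It is an illustration only: the internal layout of the shipped `.pkl` artifact is not documented in this card, so `fit_isotonic`, `calibrate`, the step-function lookup, the order of operations, and the clamp to a 0 to 9 half-band scale are all assumptions.

```python
from bisect import bisect_left


def fit_isotonic(raw_scores, gold_scores):
    """Pool-adjacent-violators fit; returns (knots_x, knots_y), non-decreasing."""
    blocks = []  # each block is [sum_of_gold, count, max_raw_in_block]
    for x, y in sorted(zip(raw_scores, gold_scores)):
        blocks.append([y, 1, x])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c, mx = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = mx
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(raw, knots_x, knots_y, bias=0.0, lo=0.0, hi=9.0, step=0.5):
    """Map a raw score through the step function, shift by bias, snap to a band."""
    idx = min(bisect_left(knots_x, raw), len(knots_x) - 1)
    value = knots_y[idx] + bias           # bias: e.g. mean(gold - calibrated)
    value = max(lo, min(hi, value))       # keep the result on the band scale
    return round(value / step) * step
```

In this sketch the bias term would be estimated on a validation split as the mean residual after the isotonic map, which is one common way to correct mean-score drift; the production artifact may do this differently.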
Files
| File | Description |
|---|---|
| ielts_v2.5_base_en_10ep.weights.h5 | Model weights (~3.5 GB) |
| ielts_v2.5_base_en_10ep_calibration.pkl | Isotonic calibration layer |
| ielts_v2.5_base_en_10ep_config.json | Training metadata and metrics |
| model_config.json | Production serving config for EvalGuide backend |
Download
hf download koecheup/evalguide-ielts-v2.5 --local-dir backend/model
Place artifacts under evalguide_client/backend/model/ alongside model_config.json.
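After downloading, a quick check that all four artifacts from the Files table landed in the model directory can save a confusing startup failure. The helper below is a hypothetical convenience, not part of the EvalGuide backend; only the file names come from this card.

```python
from pathlib import Path

# File names as listed in the Files table above.
EXPECTED_FILES = (
    "ielts_v2.5_base_en_10ep.weights.h5",
    "ielts_v2.5_base_en_10ep_calibration.pkl",
    "ielts_v2.5_base_en_10ep_config.json",
    "model_config.json",
)


def missing_artifacts(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_artifacts("evalguide_client/backend/model")` returns an empty list when everything is in place.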
Inference notes
- Tokenize essay content only. Do not prepend `Question: …`; training and offline eval use essay-only input.
- Apply the calibration artifact after the forward pass when serving the production config.
- Rollback to v2.4: set `IELTS_MODEL_NAME=ielts_v2.4_base_en_10ep.weights.h5`.
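The notes above can be sketched as two small helpers: one that picks the weights file from `IELTS_MODEL_NAME`, and one that prepares essay-only input. Both function names are hypothetical, and falling back to the v2.5 weights when the variable is unset is an assumption about the backend, not something this card states.

```python
import os

# Assumed default when IELTS_MODEL_NAME is unset (not confirmed by the card).
DEFAULT_WEIGHTS = "ielts_v2.5_base_en_10ep.weights.h5"


def resolve_weights_name(env=None):
    """Return the weights file name, honouring the IELTS_MODEL_NAME override."""
    env = os.environ if env is None else env
    return env.get("IELTS_MODEL_NAME") or DEFAULT_WEIGHTS


def build_model_input(essay_body):
    """Return the text to tokenize: the essay body only, whitespace-trimmed."""
    text = essay_body.strip()
    if not text:
        raise ValueError("essay body is empty")
    # No 'Question: ...' prefix is added; the model was trained on essay-only text.
    return text
```

Passing a dict as `env` makes the override easy to exercise in tests without touching the process environment.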
Training summary
- Real data: 9,760 cleaned essays (ielts_cleaned.csv)
- Synthetic mix: 15% from 284 cleaned Task 2 essays (koecheup/ielts-synthetic)
- Augmentation: 10% synonym replacement (780 train essays)
- Epochs: 10, batch size 8, variance target 2.0 → 2.7
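As a sanity check on these counts, the figures in this card are mutually consistent with an 80/20 train/holdout split of the real data. That split is an inference from the numbers, not something the card states, and the sketch assumes the 1,952-essay holdout is drawn from the 9,760 real essays.

```python
real_essays = 9_760   # cleaned real essays (from the training summary)
holdout = 1_952       # gold harness holdout size (from the checkpoint table)
augmented = 780       # train essays with synonym replacement
aug_rate = 0.10       # stated augmentation rate

# Share of real data held out, under the assumption stated above.
holdout_share = holdout / real_essays

# Real essays left for training if the holdout comes out of the real set.
train_real = real_essays - holdout

# Train-set size implied by "10% = 780 essays".
implied_train = round(augmented / aug_rate)

print(f"holdout share: {holdout_share:.2f}")
print(f"real train essays: {train_real}")
print(f"train size implied by augmentation: {implied_train}")
```

The two train-size estimates differ by only a handful of essays, which rounding of the 10% figure would explain.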
See docs/backend/v2.5_upgrade_report.md in the EvalGuide repo for full evaluation tables.