EvalGuide IELTS AES v2.5

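DeBERTa-v3-base ordinal regression model for IELTS Writing Task 2 scoring across the four IELTS criteria: Task Response, Coherence and Cohesion, Lexical Resource, and Grammatical Range and Accuracy.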

Production checkpoint (current)

| Field | Value |
| --- | --- |
| Variant | Augmented + calibrated |
| Weights | `ielts_v2.5_base_en_10ep.weights.h5` |
| Calibration | `ielts_v2.5_base_en_10ep_calibration.pkl` |
| Backbone | `deberta_v3_base_en` |
| Input format | Essay body only (`full_text`); no question prefix |
| Gold harness QWK | 0.7989 calibrated / 0.8505 raw (1,952-essay holdout) |
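
For readers unfamiliar with the metric, the sketch below shows how a QWK figure like the one in the table can be computed, reading QWK as quadratic weighted kappa. It is an illustration only, not the gold harness code: the function names are hypothetical, and mapping half-band scores to 19 integer labels is an assumption.

```python
from collections import Counter


def band_to_index(band):
    """Map a half-band score (0.0, 0.5, ..., 9.0) to an integer label 0..18.

    Assumption: scores are compared at half-band resolution.
    """
    return int(round(band * 2))


def quadratic_weighted_kappa(gold, pred, num_labels):
    """Cohen's kappa with quadratic weights for integer labels in [0, num_labels)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    observed = Counter(zip(gold, pred))  # joint counts of (gold, pred) pairs
    gold_hist = Counter(gold)            # marginal counts of gold labels
    pred_hist = Counter(pred)            # marginal counts of predicted labels
    denom = (num_labels - 1) ** 2
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / denom
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * gold_hist[i] * pred_hist[j] / (n * n)
    if exp_disagreement == 0:
        return 1.0  # every gold and predicted label is the same single value
    return 1.0 - obs_disagreement / exp_disagreement


gold = [band_to_index(b) for b in (6.0, 6.5, 7.0, 5.5)]
pred = [band_to_index(b) for b in (6.0, 6.5, 7.0, 5.5)]
print(quadratic_weighted_kappa(gold, pred, num_labels=19))
```

Because the weights grow with the squared distance between labels, a prediction that is off by two bands is penalised far more heavily than one that is off by half a band.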

Calibrated serving: Isotonic calibration plus bias correction improves mean-score alignment (SMD −0.07, versus +0.08 for v2.4) and lowers RMSE. For production UX, this matters more than the higher QWK of the raw (uncalibrated) ablation.
Augmented training: Synonym augmentation (applied to 10% of training essays) is part of the documented v2.5 strategy and was verified as active in the final run. The no-aug ablation checkpoint is preserved in the repo history (first commit).
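
The two alignment metrics cited for calibrated serving can be sketched as follows. This assumes SMD means standardized mean difference with a pooled standard deviation and a predicted-minus-gold sign convention; the evaluation harness may define it differently, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(gold, pred):
    """(mean(pred) - mean(gold)) / pooled standard deviation.

    Assumed convention: a positive value means the model over-scores on average.
    """
    pooled_sd = math.sqrt((variance(gold) + variance(pred)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both score sets are constant")
    return (mean(pred) - mean(gold)) / pooled_sd


def rmse(gold, pred):
    """Root mean squared error between gold and predicted scores."""
    return math.sqrt(mean([(p - g) ** 2 for g, p in zip(gold, pred)]))


gold = [5.0, 6.0, 7.0]
pred = [5.5, 6.5, 7.5]  # every prediction is half a band too high
print(standardized_mean_difference(gold, pred), rmse(gold, pred))
```

A model can rank essays well (high QWK) while still being shifted up or down overall, and SMD is the metric that exposes that shift.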

| File | Description |
| --- | --- |
| `ielts_v2.5_base_en_10ep.weights.h5` | Model weights (~3.5 GB) |
| `ielts_v2.5_base_en_10ep_calibration.pkl` | Isotonic calibration layer |
| `ielts_v2.5_base_en_10ep_config.json` | Training metadata and metrics |
| `model_config.json` | Production serving config for EvalGuide backend |
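
To give a feel for what an isotonic calibration layer does, here is a minimal pure-Python sketch of isotonic regression using the pool-adjacent-violators algorithm, followed by a constant bias term. It does not describe the contents of the `.pkl` file: the step-function form, the additive bias, and all function names are assumptions made for illustration.

```python
import bisect


def fit_isotonic(raw_scores, gold_scores):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns (knots_x, knots_y): each block covers raw scores up to knots_x[i]
    and maps them to the block mean knots_y[i].
    """
    blocks = []  # each block is [sum_of_gold, count, largest_raw_score]
    for x, y in sorted(zip(raw_scores, gold_scores)):
        blocks.append([y, 1, x])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, max_x = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = max_x
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def apply_calibration(raw, knots_x, knots_y, bias=0.0):
    """Look up the calibrated value for one raw score, then add a bias term."""
    i = bisect.bisect_left(knots_x, raw)
    return knots_y[min(i, len(knots_y) - 1)] + bias


knots_x, knots_y = fit_isotonic([1, 2, 3, 4], [1, 3, 2, 4])
print(knots_x, knots_y)
print(apply_calibration(2, knots_x, knots_y))
```

The fitted mapping never reverses the order of two raw scores. It can, however, merge neighbouring raw scores into one calibrated value, which is one way a calibrated QWK can end up below the raw QWK.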

hf download koecheup/evalguide-ielts-v2.5 --local-dir backend/model

Place artifacts under evalguide_client/backend/model/ alongside model_config.json.
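
After downloading, a quick check like the one below can confirm that the four listed files are in place. This is a hypothetical helper, not part of the EvalGuide backend; the file names are taken from the file table in this card.

```python
from pathlib import Path

# File names as listed in this model card.
EXPECTED_ARTIFACTS = (
    "ielts_v2.5_base_en_10ep.weights.h5",
    "ielts_v2.5_base_en_10ep_calibration.pkl",
    "ielts_v2.5_base_en_10ep_config.json",
    "model_config.json",
)


def missing_artifacts(model_dir):
    """Return the expected artifact names that are not files in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts("evalguide_client/backend/model")
    print("All artifacts present" if not missing else f"Missing: {missing}")
```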

Tokenize essay content only. Do not prepend `Question: …`; training and offline evaluation both use essay-only input.
Apply the calibration artifact after the forward pass when serving the production config.
To roll back to v2.4, set `IELTS_MODEL_NAME=ielts_v2.4_base_en_10ep.weights.h5`.
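
The three serving notes above can be sketched together as follows. This is not the EvalGuide backend code: `forward` and `calibrate` stand in for the real model and calibration artifact, the function names are hypothetical, and defaulting to the v2.5 weights when `IELTS_MODEL_NAME` is unset is an assumption.

```python
import os

# Assumed default when IELTS_MODEL_NAME is not set.
DEFAULT_WEIGHTS = "ielts_v2.5_base_en_10ep.weights.h5"


def resolve_weights_name(env=None):
    """Pick the weights file, letting IELTS_MODEL_NAME override the default."""
    env = os.environ if env is None else env
    return env.get("IELTS_MODEL_NAME", DEFAULT_WEIGHTS)


def build_model_input(essay, question=None):
    """Return the text to tokenize: the essay body only.

    The question is accepted but deliberately ignored, matching the
    essay-only input used in training and offline eval.
    """
    return essay.strip()


def score_essay(essay, forward, calibrate):
    """Run the forward pass, then calibrate each criterion score.

    forward: callable mapping text to a list of raw scores (hypothetical).
    calibrate: callable mapping one raw score to a calibrated score (hypothetical).
    """
    raw_scores = forward(build_model_input(essay))
    return [calibrate(score) for score in raw_scores]


if __name__ == "__main__":
    print(resolve_weights_name())
    fake_forward = lambda text: [6.2, 5.8, 6.0, 6.4]  # stand-in for the model
    print(score_essay("Some essay text.", fake_forward, lambda s: round(s * 2) / 2))
```

Keeping the calibration step outside the forward pass means the same raw model output can be served with or without the calibration artifact.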

See docs/backend/v2.5_upgrade_report.md in the EvalGuide repo for full evaluation tables.
