Text Readability Grade Predictor

This model predicts the reading grade level of English text. It is a ModernBERT model fine-tuned on texts with grade-level annotations, and it can estimate the educational reading level of a passage, from elementary school through college.

Model Details

  • Model Type: ModernBERT fine-tuned for regression
  • Language: English
  • Task: Text Readability Assessment (Regression)
  • Framework: PyTorch
  • Base Model: answerdotai/ModernBERT-base
  • Training Data: CLEAR dataset
  • Performance (a sketch for reproducing these metrics follows this list):
    • RMSE: 1.414
    • R²: 0.813
  • Output: Predicted grade level (0-12)
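
For reference, these metrics can be recomputed from a set of predictions and gold grade labels. The snippet below is a minimal sketch using scikit-learn; the arrays are placeholders, not the actual evaluation data.

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder arrays -- substitute the model's predictions and the
# gold grade labels from your evaluation split.
y_true = np.array([1.5, 4.0, 8.2, 11.0])
y_pred = np.array([1.8, 3.6, 8.9, 10.4])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root mean squared error
r2 = r2_score(y_true, y_pred)                       # coefficient of determination
print(f"RMSE: {rmse:.3f}, R²: {r2:.3f}")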

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor")
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor")

# Prepare text
text = "Your text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Get prediction and clamp it to the supported 0-12 range
pred_grade = outputs.logits.item()
pred_grade = max(0.0, min(pred_grade, 12.0))
print(f"Predicted grade level: {pred_grade:.1f}")

Reading Level Categories

The predicted grade levels correspond to these educational categories (a helper that maps scores onto these labels is sketched after the list):

  • < 1.0: Pre-Kindergarten
  • 1.0 - 2.9: Early Elementary
  • 3.0 - 5.9: Elementary
  • 6.0 - 8.9: Middle School
  • 9.0 - 11.9: High School
  • 12.0+: College Level
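
One possible mapping helper, using the thresholds above (the function name grade_category is ours):

def grade_category(grade: float) -> str:
    """Map a predicted grade level to its educational category."""
    if grade < 1.0:
        return "Pre-Kindergarten"
    elif grade < 3.0:
        return "Early Elementary"
    elif grade < 6.0:
        return "Elementary"
    elif grade < 9.0:
        return "Middle School"
    elif grade < 12.0:
        return "High School"
    return "College Level"

print(grade_category(8.9))  # Middle School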

Example Predictions

Example: Early Elementary

The cat sat on the mat. It was happy. The sun was shining.

Predicted Grade Level: 1.2

Example: Middle School

The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ...

Predicted Grade Level: 8.9

Example: High School

The quantum mechanical model of atomic structure provides a theoretical framework for understanding ...

Predicted Grade Level: 11.6

Limitations

  • The model is trained on English text only
  • Performance may vary for specialized or technical content
  • Very short texts (fewer than 10 words) may not yield accurate predictions (a simple guard is sketched after this list)
  • The model is calibrated for US educational grade levels
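
A word-count check before inference can flag inputs that fall under the 10-word threshold noted above. This is a sketch; the helper name safe_predict and the exact threshold are our choices, not part of the released model.

MIN_WORDS = 10  # below this, predictions may be unreliable

def safe_predict(text):
    """Predict a grade level, warning when the input is too short to trust."""
    if len(text.split()) < MIN_WORDS:
        print(f"Warning: input has fewer than {MIN_WORDS} words; "
              "the prediction may be inaccurate.")
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        grade = model(**inputs).logits.item()
    return max(0.0, min(grade, 12.0))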

Training

This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved the following steps (a sketch of the fine-tuning setup follows the list):

  1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels
  2. Augmenting the dataset through text chunking
  3. Averaging grade level metrics for a more reliable target
  4. Fine-tuning ModernBERT with a regression head
  5. Optimizing for minimum RMSE and maximum R²
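
The actual training code is not published with the model, but a regression fine-tune along these lines can be set up with the Hugging Face Trainer. The snippet below is a sketch under that assumption; dataset loading, column names, and hyperparameters are placeholders.

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# num_labels=1 with problem_type="regression" gives a single-output
# head trained with MSE loss, matching steps 4-5 above.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=1, problem_type="regression")
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# `train_ds` / `eval_ds` stand in for the chunked, grade-labeled dataset
# described above, with "text" and float "labels" columns:
# train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="readability-model", num_train_epochs=3,
                         learning_rate=2e-5, per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=eval_ds)
# trainer.train()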