Finnish Text Difficulty Assessor

A fine-tuned TurkuNLP/bert-base-finnish-cased-v1 model for ordinal classification of Finnish text difficulty on a CEFR-aligned scale with 10 difficulty levels.

Model details

Base model: TurkuNLP/bert-base-finnish-cased-v1
Task: Single-label ordinal classification
Labels: 10 ordinal difficulty levels
Loss: KL-divergence with Gaussian soft labels (SORD)
Augmentation: Back-translation (Estonian ↔ Finnish) + paraphrasing
Max length: 512 tokens
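
The loss entry above can be illustrated with a minimal sketch of Gaussian soft labels and the KL-divergence computed against them. This is not the training code for this model. It assumes that distances are measured between difficulty values (rather than class indices), and `sigma` is a hypothetical smoothing width chosen only for illustration. The helper names are also made up for this example.

```python
import math

# Index -> difficulty value, copied from the label mapping in this card.
LEVELS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0]


def gaussian_soft_labels(true_idx, levels=LEVELS, sigma=0.5):
    """Return a probability distribution peaked at levels[true_idx].

    ASSUMPTION: sigma=0.5 is an illustrative value, not one stated in this card.
    """
    center = levels[true_idx]
    # Unnormalized Gaussian weight for every class, based on its distance to the true level.
    weights = [math.exp(-((v - center) ** 2) / (2 * sigma ** 2)) for v in levels]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions of equal length."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


soft = gaussian_soft_labels(true_idx=2)   # true level 2.0
uniform = [1 / len(LEVELS)] * len(LEVELS)  # stand-in for an untrained model's output
print([round(p, 3) for p in soft])
print(kl_divergence(soft, uniform))
```

In training, `predicted` would be the softmax of the model's logits. Because neighbouring levels receive non-zero target mass, a prediction one step away from the true level is penalized less than one several steps away.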

Usage

from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tokenizer = BertTokenizer.from_pretrained("chiunhau/finnish-difficulty-assessor")
model = BertForSequenceClassification.from_pretrained("chiunhau/finnish-difficulty-assessor")
model.eval()  # disable dropout for inference

text = "Hän käy koulussa joka päivä."
# Inputs longer than 512 tokens are truncated to the model's maximum length.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 10)

# Pick the highest-scoring class for the single input and look up its label.
pred_idx = logits.argmax(dim=-1).item()
pred_label = model.config.id2label[pred_idx]
print(f"Difficulty level: {pred_label}")  # numeric CEFR value
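
Because the classes are ordered, the logits can also be summarized as a probability-weighted mean difficulty instead of a single argmax class. The sketch below is one way to do that and is not part of the model's API. The function names and the example logits are made up for illustration, and the difficulty values are those listed under "Label mapping".

```python
import math

# Index -> difficulty value, copied from the "Label mapping" section of this card.
LEVELS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_difficulty(logits, levels=LEVELS):
    """Probability-weighted mean of the difficulty values."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, levels))


# Made-up logits for illustration; with the usage snippet one would pass logits[0].tolist().
example_logits = [0.1, 0.3, 2.0, 1.8, 0.2, -1.0, -2.0, -3.0, -3.0, -3.0]
score = expected_difficulty(example_logits)
print(f"Expected difficulty: {score:.2f}")
```

The result is a continuous score between 1.0 and 6.0, which can be useful for ranking texts that fall into the same predicted class.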

Label mapping

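Labels are numeric representations of CEFR levels, ordered from A1 (easiest) to C2 (hardest). Of the half-step values between 1.0 and 6.0, 4.5 does not appear in the mapping.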

Index  Difficulty value
0      1.0
1      1.5
2      2.0
3      2.5
4      3.0
5      3.5
6      4.0
7      5.0
8      5.5
9      6.0
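
A predicted index can be turned into a readable description with a small lookup, sketched below. The index-to-value table is the one above. The correspondence of whole numbers 1 to 6 with A1 to C2, and the reading of half-steps as lying between two adjacent levels, are assumptions made for this example and are not stated in this card. `describe_level` is a hypothetical helper name.

```python
# Index -> difficulty value, as listed in the table above.
ID2VALUE = {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5, 4: 3.0,
            5: 3.5, 6: 4.0, 7: 5.0, 8: 5.5, 9: 6.0}

# ASSUMPTION: whole numbers 1..6 correspond to A1..C2 in this order.
CEFR_NAMES = ["A1", "A2", "B1", "B2", "C1", "C2"]


def describe_level(pred_idx):
    """Return the difficulty value with a CEFR description for a class index."""
    value = ID2VALUE[pred_idx]
    lower = int(value)  # floor, since all values are positive
    name = CEFR_NAMES[lower - 1]
    if value == lower:
        return f"{value} ({name})"
    # ASSUMPTION: a half-step value sits between two adjacent CEFR levels.
    return f"{value} (between {name} and {CEFR_NAMES[lower]})"


for idx in (0, 1, 9):
    print(idx, describe_level(idx))
```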