
# Scandinavian Education Classifier NB-BERT

Trained using code from [Cosmopedia](https://github.com/huggingface/cosmopedia/tree/main/classification), but with nb-bert-base as the starting point. The classification data is from GlotCC and has been annotated using Gemini 1.5 Flash.

The following command was used for training:

```bash
python train_edu_bert.py \
    --base_model_name="NbAiLab/nb-bert-base" \
    --dataset_name="north/scandinavian-educational-annotations" \
    --target_column="score" \
    --checkpoint_dir="/home/pere/checkpoints/scandinavian_bert/"
```
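The Cosmopedia classification script treats the educational score as a regression target (note `--target_column="score"`), so the model emits a continuous value that is mapped to the integer classes 0–5 reported below. A minimal sketch of that postprocessing, assuming the usual round-and-clamp convention (the helper name is illustrative, not from the training script):

```python
def score_to_class(raw_score: float, num_classes: int = 6) -> int:
    """Clamp a raw regression output to [0, num_classes - 1] and round it
    to the nearest integer class label (illustrative helper)."""
    clamped = max(0.0, min(raw_score, float(num_classes - 1)))
    return int(round(clamped))

# Example: raw regression outputs mapped onto the 0-5 classes
print([score_to_class(s) for s in [-0.3, 0.4, 1.7, 2.49, 5.8]])  # → [0, 0, 2, 2, 5]
```

Out-of-range outputs (negative or above 5) are pulled back into the valid label range before rounding, which is why classes 0 and 5 absorb the extreme predictions.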

## Classification Report

| Class | Precision | Recall | F1-Score | Support |
|------:|----------:|-------:|---------:|--------:|
| 0 | 0.78 | 0.70 | 0.74 | 18274 |
| 1 | 0.67 | 0.75 | 0.71 | 23348 |
| 2 | 0.49 | 0.47 | 0.48 | 6621 |
| 3 | 0.47 | 0.26 | 0.33 | 1314 |
| 4 | 0.60 | 0.07 | 0.12 | 433 |
| 5 | 0.00 | 0.00 | 0.00 | 10 |

| Metric | Value |
|--------|------:|
| Accuracy | 0.68 |
| Macro Avg Precision | 0.50 |
| Macro Avg Recall | 0.38 |
| Macro Avg F1-Score | 0.40 |
| Weighted Avg Precision | 0.68 |
| Weighted Avg Recall | 0.68 |
| Weighted Avg F1-Score | 0.67 |
| Total Support | 50000 |

## Confusion Matrix

Rows are the true classes and columns the predicted classes (each row sums to the per-class support above).

| True \ Predicted | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|------------------|--------:|--------:|--------:|--------:|--------:|--------:|
| Class 0 | 12873 | 5327 | 74 | 0 | 0 | 0 |
| Class 1 | 3486 | 17582 | 2238 | 41 | 1 | 0 |
| Class 2 | 75 | 3244 | 3105 | 197 | 0 | 0 |
| Class 3 | 5 | 206 | 746 | 338 | 19 | 0 |
| Class 4 | 0 | 45 | 217 | 140 | 30 | 1 |
| Class 5 | 0 | 1 | 8 | 1 | 0 | 0 |
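The per-class precision, recall, and F1 in the classification report can be re-derived directly from the confusion matrix: precision is the diagonal cell over its column sum, recall the diagonal cell over its row sum. A plain-Python cross-check using the values from the table above (an illustrative recomputation, not the evaluation code that produced the report):

```python
# Confusion matrix from the table above: rows = true class, cols = predicted class.
cm = [
    [12873, 5327, 74, 0, 0, 0],
    [3486, 17582, 2238, 41, 1, 0],
    [75, 3244, 3105, 197, 0, 0],
    [5, 206, 746, 338, 19, 0],
    [0, 45, 217, 140, 30, 1],
    [0, 1, 8, 1, 0, 0],
]

n = len(cm)
total = sum(sum(row) for row in cm)            # 50000 evaluation samples
correct = sum(cm[i][i] for i in range(n))      # diagonal = correct predictions
accuracy = correct / total                     # 33928 / 50000 = 0.67856

for c in range(n):
    support = sum(cm[c])                           # row sum: true examples of class c
    predicted = sum(cm[r][c] for r in range(n))    # column sum: predictions of class c
    precision = cm[c][c] / predicted if predicted else 0.0
    recall = cm[c][c] / support if support else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"class {c}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={support}")

print(f"accuracy={accuracy:.5f}")  # matches the Eval Accuracy of 0.67856 below
```

Running this reproduces the rounded per-class rows of the classification report, and the recomputed accuracy agrees with the reported 0.67856.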

## Evaluation Metrics

| Metric | Value |
|--------|------:|
| Eval Loss | 0.2926119863986969 |
| Eval Precision | 0.5010686403845288 |
| Eval Recall | 0.37549345115259253 |
| Eval F1 Macro | 0.39714660593426115 |
| Eval Accuracy | 0.67856 |
| Eval Runtime | 86.0674 |
| Eval Samples Per Second | 580.94 |
| Eval Steps Per Second | 4.543 |
| Epoch | 19.91 |

## Training Metrics

| Metric | Value |
|--------|------:|
| Loss | 0.2803 |
| Grad Norm | 0.5055287480354309 |
| Learning Rate | 5.119453924914675e-07 |
| Epoch | 19.97 |

## Training Runtime

| Metric | Value |
|--------|------:|
| Train Runtime | 19555.3448 |
| Train Samples Per Second | 460.232 |
| Train Steps Per Second | 1.798 |
| Train Loss | 0.29856721191276053 |
| Epoch | 20.0 |
## Model Details

- Format: Safetensors
- Model size: 178M params
- Tensor type: F32