swahBERT

This model was fine-tuned on the dataset listed below. It achieves the following results on the evaluation set:

  • Loss: 0.4982
  • Accuracy: 0.9292
  • Precision: 0.9550
  • Recall: 0.9010
  • F1: 0.9272

Model description

This is a fine-tuned swahBERT model. You can get the original model from here.
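A minimal usage sketch, assuming the checkpoint is hosted under the repository id metabloit/swahBERT and was fine-tuned for sequence classification (the example text and label handling are illustrative, not from the original card):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "metabloit/swahBERT"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Habari ya leo ni njema"  # example Swahili input
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```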

Training and evaluation data

The model was fine-tuned on this dataset.

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 8
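A minimal sketch of how these hyperparameters map onto the standard transformers TrainingArguments; the output directory and evaluation strategy are assumptions, not taken from the original card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="swahbert-finetuned",   # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=8,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults
    evaluation_strategy="epoch",       # assumed, matching the per-epoch results below
)
```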

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy   Precision   Recall   F1
No log          1.0     310    0.6506            0.9282     0.9417      0.9131   0.9272
0.0189          2.0     620    0.4982            0.9292     0.9550      0.9010   0.9272
0.0189          3.0     930    0.5387            0.9323     0.9693      0.8929   0.9295
0.0314          4.0     1240   0.6365            0.9221     0.9524      0.8889   0.9195
0.0106          5.0     1550   0.6687            0.9282     0.9473      0.9071   0.9267
0.0106          6.0     1860   0.6671            0.9282     0.9454      0.9091   0.9269
0.0016          7.0     2170   0.6908            0.9242     0.9468      0.8990   0.9223
0.0016          8.0     2480   0.6832            0.9272     0.9471      0.9051   0.9256
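Per-epoch metrics like those in the table above are typically produced by passing a compute_metrics function to the Trainer. A minimal sketch, assuming a binary classification task (the averaging choice is an assumption, not confirmed by the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # assumed binary task
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Passed to the Trainer as: Trainer(..., compute_metrics=compute_metrics)
```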

Framework versions

  • Transformers 4.33.1
  • Pytorch 2.0.1+cpu
  • Datasets 2.14.5
  • Tokenizers 0.13.3

References

@inproceedings{martin-etal-2022-swahbert,
    title = "{S}wah{BERT}: Language Model of {S}wahili",
    author = "Martin, Gati and Mswahili, Medard Edmund and Jeong, Young-Seob and Woo, Jiyoung",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.23",
    pages = "303--313"
}
