bert-mini-amharic-16k

This model has the same architecture as bert-mini and was pretrained from scratch on the Amharic subsets of the oscar and mc4 datasets, 165 million tokens in total. It achieves the following results on the evaluation set:

  • Loss: 2.59
  • Perplexity: 13.33
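The two figures above are consistent with perplexity being the exponential of the loss. The sketch below checks that, assuming the reported loss is a mean cross-entropy in nats (natural log); the helper name `perplexity_from_loss` is illustrative and not part of the model's code.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Return exp(loss), assuming `loss` is a mean cross-entropy in nats."""
    return math.exp(loss)


# Reported evaluation loss from this model card.
print(round(perplexity_from_loss(2.59), 2))  # 13.33
```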

Although this model has only 7.5 million parameters, its perplexity is comparable to that of the 279-million-parameter xlm-roberta-base model, which is roughly 36x larger, on the same Amharic evaluation set.
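As a masked language model, it can be tried out on a fill-in-the-blank task. The following is a minimal sketch, assuming the `transformers` library is installed and the checkpoint is published under the repo id `rasyosef/bert-mini-amharic-16k` shown on this page. The helpers `mask_word` and `top_predictions` are hypothetical names introduced for illustration, and the `"[MASK]"` default is the usual BERT mask token, assumed here rather than confirmed by the card (the sketch reads the real one from the tokenizer when it runs).

```python
MODEL_ID = "rasyosef/bert-mini-amharic-16k"  # repo id as shown on this page


def mask_word(sentence: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(sentence: str, word: str, top_k: int = 5):
    """Mask `word` in `sentence` and return (token, score) pairs for the gap."""
    # Imported lazily so the pure helper above works without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token instead of assuming "[MASK]".
    masked = mask_word(sentence, word, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]
```

Calling `top_predictions(sentence, word)` with an Amharic sentence and one of its words downloads the checkpoint on first use and returns the model's highest-scoring candidates for the masked position.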

  • Model size: 7.57M params (Safetensors)
  • Tensor type: F32

Datasets used to train rasyosef/bert-mini-amharic-16k