---
license: cc-by-4.0
language:
  - sw
---

A cased BERT base model trained on a 125M-token subset of cc100-Swahili for our paper "Scaling Laws for BERT in Low-Resource Settings", published in Findings of ACL 2023.
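
As a rough illustration of how a model like this could be queried, the sketch below wraps the Hugging Face `transformers` fill-mask pipeline. It rests on assumptions that are not stated in this card: that the Hub identifier is `orai-nlp/bert-base-sw`, that the checkpoint ships a masked-language-modelling head, and that `transformers` is installed. The helper names `build_fill_mask` and `predict_masked` are hypothetical.

```python
MODEL_ID = "orai-nlp/bert-base-sw"  # assumed Hub identifier, not confirmed by the card


def build_fill_mask(model_id: str = MODEL_ID):
    """Return a fill-mask pipeline for the model (downloads weights on first use)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def predict_masked(fill_mask, template: str, top_k: int = 5):
    """Fill the single `{mask}` slot in `template` and return (token, score) pairs."""
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]
```

Calling `predict_masked(build_fill_mask(), "Nairobi ni mji mkuu wa {mask}.")` would then return the top candidate tokens for the masked position together with their scores. For downstream tasks such as those reported under Results, the encoder would normally be fine-tuned with a task-specific head instead.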

The model has 124M parameters (12 layers) and a vocabulary size of 50K. It was trained for 500K steps with a sequence length of 512 tokens.
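
The parameter figure can be sanity-checked with a back-of-the-envelope count. The sketch below assumes the standard BERT base dimensions (hidden size 768, feed-forward size 3072, 2 token types, 512 positions) and a vocabulary of exactly 50,000 entries. None of these values are stated in this card beyond "12L", "50K" and the 512-token sequence length, so the result is an estimate. It counts the encoder and pooler only and leaves out the masked-language-modelling head.

```python
def bert_param_count(vocab_size=50_000, hidden=768, layers=12,
                     ffn=3072, max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT encoder with pooler (no MLM head)."""
    layer_norm = 2 * hidden  # scale and bias vectors
    # Token, position and token-type embedding tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> ffn -> hidden) with biases plus LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm
    # Dense pooler applied to the first token's representation.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# prints: 124,441,344 parameters (~124M)
```

Under these assumptions the token embedding table alone accounts for about 38.4M parameters, which is why a 50K vocabulary pushes the total above the roughly 110M usually quoted for BERT base with its smaller vocabulary.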

Results

| Task | bert-base-sw | bert-medium-sw | Flair | mBERT | swahBERT (Martin et al., 2022b) |
|---|---|---|---|---|---|
| NERC | 92.09 | 91.63 | 92.04 | 91.17 | 88.60 |
| Topic | 93.07 | 92.88 | 91.83 | 91.52 | 90.90 |
| Sentiment | 79.04 | 77.07 | 73.60 | 69.17 | 71.12 |
| QNLI | 63.34 | 63.87 | 52.82 | 63.48 | 64.72 |

Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1], Rodrigo Agerri [2] and Aitor Soroa [2]

Affiliation of the authors:

[1] Orai NLP Technologies

[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU

The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

If you use this model, please cite the following paper:

G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. Findings of the Association for Computational Linguistics: ACL 2023. July, 2023. Toronto, Canada

Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus