---
license: cc-by-4.0
language:
  - sw
---

# bert-base-sw

BERT base (cased) model trained on a 125M-token subset of the cc100-Swahili corpus for our work *Scaling Laws for BERT in Low-Resource Settings*, published in Findings of ACL 2023.

The model has 124M parameters (12 layers) and a vocabulary size of 50K. It was trained for 500K steps with a sequence length of 512 tokens.
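A minimal usage sketch with the Hugging Face `transformers` library. The repository id below is assumed from the model name and may differ from the actual hub path; the Swahili example sentence is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Assumed repository id; replace with the actual hub path of this model.
model_id = "bert-base-sw"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask example in Swahili ("Nairobi is the capital of [MASK].")
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Nairobi ni mji mkuu wa [MASK]."))
```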

## Results

| Task      | bert-base-sw | bert-medium-sw | Flair | mBERT | swahBERT (Martin et al., 2022b) |
|-----------|--------------|----------------|-------|-------|---------------------------------|
| NERC      | 92.09        | 91.63          | 92.04 | 91.17 | 88.60                           |
| Topic     | 93.07        | 92.88          | 91.83 | 91.52 | 90.90                           |
| Sentiment | 79.04        | 77.07          | 73.60 | 69.17 | 71.12                           |
| QNLI      | 63.34        | 63.87          | 52.82 | 63.48 | 64.72                           |
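These scores come from fine-tuning on downstream Swahili tasks. A minimal sketch of a standard sequence-classification fine-tuning setup is shown below; the dataset, hyperparameters, and repository id are illustrative assumptions, not the configuration used in the paper.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Assumed identifiers; swap in the actual model path and task dataset.
model_id = "bert-base-sw"
dataset = load_dataset("swahili_news")  # illustrative Swahili topic-classification dataset

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=dataset["train"].features["label"].num_classes
)

def tokenize(batch):
    # Truncate to the model's maximum sequence length of 512 tokens.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```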

## Authors

Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1], Rodrigo Agerri [2] and Aitor Soroa [2]

Affiliation of the authors:

[1] Orai NLP Technologies

[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU

## Licensing

The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## Acknowledgements

If you use this model, please cite the following paper:

- G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. In Findings of the Association for Computational Linguistics: ACL 2023, July 2023, Toronto, Canada.

## Contact information

Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus