---
language: wo
tags:
- roberta
- language-model
- wo
- wolof
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**Soraberta** is a RoBERTa-base model pretrained on the Wolof language. RoBERTa was introduced in [this paper](https://arxiv.org/abs/1907.11692).

## Soraberta models

| Model name | Number of layers | Attention heads | Embedding dimension | Total parameters |
| :------: | :---: | :---: | :---: | :---: |
| `soraberta-base` | 6 | 12 | 514 | 83 M |

## Using Soraberta with Hugging Face's Transformers

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```

A sketch that loads the checkpoint directly, without the `pipeline` wrapper, is included at the end of this card.

## Training data

The data sources are the [Bible OT](http://biblewolof.com/) and [WOLOF-ONLINE](http://www.wolof-online.com/).

## Contact

Please contact abdouaziz@gmail.com with any questions, feedback, or requests.
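
## Loading Soraberta directly

For workflows that need raw predictions rather than the `pipeline` wrapper, the checkpoint can also be loaded through the standard `transformers` Auto classes. The following is a minimal sketch, assuming the `abdouaziiz/soraberta` checkpoint exposes a masked-LM head (as the pipeline example above implies); it reproduces the top-5 mask fills by hand.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and masked-LM model from the Hub checkpoint.
tokenizer = AutoTokenizer.from_pretrained("abdouaziiz/soraberta")
model = AutoModelForMaskedLM.from_pretrained("abdouaziiz/soraberta")

sentence = "juroom naari jullit man nanoo boole jend aw nag walla <mask>."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and rank the vocabulary by probability.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top5 = probs.topk(5)

for score, token_id in zip(top5.values[0], top5.indices[0]):
    print(f"{tokenizer.decode(token_id.item()).strip():>10}  {score.item():.4f}")
```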