---
language: wo
tags:
- bert
- language-model
- wo
- wolof
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**bert-base-wolof** is a BERT-base model pretrained on the Wolof language.

## Soraberta models

| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `bert-base` | 6 | 12 | 514 | 56,931,622 (≈56.9 M) |
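
As a rough cross-check of the Total Parameters column, you can compute the parameter count of a standard BERT-style encoder from its hyperparameters. The sketch below assumes the usual layout: token, position and segment embeddings, a LayerNorm after attention and after the feed-forward block, an optional pooler, and no masked-LM head. The function name `encoder_param_count` is hypothetical. This card does not list Soraberta's vocabulary size or feed-forward size, so the example checks the formula against the published `bert-base-uncased` configuration instead. The function also checks that the hidden size is divisible by the number of heads, as standard multi-head attention requires. Because 514 is not divisible by 12, the Embedding Dimension value may refer to another setting. RoBERTa-style configs, for example, use `max_position_embeddings = 514`. The card does not confirm which setting is meant.

```python
def encoder_param_count(
    vocab_size: int,
    hidden_size: int,
    num_layers: int,
    num_heads: int,
    intermediate_size: int,
    max_positions: int,
    type_vocab_size: int = 2,
    with_pooler: bool = True,
) -> int:
    """Count trainable parameters of a standard BERT-style encoder (no MLM head)."""
    # Multi-head attention splits the hidden vector evenly across the heads.
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}"
        )
    h, i = hidden_size, intermediate_size

    # Token, position and segment embeddings, plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_positions + type_vocab_size) * h + 2 * h

    # Per layer: Q, K, V and output projections (weights + biases),
    # the two feed-forward projections (h -> i -> h), and two LayerNorms.
    attention = 4 * (h * h + h)
    feed_forward = (h * i + i) + (i * h + h)
    layer_norms = 2 * (2 * h)
    per_layer = attention + feed_forward + layer_norms

    # Optional pooler: one dense layer applied to the first token's vector.
    pooler = h * h + h if with_pooler else 0

    return embeddings + num_layers * per_layer + pooler


# Reference check against the published bert-base-uncased configuration.
bert_base = encoder_param_count(
    vocab_size=30522, hidden_size=768, num_layers=12, num_heads=12,
    intermediate_size=3072, max_positions=512,
)
print(f"{bert_base:,}")  # 109,482,240
```

For RoBERTa-style checkpoints, pass `type_vocab_size=1`. To estimate Soraberta itself, you would need its actual vocabulary and feed-forward sizes from the model's `config.json`.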

## Using Soraberta with Hugging Face's Transformers

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")
[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```
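
The `fill-mask` pipeline returns a list of dictionaries with `sequence`, `score`, `token` and `token_str` keys, as in the output above. A small helper can rank and filter these candidates. The name `summarize_predictions` is hypothetical and not part of Transformers. This sketch assumes only that output format and does not load the model. The sample data reuses the first two results shown above.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float, int]]


def summarize_predictions(predictions: List[Prediction], min_score: float = 0.0) -> List[str]:
    """Turn fill-mask pipeline output into 'word (score)' strings, best first."""
    # Keep only candidates at or above the score threshold.
    kept = [p for p in predictions if float(p["score"]) >= min_score]
    # Highest-probability candidate first.
    kept.sort(key=lambda p: float(p["score"]), reverse=True)
    # token_str carries a leading space (e.g. ' gileem'), so strip it for display.
    return [f"{str(p['token_str']).strip()} ({float(p['score']):.3f})" for p in kept]


results = [
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla gileem.",
     "score": 0.9783930778503418, "token": 4621, "token_str": " gileem"},
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla jend.",
     "score": 0.009271537885069847, "token": 2155, "token_str": " jend"},
]
print(summarize_predictions(results))  # ['gileem (0.978)', 'jend (0.009)']
```

By default, the pipeline returns five candidates. Passing `top_k` in the call, for example `unmasker(text, top_k=10)`, changes that number before the results reach a helper like this one.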

## Training data

The training data was drawn from three sources: [Bible OT](http://biblewolof.com/), [WOLOF-ONLINE](http://www.wolof-online.com/), and [ALFFA_PUBLIC](https://github.com/getalp/ALFFA_PUBLIC/tree/master/ASR/WOLOF).

## Contact

Please contact abdouaziz@gmail.com with any questions, feedback, or requests.