---
language: wo
tags:
- bert
- language-model
- wo
- wolof
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**bert-base-wolof** is a BERT-base model pretrained on the Wolof language.

## Soraberta models

| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :---------: | :---: | :---: | :---: | :---: |
| `bert-base` | 6 | 12 | 514 | 56,931,622 (~56.9M) |
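
These figures can be read straight off the checkpoint's configuration. A minimal sketch, assuming the standard BERT config field names (`num_hidden_layers`, `num_attention_heads`, `hidden_size`):

```python
from transformers import AutoConfig

# Download only the configuration of the Soraberta checkpoint.
config = AutoConfig.from_pretrained("abdouaziiz/bert-base-wolof")

# Print the architecture sizes reported in the table above.
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
```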
 
## Using Soraberta with Hugging Face's Transformers


```python
>>> from transformers import pipeline
>>> # Load the fill-mask pipeline with the Soraberta checkpoint.
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/bert-base-wolof')
>>> # Predict the token for the [MASK] slot in a Wolof sentence.
>>> unmasker("kuy yoot du [MASK].")

[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
  'score': 0.09505125880241394,
  'token': 13578},
 {'sequence': '[CLS] kuy yoot du daw. [SEP]',
  'score': 0.08882280439138412,
  'token': 679},
 {'sequence': '[CLS] kuy yoot du yoot. [SEP]',
  'score': 0.057790059596300125,
  'token': 5117},
 {'sequence': '[CLS] kuy yoot du seqat. [SEP]',
  'score': 0.05671025067567825,
  'token': 4992},
 {'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
  'score': 0.0469999685883522,
  'token': 1735}]
```
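
If you prefer to work with the model directly rather than through the pipeline, the tokenizer and masked-LM head can be loaded with the Auto classes. A minimal sketch, assuming the same `abdouaziiz/bert-base-wolof` checkpoint as above and that `torch` is installed:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the same Soraberta checkpoint used by the pipeline above.
tokenizer = AutoTokenizer.from_pretrained("abdouaziiz/bert-base-wolof")
model = AutoModelForMaskedLM.from_pretrained("abdouaziiz/bert-base-wolof")

# Tokenize a Wolof sentence containing a [MASK] slot.
inputs = tokenizer("kuy yoot du [MASK].", return_tensors="pt")

# Forward pass without gradients; logits has shape (1, seq_len, vocab_size).
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the five highest-scoring tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5 = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```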

## Training data
The data sources are [Bible OT](http://biblewolof.com/), [WOLOF-ONLINE](http://www.wolof-online.com/), and [ALFFA_PUBLIC](https://github.com/getalp/ALFFA_PUBLIC/tree/master/ASR/WOLOF).



## Contact

Please contact abdouaziz@gmail.com with any questions, feedback, or requests.