updating Readme
README.md
---
language: yo
datasets:
- [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt)
- [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3)
---

# bert-base-multilingual-cased-finetuned-yoruba

## Model description
Specifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on Yoruba language texts.

#### How to use
You can use this model with the Transformers *pipeline* for masked token prediction.

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-yoruba')
>>> unmasker("Arẹmọ Phillip to jẹ ọkọ [MASK] Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun")
```
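The *pipeline* call above is the documented route. As a sketch of what it does under the hood, the same checkpoint can be loaded with the generic `AutoTokenizer`/`AutoModelForMaskedLM` classes and the top candidates for the `[MASK]` position read off the logits directly (this lower-level usage is an illustration, not part of the original card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the fine-tuned checkpoint (same model id as in the pipeline example)
model_id = "Davlan/bert-base-multilingual-cased-finetuned-yoruba"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

sentence = ("Arẹmọ Phillip to jẹ ọkọ [MASK] Elizabeth "
            "to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun")
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the five highest-scoring candidate tokens
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```

This is equivalent to what the fill-mask pipeline returns as its `token_str` field, minus the score normalization it applies with a softmax.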
#### Limitations and bias
This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.

## Training data
This model was fine-tuned on the Bible, JW300, [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt), the [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3), [CC-Aligned](https://opus.nlpl.eu/), Wikipedia, news corpora (BBC Yoruba, VON Yoruba, Asejere, Alaroye), and other small datasets curated from friends.

## Training procedure
This model was trained on a single NVIDIA V100 GPU.

## Eval results on Test set (F-score)
Dataset | F1-score
-|-
Yoruba GV NER | 75.34
MasakhaNER | 80.82
BBC Yoruba | 80.66

### BibTeX entry and citation info
By David Adelani