Davlan committed
Commit
8c6d20a
1 Parent(s): d335cc2

updating Readme

Files changed (1)
  1. README.md +9 -13
README.md CHANGED
@@ -2,7 +2,8 @@ Hugging Face's logo
 ---
 language: yo
 datasets:
-- Bible, JW300, [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt), [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3) and [CC-Aligned](https://opus.nlpl.eu/), Wikipedia, news corpora (BBC Yoruba, VON Yoruba, Asejere, Alaroye), and other small datasets curated from friends.
+- [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt)
+- [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3)
 ---
 # bert-base-multilingual-cased-finetuned-yoruba
 ## Model description
@@ -13,19 +14,15 @@ Specifically, this model is a *bert-base-multilingual-cased* model that was fine
 #### How to use
 You can use this model with Transformers *pipeline* for masked token prediction.
 ```python
-from transformers import AutoTokenizer, AutoModelForTokenClassification
 from transformers import pipeline
-tokenizer = AutoTokenizer.from_pretrained("")
-model = AutoModelForTokenClassification.from_pretrained("")
-nlp = pipeline("", model=model, tokenizer=tokenizer)
-example = "Emir of Kano turban Zhang wey don spend 18 years for Nigeria"
-ner_results = nlp(example)
-print(ner_results)
+>>> from transformers import pipeline
+>>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-yoruba')
+>>> unmasker("Arẹmọ Phillip to jẹ ọkọ [MASK] Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun")
 ```
 #### Limitations and bias
 This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
 ## Training data
-This model was fine-tuned on JW300 Yorùbá corpus and the [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt) dataset
+This model was fine-tuned on Bible, JW300, [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt), [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3) and [CC-Aligned](https://opus.nlpl.eu/), Wikipedia, news corpora (BBC Yoruba, VON Yoruba, Asejere, Alaroye), and other small datasets curated from friends.

 ## Training procedure
 This model was trained on a single NVIDIA V100 GPU
@@ -33,10 +30,9 @@ This model was trained on a single NVIDIA V100 GPU
 ## Eval results on Test set (F-score)
 Dataset|F1-score
 -|-
-
-Yoruba GV NER |86.26
-MasakhaNER |75.76
-BBC Yoruba |91.75
+Yoruba GV NER |75.34
+MasakhaNER |80.82
+BBC Yoruba |80.66

 ### BibTeX entry and citation info
 By David Adelani
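
For anyone trying the updated usage snippet: the fill-mask pipeline returns a list of candidate fills, each with a score. A minimal sketch of running the example sentence from the diff and printing the top predictions (the output keys shown are the standard pipeline fields, not anything specific to this model):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="Davlan/bert-base-multilingual-cased-finetuned-yoruba")
predictions = unmasker(
    "Arẹmọ Phillip to jẹ ọkọ [MASK] Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun"
)

# Each candidate fill is a dict with 'score', 'token', 'token_str' and 'sequence'.
for p in predictions:
    print(f"{p['token_str']:>15}  {p['score']:.4f}")
```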
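The card only records that training used a single NVIDIA V100 GPU; the fine-tuning script itself is not part of this commit. As a purely illustrative sketch of how masked-language-model fine-tuning of *bert-base-multilingual-cased* on a Yorùbá text corpus is commonly set up with the Trainer API (the corpus path, masking probability, batch size and epoch count below are assumptions, not values reported for this model):

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Hypothetical corpus file; the actual training data is the corpora listed above, not shipped here.
raw = load_dataset("text", data_files={"train": "yoruba_corpus.txt"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# 15% masking is the BERT default; the value used for this model is not stated in the card.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-base-multilingual-cased-finetuned-yoruba",
    per_device_train_batch_size=8,   # assumed; chosen to fit a single V100 for a model of this size
    num_train_epochs=3,              # assumed
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```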