Davlan committed on
Commit
92d1114
1 Parent(s): 9103c00

updating hausa readme

Files changed (1)
  1. README.md +27 -25
README.md CHANGED
@@ -1,43 +1,45 @@
  Hugging Face's logo
  ---
- language: yo
  datasets:

  ---
- # bert-base-multilingual-cased-finetuned-yoruba
  ## Model description
- **bert-base-multilingual-cased-finetuned-yoruba** is a **Yoruba BERT** model obtained by fine-tuning **bert-base-multilingual-cased** model on Yorùbá language texts. It provides **better performance** than the multilingual BERT on text classification and named entity recognition datasets.

- Specifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on Yorùbá corpus.
  ## Intended uses & limitations
  #### How to use
  You can use this model with Transformers *pipeline* for masked token prediction.
  ```python
  >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-yoruba')
- >>> unmasker("Arẹmọ Phillip to jẹ ọkọ [MASK] Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun")
-
- [{'sequence': '[CLS] Arẹmọ Phillip to jẹ ọkọ Mary Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun [SEP]', 'score': 0.1738305538892746,
- 'token': 12176,
- 'token_str': 'Mary'},
- {'sequence': '[CLS] Arẹmọ Phillip to jẹ ọkọ Queen Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun [SEP]', 'score': 0.16382873058319092,
- 'token': 13704,
- 'token_str': 'Queen'},
- {'sequence': '[CLS] Arẹmọ Phillip to jẹ ọkọ ti Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun [SEP]', 'score': 0.13272495567798615,
- 'token': 14382,
- 'token_str': 'ti'},
- {'sequence': '[CLS] Arẹmọ Phillip to jẹ ọkọ King Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun [SEP]', 'score': 0.12823280692100525,
- 'token': 11515,
- 'token_str': 'King'},
- {'sequence': '[CLS] Arẹmọ Phillip to jẹ ọkọ Lady Elizabeth to ti wa lori aisan ti dagbere faye lẹni ọdun mọkandilọgọrun [SEP]', 'score': 0.07841219753026962,
- 'token': 14005,
- 'token_str': 'Lady'}]

  ```
  #### Limitations and bias
  This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
  ## Training data
- This model was fine-tuned on Bible, JW300, [Menyo-20k](https://huggingface.co/datasets/menyo20k_mt), [Yoruba Embedding corpus](https://huggingface.co/datasets/yoruba_text_c3) and [CC-Aligned](https://opus.nlpl.eu/), Wikipedia, news corpora (BBC Yoruba, VON Yoruba, Asejere, Alaroye), and other small datasets curated from friends.

  ## Training procedure
  This model was trained on a single NVIDIA V100 GPU
@@ -45,9 +47,9 @@ This model was trained on a single NVIDIA V100 GPU
  ## Eval results on Test set (F-score)
  Dataset|F1-score
  -|-
- Yoruba GV NER |75.34
  MasakhaNER |80.82
- BBC Yoruba |80.66

  ### BibTeX entry and citation info
  By David Adelani
 
  Hugging Face's logo
  ---
+ language: ha
  datasets:

  ---
+ # bert-base-multilingual-cased-finetuned-hausa
  ## Model description
+ **bert-base-multilingual-cased-finetuned-hausa** is a **Hausa BERT** model obtained by fine-tuning the **bert-base-multilingual-cased** model on Hausa-language texts. It provides **better performance** than multilingual BERT on text classification and named entity recognition datasets.

+ Specifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on a Hausa corpus.
  ## Intended uses & limitations
  #### How to use
  You can use this model with Transformers *pipeline* for masked token prediction.
  ```python
  >>> from transformers import pipeline
+ >>> unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-hausa')
+ >>> unmasker("Shugaban [MASK] Muhammadu Buhari ya amince da shawarar da ma’aikatar sufuri karkashin jagoranci")
+
+ [{'sequence': '[CLS] Shugaban Nigeria Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.9762618541717529,
+ 'token': 22045,
+ 'token_str': 'Nigeria'},
+ {'sequence': '[CLS] Shugaban Ka Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.007239189930260181,
+ 'token': 25444,
+ 'token_str': 'Ka'},
+ {'sequence': '[CLS] Shugaban, Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.001990817254409194,
+ 'token': 117,
+ 'token_str': ','},
+ {'sequence': '[CLS] Shugaban Ghana Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.001566368737258017,
+ 'token': 28682,
+ 'token_str': 'Ghana'},
+ {'sequence': '[CLS] Shugabanmu Muhammadu Buhari ya amince da shawarar da ma [UNK] aikatar sufuri karkashin jagoranci [SEP]', 'score': 0.0009375187801197171,
+ 'token': 11717,
+ 'token_str': '##mu'}]

  ```
  #### Limitations and bias
  This model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains.
  ## Training data
+ This model was fine-tuned on [Hausa CC-100](http://data.statmt.org/cc-100/).
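The card only links the corpus, so the snippet below is a rough, illustrative sketch of one way the Hausa CC-100 text could be prepared for masked-language-model fine-tuning with the `datasets` and `transformers` libraries. The local file name `ha.txt`, the generic `text` loader, and the 512-token limit are assumptions for illustration, not details taken from this card.

```python
# Illustrative sketch only: prepares Hausa CC-100 text for MLM fine-tuning.
# It assumes the Hausa file from the linked CC-100 page has been downloaded
# and decompressed to a local "ha.txt" (one paragraph per line); the path is
# a placeholder, not something specified in this card.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Generic line-by-line text loader; each line becomes one training example.
raw = load_dataset("text", data_files={"train": "ha.txt"})

def tokenize(batch):
    # Truncate to BERT's usual 512-token maximum.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized)
```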

  ## Training procedure
  This model was trained on a single NVIDIA V100 GPU
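No training script is given in the card; the following is a minimal sketch of continued masked-language-model fine-tuning of multilingual BERT on the tokenized corpus from the previous snippet, using the standard `Trainer` API. All hyperparameters (batch size, epochs, learning rate, masking probability) are illustrative defaults, not the settings used to train this model.

```python
# Illustrative sketch only: continued MLM fine-tuning of multilingual BERT on
# the tokenized Hausa corpus built in the previous snippet ("tokenized").
# Hyperparameters are placeholder defaults, not the values used for this model.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Dynamic token masking with the standard 15% masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-base-multilingual-cased-finetuned-hausa",
    per_device_train_batch_size=8,  # adjust to fit a single V100's memory
    num_train_epochs=3,
    learning_rate=5e-5,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],  # from the previous snippet
    data_collator=collator,
)
trainer.train()
trainer.save_model(args.output_dir)
```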
 
  ## Eval results on Test set (F-score)
  Dataset|F1-score
  -|-
+ Hausa GV NER |80.34
  MasakhaNER |80.82
+ VOA Hausa Textclass |80.66

  ### BibTeX entry and citation info
  By David Adelani