julien-c (HF staff) committed
Commit 6788a44
1 parent: ca6ed5a

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/MoseliMotsoehli/zuBERTa/README.md

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ language: zu
+ ---
+
+ # zuBERTa
+ zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.
+
+ ## Intended uses & limitations
+ The model can be used to obtain embeddings for a downstream task such as question answering.
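One common way to turn per-token hidden states into a single sentence embedding for a downstream task is masked mean pooling. The sketch below is not part of the model card: it assumes the hidden states and the attention mask have already been obtained from the model and converted to plain Python lists, and `mean_pool` is a hypothetical helper name.

```python
from typing import List


def mean_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (hypothetical helper)."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    # Keep only real tokens; padding positions carry a mask value of 0.
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (not real model output): three token vectors of size 2,
# where the last position is padding and must be ignored.
toy_states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
sentence_embedding = mean_pool(toy_states, toy_mask)
```

Other pooling choices, such as taking the first token's vector, are equally plausible; which one works best for a given task is an empirical question.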
+
+ #### How to use
+
+ ```python
+ >>> from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
+
+ >>> # AutoModelForMaskedLM replaces the deprecated AutoModelWithLMHead for masked language models.
+ >>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
+ >>> model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")
+ >>> # Build a fill-mask pipeline and predict the token hidden behind <mask>.
+ >>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+ >>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")
+
+ [
+ {
+ "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
+ "score": 0.050459690392017365,
+ "token": 555,
+ "token_str": "Ġkhona"
+ },
+ {
+ "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
+ "score": 0.03668094798922539,
+ "token": 2321,
+ "token_str": "Ġinkosi"
+ },
+ {
+ "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
+ "score": 0.028774697333574295,
+ "token": 5101,
+ "token_str": "Ġubukhosi"
+ }
+ ]
+ ```
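The predictions above still carry tokenizer artifacts: the `<s>`/`</s>` sentence markers and the `Ġ` prefix, which byte-level BPE tokenizers of the RoBERTa family use to mark a leading space. A minimal post-processing sketch follows, assuming the output has the shape shown above; `clean_predictions` is a hypothetical helper, not part of `transformers`.

```python
from typing import Dict, List

# Byte-level BPE tokenizers (as used by RoBERTa) mark a leading space with "Ġ".
SPACE_MARKER = "Ġ"


def clean_predictions(predictions: List[Dict]) -> List[Dict]:
    """Strip tokenizer markers and sort predictions by descending score (hypothetical helper)."""
    cleaned = []
    for pred in predictions:
        # Drop the sentence start/end markers around the filled-in sequence.
        sentence = pred["sequence"].replace("<s>", "").replace("</s>", "").strip()
        cleaned.append({
            "word": pred["token_str"].lstrip(SPACE_MARKER),
            "score": pred["score"],
            "sentence": sentence,
        })
    return sorted(cleaned, key=lambda p: p["score"], reverse=True)


# Two of the predictions shown above, copied verbatim and deliberately out of order.
example = [
    {"sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
     "score": 0.03668094798922539, "token": 2321, "token_str": "Ġinkosi"},
    {"sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
     "score": 0.050459690392017365, "token": 555, "token_str": "Ġkhona"},
]
top = clean_predictions(example)[0]
```

Here `top["word"]` is the highest-scoring candidate with the space marker removed, which is usually what a downstream consumer wants to display.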
+
+ ## Training data
+
+ 1. 30k sentences from the 2018 Zulu corpus of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download), collected from news articles and creative writing.
+ 2. ~7,500 articles of human-generated translations, scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).
48
+
49
+ ### BibTeX entry and citation info
50
+
51
+ ```bibtex
52
+ @inproceedings{author = {Moseli Motsoehli},
53
+ title = {Towards transformation of Southern African language models through transformers.},
54
+ year={2020}
55
+ }
56
+ ```