jean-paul committed
Commit 99a2ea5
1 Parent(s): d6cec73

Updated README

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -13,7 +13,7 @@ A Pretrained model on the Kinyarwanda language dataset using a masked language m
 
  # How to use:
 
- The model can be used directly with the pipeline for masked language modeling as follows:
+ 1) The model can be used directly with the pipeline for masked language modeling as follows:
 
  ```
  from transformers import pipeline
@@ -29,5 +29,20 @@ the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura.")
  {'sequence': 'ejo ndikwiga nagize agahinda baje kunsura.', 'score': 0.0638100653886795, 'token': 3917, 'token_str': 'agahinda'},
  {'sequence': 'ejo ndikwiga nagize ubwoba baje kunsura.', 'score': 0.04934622719883919, 'token': 2387, 'token_str': 'ubwoba'},
  {'sequence': 'ejo ndikwiga nagizengo baje kunsura.', 'score': 0.02243402972817421, 'token': 455, 'token_str': '##ngo'}]
+ ```
+
+ 2) Direct use from the transformers library to get features using AutoModel:
+
+ ```
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
+
+ model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-large")
+
+ input_text = "Ejo ndikwiga nagize abashyitsi baje kunsura."
+ encoded_input = tokenizer(input_text, return_tensors='pt')
+ output = model(**encoded_input)
+
  ```
  __Note__: We used the huggingface implementations for pretraining BERT from scratch, both the BERT model and the classes needed to do it.
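The first hunk elides the README lines that actually build `the_mask_pipe`. For readers of only this diff, the sketch below shows how a fill-mask pipeline for this checkpoint is typically constructed; it is an assumption about standard `transformers` usage, not the elided README lines themselves.

```
# Sketch only: a typical way to build the fill-mask pipeline used above.
# The elided README lines may construct it differently.
from transformers import pipeline

the_mask_pipe = pipeline(
    "fill-mask",
    model="jean-paul/KinyaBERT-large",
    tokenizer="jean-paul/KinyaBERT-large",
)

# Returns a list of dicts with 'sequence', 'score', 'token', 'token_str' keys,
# as in the sample output shown in the diff.
print(the_mask_pipe("Ejo ndikwiga nagize [MASK] baje kunsura."))
```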
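The second example added by this commit stops at `output = model(**encoded_input)`. As a hedged sketch (not part of the commit), this is one common way to read contextual features and masked-token predictions from that output.

```
# Sketch only: reading features and [MASK] predictions from AutoModelForMaskedLM output.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("jean-paul/KinyaBERT-large")
model = AutoModelForMaskedLM.from_pretrained("jean-paul/KinyaBERT-large")

text = "Ejo ndikwiga nagize [MASK] baje kunsura."
encoded_input = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded_input, output_hidden_states=True)

# Contextual features: last encoder layer, shape (1, sequence_length, hidden_size).
features = output.hidden_states[-1]

# Top-5 candidate tokens for the [MASK] position, taken from the MLM head logits.
mask_positions = (encoded_input["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = output.logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```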
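The closing note says BERT was pretrained from scratch with the Hugging Face implementations. The sketch below is only a rough illustration of what such a from-scratch masked-LM setup usually involves; the tokenizer path, corpus, and hyperparameters are placeholders, not values from this model card.

```
# Sketch only: a typical from-scratch BERT masked-LM pretraining setup in transformers.
# Paths, dataset, and hyperparameters are placeholders, not the authors' values.
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("path/to/kinyarwanda-tokenizer")  # placeholder
config = BertConfig(vocab_size=tokenizer.vocab_size)  # fresh config: no pretrained weights loaded
model = BertForMaskedLM(config)                       # randomly initialised BERT with an MLM head

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
training_args = TrainingArguments(output_dir="kinyabert-pretraining")  # placeholder settings

# train_dataset would be a tokenized Kinyarwanda corpus (not described in this commit).
# trainer = Trainer(model=model, args=training_args,
#                   data_collator=data_collator, train_dataset=train_dataset)
# trainer.train()
```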