ToluClassics committed
Commit c8540f5
1 Parent(s): 475acc8

update readme

Files changed (1): readme.md (+20 -0)
readme.md CHANGED
@@ -39,6 +39,26 @@ Afaan Oromoo(orm), Amharic(amh), Gahuza(gah), Hausa(hau), Igbo(igb), Nigerian Pi
 
  - 143 Million Tokens (1GB of text data)
  - Tokenizer Vocabulary Size: 70,000 tokens

+ ## Intended uses & limitations
+
+ `afriteva_small` is a pre-trained model, primarily aimed at being fine-tuned on multilingual sequence-to-sequence tasks.
+
+ ```python
+ >>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")
+
+ >>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
+ >>> tgt_text = "Would you like to be?"
+
+ >>> model_inputs = tokenizer(src_text, return_tensors="pt")
+ >>> with tokenizer.as_target_tokenizer():
+ ...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids
+
+ >>> model(**model_inputs, labels=labels)  # forward pass
+ ```
+
  ## Training Procedure

  For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva)
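
The forward pass in the example above returns an output object whose `loss` field is the cross-entropy loss a fine-tuning loop would backpropagate. A minimal sketch of reading that loss, assuming a recent `transformers` release that supports the `text_target` tokenizer argument and that the `castorini/afriteva_small` checkpoint can be downloaded:

```python
# Sketch: one forward pass with labels to obtain the seq2seq training loss.
# Assumption: the castorini/afriteva_small checkpoint is reachable from the Hub.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")

model_inputs = tokenizer("Ó hùn ọ́ láti di ara wa bí?", return_tensors="pt")
# text_target tokenizes the string as decoder labels rather than encoder input
labels = tokenizer(text_target="Would you like to be?", return_tensors="pt").input_ids

# Passing labels makes the model compute the cross-entropy loss internally
outputs = model(**model_inputs, labels=labels)
loss_value = float(outputs.loss)
```

In an actual fine-tuning step, calling `outputs.loss.backward()` followed by an optimizer step over `model.parameters()` would complete one update.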