ToluClassics committed
Commit 6760505
Parent: c8540f5

Update README.md

Files changed (1): README.md (+15, -0)
README.md CHANGED
@@ -39,6 +39,21 @@ Afaan Oromoo(orm), Amharic(amh), Gahuza(gah), Hausa(hau), Igbo(igb), Nigerian Pi
  - 143 Million Tokens (1GB of text data)
  - Tokenizer Vocabulary Size: 70,000 tokens

+ ## Intended uses & limitations
+ `afriteva_small` is a pre-trained model primarily aimed at being fine-tuned on multilingual sequence-to-sequence tasks.
+
+ ```python
+ >>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+ >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")
+ >>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
+ >>> tgt_text = "Would you like to be?"
+ >>> model_inputs = tokenizer(src_text, return_tensors="pt")
+ >>> with tokenizer.as_target_tokenizer():
+ ...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids
+ >>> model(**model_inputs, labels=labels)  # forward pass
+ ```
+
  ## Training Procedure

  For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva)