- 143 Million Tokens (1GB of text data)
- Tokenizer Vocabulary Size: 70,000 tokens
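
As a quick sanity check of the reported vocabulary size, the tokenizer can be loaded and inspected directly. A minimal sketch; note that `len(tokenizer)` counts the full vocabulary including any added special tokens, so it may differ slightly from the 70,000 figure above:

```python
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
>>> len(tokenizer)  # full vocabulary, including any added special tokens
```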

## Intended uses & limitations

`afriteva_small` is a pre-trained model primarily aimed at being fine-tuned on multilingual sequence-to-sequence tasks.
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")

>>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
>>> tgt_text = "Would you like to be?"

>>> model_inputs = tokenizer(src_text, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids

>>> model(**model_inputs, labels=labels)  # forward pass; returns loss and logits
```
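
The forward pass above computes the training loss for one source/target pair. For inference after fine-tuning, generation works the same way as for other `transformers` sequence-to-sequence models. A minimal sketch, continuing the session above (the `max_length` and `num_beams` values are illustrative, not tuned):

```python
>>> inputs = tokenizer("Ó hùn ọ́ láti di ara wa bí?", return_tensors="pt")
>>> generated = model.generate(**inputs, max_length=64, num_beams=4)
>>> tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Note that the pre-trained checkpoint has not been trained on any particular downstream task, so generated text will generally only be meaningful after fine-tuning.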

## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).
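
The repository above is the authoritative reference for how the model was actually trained. Purely as an orientation for downstream users, the sketch below shows a generic `transformers` fine-tuning skeleton for a sequence-to-sequence task. It is not the authors' recipe: the hyperparameters are illustrative, and the toy one-pair dataset stands in for a real parallel corpus:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")

# Toy one-pair dataset, purely for illustration; a real fine-tuning run
# would prepare a full parallel corpus the same way.
pairs = [("Ó hùn ọ́ láti di ara wa bí?", "Would you like to be?")]

def encode(src, tgt):
    features = tokenizer(src, truncation=True)
    with tokenizer.as_target_tokenizer():
        features["labels"] = tokenizer(tgt, truncation=True)["input_ids"]
    return features

train_dataset = [encode(src, tgt) for src, tgt in pairs]

args = Seq2SeqTrainingArguments(
    output_dir="afriteva_small-finetuned",  # illustrative output path
    per_device_train_batch_size=8,  # illustrative hyperparameters,
    num_train_epochs=3,             # not values from the paper
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # Pads inputs and labels dynamically per batch
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```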