Pretrained T5 base language model for Malay and Indonesian.
The t5-base-bahasa-cased model was pretrained on multiple tasks. Below is the list of tasks we trained on:
The preprocessing steps can be reproduced from Malaya/pretrained-model/preprocess.
You can use this model by installing torch or tensorflow, plus the Hugging Face transformers library. You can then initialize it directly like this:
```python
from transformers import T5Tokenizer, T5Model

# Load the base encoder-decoder (no language-modeling head) and its tokenizer.
model = T5Model.from_pretrained('huseinzol05/t5-base-bahasa-cased')
tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
```
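The card says either torch or tensorflow can serve as the backend. The snippet below is a minimal sketch that checks which backend is installed and reports the matching return_tensors value: 'pt' for PyTorch tensors, 'tf' for TensorFlow tensors. The helper name pick_return_tensors is hypothetical and is not part of transformers. The sketch prefers PyTorch because the examples in this card use PyTorch model classes. With TensorFlow you would use the TF-prefixed classes, such as TFT5ForConditionalGeneration. This card does not say whether the checkpoint ships TensorFlow weights.

```python
import importlib.util


def pick_return_tensors():
    """Hypothetical helper: pick a `return_tensors` value for the installed backend.

    Returns 'pt' if PyTorch is importable, otherwise 'tf' if TensorFlow is
    importable, otherwise None. PyTorch is checked first because the examples
    in this card use PyTorch model classes.
    """
    if importlib.util.find_spec("torch") is not None:
        return "pt"
    if importlib.util.find_spec("tensorflow") is not None:
        return "tf"
    return None


# transformers itself is required regardless of backend.
if importlib.util.find_spec("transformers") is None:
    print("transformers is not installed")

backend = pick_return_tensors()
if backend is None:
    print("Install torch or tensorflow to run this model.")
else:
    print(f"Pass return_tensors='{backend}' to tokenizer.encode().")
```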
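To generate text, load the model with its language-modeling head, T5ForConditionalGeneration:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
model = T5ForConditionalGeneration.from_pretrained('huseinzol05/t5-base-bahasa-cased')

# The 'soalan:' ("question") prefix frames the input as a question.
input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors='pt')
outputs = model.generate(input_ids)
# skip_special_tokens=True strips <pad> and </s> from the decoded text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```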
The output is:
'Mahathir Mohamad'
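If you ask several questions, you can wrap the generation steps above in a small helper. This is a minimal sketch and is not part of Malaya or transformers. The name answer_question and the max_length=32 default are assumptions. The helper uses only the tokenizer.encode, model.generate and tokenizer.decode calls from the example, plus generate's max_length argument. Passing skip_special_tokens=True matters because T5 starts decoding from the <pad> token and ends with </s>, and both would otherwise appear in the decoded string.

```python
def answer_question(question, tokenizer, model, max_length=32):
    """Hypothetical helper: answer `question` with the 'soalan: ' prefix.

    `tokenizer` and `model` are expected to be a T5Tokenizer and a
    T5ForConditionalGeneration loaded as in the example above.
    `max_length=32` is an assumed cap on generated tokens, not a tuned value.
    """
    # Prepend the same task prefix the card's example uses.
    prompt = f"soalan: {question}"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # generate() returns a batch of token-id sequences; we passed one input.
    outputs = model.generate(input_ids, max_length=max_length)
    # Drop <pad> (decoder start) and </s> (end of sequence) from the text.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage, with tokenizer and model loaded as shown above:
# print(answer_question('siapakah perdana menteri malaysia?', tokenizer, model))
```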
For further details on model performance, see the accuracy page in the Malaya documentation, https://malaya.readthedocs.io/en/latest/Accuracy.html, where we compare it with traditional models.
Thanks to Im Big, LigBlou, Mesolitica and KeyReply for sponsoring the AWS, Google and GPU cloud resources used to train T5 for Bahasa.