Pretrained T5 base language model for Malay and Indonesian.
The t5-base-bahasa-cased model was pretrained on multiple tasks. Below is the list of tasks we trained on:
Preprocessing steps can be reproduced from Malaya/pretrained-model/preprocess.
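The examples below need the dependencies mentioned in this card installed. A minimal setup might look like this (versions unpinned; pin them if you need reproducibility):

```shell
# Install the HuggingFace transformers library and tensorflow,
# the two dependencies this model card lists.
pip install transformers tensorflow
```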
You can use this model by installing tensorflow and the HuggingFace transformers library. You can then initialize it directly like this:
```python
from transformers import T5Tokenizer, T5Model

model = T5Model.from_pretrained('huseinzol05/t5-base-bahasa-cased')
tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
```
Or, for text generation:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('huseinzol05/t5-base-bahasa-cased')
model = T5ForConditionalGeneration.from_pretrained('huseinzol05/t5-base-bahasa-cased')

# 'soalan: siapakah perdana menteri malaysia?' is Malay for
# 'question: who is the prime minister of malaysia?'
input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')
outputs = model.generate(input_ids)
# generate() returns a batch of sequences; decode the first one
print(tokenizer.decode(outputs[0]))
```
For further details on model performance, check out the accuracy page from Malaya, https://malaya.readthedocs.io/en/latest/Accuracy.html, where we compare against traditional models.