xlnet-tiny-bahasa-cased

Pretrained XLNET tiny language model for Malay.

Pretraining Corpus

The xlnet-tiny-bahasa-cased model was pretrained on ~1.4 billion words. Below is the list of data we trained on:

  1. Cleaned local Malay texts.
  2. The Pile, translated to Malay.

Pretraining details

Load Pretrained Model

You can use this model by installing PyTorch or TensorFlow along with the Hugging Face transformers library, then loading it directly like this:

from transformers import XLNetModel, XLNetTokenizer

# Load the pretrained model and its tokenizer; the model is cased,
# so disable lowercasing in the tokenizer.
model = XLNetModel.from_pretrained('malay-huggingface/xlnet-tiny-bahasa-cased')
tokenizer = XLNetTokenizer.from_pretrained(
    'malay-huggingface/xlnet-tiny-bahasa-cased',
    do_lower_case = False,
)
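Once loaded, the model can be used to extract contextual embeddings. The sketch below, which assumes PyTorch is installed, encodes an illustrative Malay sentence and inspects the resulting hidden states; the example sentence itself is arbitrary.

```python
import torch
from transformers import XLNetModel, XLNetTokenizer

model = XLNetModel.from_pretrained('malay-huggingface/xlnet-tiny-bahasa-cased')
tokenizer = XLNetTokenizer.from_pretrained(
    'malay-huggingface/xlnet-tiny-bahasa-cased',
    do_lower_case=False,
)

# Tokenize an example Malay sentence into PyTorch tensors.
inputs = tokenizer('Saya suka makan nasi lemak.', return_tensors='pt')

# Run a forward pass without tracking gradients (inference only).
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```

The `last_hidden_state` tensor holds one embedding vector per input token and can be pooled or fed into a downstream head for classification or similar tasks.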