---
language:
- Tamil
tags:
- Tamil-Tokenizer
- Tamil-language-model
license: "apache-2.0"
datasets:
- oscar
---

# tokenizer

- BPE
- 30_522 vocab size

## model

- RoBERTa
- trained using masked language modeling (MLM)
- OSCAR dataset
- training data size: 5000 lines only
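
To illustrate what the BPE tokenizer listed above does, here is a minimal pure-Python sketch of how BPE merge rules can be learned and applied. It is a toy illustration under stated assumptions, not the code used to build this tokenizer: the function names `learn_bpe`, `apply_merge`, and `encode` are hypothetical, the sample words are made up for the demo, and the sketch splits words into Unicode code points rather than bytes.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Each distinct word becomes a tuple of code points with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself
        # so that repeated runs give the same merges.
        best = max(pairs, key=lambda pair: (pairs[pair], pair))
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[apply_merge(symbols, best)] += freq
        vocab = merged_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical mini-corpus, for demonstration only.
    corpus = ["தமிழ்", "தமிழ்நாடு", "தமிழர்"]
    merges = learn_bpe(corpus, num_merges=10)
    print(encode("தமிழ்", merges))
```

In this sketch the vocabulary size is roughly the number of base symbols plus the number of merges, so a target such as 30_522 would bound how many merges are learned.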