---
license: cc-by-4.0
language: mr
datasets:
  - L3Cube-MahaCorpus
---

MahaTweetBERT-Scratch

MahaTweetBERT-Scratch is a BERT base model trained from scratch on Marathi tweets. More details on the dataset, models, and baseline results can be found in our paper (arXiv:2210.04267, cited below).

Released as part of the MarathiNLP project: https://github.com/l3cube-pune/MarathiNLP

A better version of this model is available here: https://huggingface.co/l3cube-pune/marathi-tweets-bert
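
As a rough illustration of how such a checkpoint might be queried, the sketch below runs masked-token prediction with the Hugging Face `transformers` library. It rests on assumptions not stated in this card: that `transformers` and a backend such as PyTorch are installed, that the repository id `l3cube-pune/marathi-tweets-bert` (read off the URL above) loads with the standard `fill-mask` pipeline, and that the helper names `build_masked_sentence` and `top_predictions` are hypothetical, introduced here only for illustration.

```python
# Assumption: repo id read off the Hugging Face URL given above.
MODEL_ID = "l3cube-pune/marathi-tweets-bert"


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the {mask} placeholder with the tokenizer's mask token."""
    return template.format(mask=mask_token)


def top_predictions(template: str, model_id: str = MODEL_ID, k: int = 5):
    """Return the k most likely fillers for the masked position as (token, score) pairs.

    Hypothetical helper: the weights are downloaded from the Hub on first use.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    text = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill(text, top_k=k)]
```

Calling `top_predictions("मी आज {mask} जात आहे.")` (an illustrative sentence, roughly "I am going to {mask} today.") should return candidate tokens with their scores. The same function can be pointed at another checkpoint by passing a different `model_id`.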

@article{gokhale2022spread,
  title={Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection},
  author={Gokhale, Omkar and Kane, Aditya and Patankar, Shantanu and Chavan, Tanmay and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2210.04267},
  year={2022}
}

Other models trained from scratch are listed below:
Marathi-Scratch
Marathi-Tweets-Scratch
Hindi-Scratch
Dev-Scratch
Kannada-Scratch
Telugu-Scratch
Malayalam-Scratch
Gujarati-Scratch

Better versions of Monolingual Indic BERT models are listed below:
Marathi BERT
Marathi RoBERTa
Marathi AlBERT

Hindi BERT
Hindi RoBERTa
Hindi AlBERT

Dev BERT
Dev RoBERTa
Dev AlBERT

Kannada BERT
Telugu BERT
Malayalam BERT
Tamil BERT
Gujarati BERT
Oriya BERT
Bengali BERT
Punjabi BERT