---
license: cc-by-4.0
language: mr
datasets:
  - L3Cube-MahaCorpus
---

## MahaRoBERTa

MahaRoBERTa is a Marathi RoBERTa model. It is a multilingual RoBERTa model (xlm-roberta-base) fine-tuned on L3Cube-MahaCorpus and other publicly available Marathi monolingual datasets ([dataset link](https://github.com/l3cube-pune/MarathiNLP)).
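Because the model is a masked language model, a quick way to try it is a fill-mask query. The sketch below assumes the model is published on the Hugging Face Hub as `l3cube-pune/marathi-roberta` (inferred from the repository name, not stated in the card text) and uses the standard `transformers` `pipeline("fill-mask", ...)` API. The helpers `mask_word` and `top_predictions` are hypothetical illustrations, not part of any L3Cube package.

```python
"""Minimal fill-mask sketch for MahaRoBERTa (assumed Hub id below)."""

# Assumption: Hub id inferred from the repository name, not from the card text.
MODEL_ID = "l3cube-pune/marathi-roberta"

# XLM-RoBERTa tokenizers use "<mask>"; top_predictions reads the real
# mask token from the loaded tokenizer instead of relying on this default.
DEFAULT_MASK = "<mask>"


def mask_word(sentence: str, index: int, mask_token: str = DEFAULT_MASK) -> str:
    """Replace the whitespace-separated word at `index` with `mask_token`."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"word index {index} out of range for {len(words)} words")
    words[index] = mask_token
    return " ".join(words)


def top_predictions(sentence: str, index: int, k: int = 5) -> list[tuple[str, float]]:
    """Mask one word and return the model's top-k (token, score) guesses.

    Requires `transformers` plus a backend such as PyTorch, and network
    access (or a local cache) to fetch MODEL_ID.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill = pipeline("fill-mask", model=MODEL_ID)
    masked = mask_word(sentence, index, fill.tokenizer.mask_token)
    results = fill(masked, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


if __name__ == "__main__":
    # "मी पुण्यात राहतो" = "I live in Pune"; word 1 ("पुण्यात", "in Pune") is masked.
    for token, score in top_predictions("मी पुण्यात राहतो", 1):
        print(f"{token}\t{score:.3f}")
```

Reading the mask token from the loaded tokenizer keeps the helper correct even if the checkpoint used a different mask symbol than the default assumed here.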

More details on the dataset, models, and baseline results can be found in our [paper](https://arxiv.org/abs/2202.01159).

@InProceedings{joshi:2022:WILDRE6,
  author    = {Joshi, Raviraj},
  title     = {L3Cube-MahaCorpus and MahaBERT: Marathi Monolingual Corpus, Marathi BERT Language Models, and Resources},
  booktitle = {Proceedings of The WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {97--101}
}

Other monolingual Indic BERT models are listed below:

- Marathi BERT
- Marathi RoBERTa
- Marathi ALBERT

- Hindi BERT
- Hindi RoBERTa
- Hindi ALBERT

- Dev BERT
- Dev RoBERTa
- Dev ALBERT

- Kannada BERT
- Telugu BERT
- Malayalam BERT
- Tamil BERT
- Gujarati BERT
- Oriya BERT
- Bengali BERT
- Punjabi BERT
- Assamese BERT