
This model detects abusive speech in code-mixed Tamil. It is fine-tuned from the MuRIL model on a code-mixed Tamil abusive speech dataset, using a learning rate of 2e-5. Training code can be found at this url

LABEL_0 :-> Normal

LABEL_1 :-> Abusive
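
The label mapping above can be applied to the classifier's raw output with a small helper. The following is a minimal sketch, assuming the model is loaded through the Hugging Face `transformers` text-classification pipeline; the helper names `readable` and `classify` are illustrative, and `model_id` is a placeholder for this model's Hub id, which the caller must supply.

```python
# Label mapping taken from the model card: LABEL_0 is Normal, LABEL_1 is Abusive.
LABEL_NAMES = {"LABEL_0": "Normal", "LABEL_1": "Abusive"}


def readable(prediction):
    """Convert one pipeline-style prediction dict into a (name, score) pair."""
    return LABEL_NAMES[prediction["label"]], prediction["score"]


def classify(texts, model_id):
    """Classify a list of strings and return (name, score) pairs.

    model_id is an assumption: pass the Hub id of this model, which is
    not stated in the card text itself.
    """
    # Imported lazily so the mapping helper works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

Calling `classify(["some code-mixed Tamil text"], model_id)` would return one `(name, score)` pair per input string, with `name` being either `Normal` or `Abusive`.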

For more details, see our paper:

Mithun Das, Somnath Banerjee and Animesh Mukherjee. "Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages". Accepted at ACM HT 2022.

Please cite our paper in any published work that uses any of these resources.

@article{das2022data,
  title={Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages},
  author={Das, Mithun and Banerjee, Somnath and Mukherjee, Animesh},
  journal={arXiv preprint arXiv:2204.12543},
  year={2022}
}