arabert_c19: An AraBERT model pretrained on 1.5 million COVID-19 multi-dialect Arabic tweets
arabert_c19 is a version of the AraBERT v2 model (https://huggingface.co/aubmindlab/bert-base-arabertv02) that received additional pretraining (fine-tuning) on COVID-19 data. The pretraining used 1.5 million multi-dialect Arabic tweets about the COVID-19 pandemic, taken from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315). The model can achieve better results than the base model on tasks that involve multi-dialect Arabic tweets about the COVID-19 pandemic.
Classification results on several tasks, including fake-news and hate-speech detection, using arabert_c19 and mbert_ar_c19 (for more details, refer to the paper listed under Citation):
Task | arabert | mbert | distilbert multi | arabert Covid-19 | mbert Covid-19 |
---|---|---|---|---|---|
Contains hate (Binary) | 0.8346 | 0.6675 | 0.7145 | 0.8649 | 0.8492 |
Talk about a cure (Binary) | 0.8193 | 0.7406 | 0.7127 | 0.9055 | 0.9176 |
News or opinion (Binary) | 0.8987 | 0.8332 | 0.8099 | 0.9163 | 0.9116 |
Contains fake information (Binary) | 0.6415 | 0.5428 | 0.4743 | 0.7739 | 0.7228 |
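To make the comparison concrete, the following sketch computes how much each COVID-19-adapted model gains over its base model on every task. The numbers are copied from the results table. The dictionary layout and the helper names `gain` and `best_model` are illustrative assumptions, not part of any released code, and the metric is whatever the paper reports.

```python
# Scores copied from the results table; keys mirror the column headers.
scores = {
    "Contains hate": {
        "arabert": 0.8346, "mbert": 0.6675, "distilbert multi": 0.7145,
        "arabert Covid-19": 0.8649, "mbert Covid-19": 0.8492,
    },
    "Talk about a cure": {
        "arabert": 0.8193, "mbert": 0.7406, "distilbert multi": 0.7127,
        "arabert Covid-19": 0.9055, "mbert Covid-19": 0.9176,
    },
    "News or opinion": {
        "arabert": 0.8987, "mbert": 0.8332, "distilbert multi": 0.8099,
        "arabert Covid-19": 0.9163, "mbert Covid-19": 0.9116,
    },
    "Contains fake information": {
        "arabert": 0.6415, "mbert": 0.5428, "distilbert multi": 0.4743,
        "arabert Covid-19": 0.7739, "mbert Covid-19": 0.7228,
    },
}


def gain(task: str, base: str, adapted: str) -> float:
    """Absolute score difference between an adapted model and its base (hypothetical helper)."""
    return round(scores[task][adapted] - scores[task][base], 4)


def best_model(task: str) -> str:
    """Name of the highest-scoring model for a task (hypothetical helper)."""
    return max(scores[task], key=scores[task].get)


for task in scores:
    print(
        f"{task}: arabert {gain(task, 'arabert', 'arabert Covid-19'):+.4f}, "
        f"mbert {gain(task, 'mbert', 'mbert Covid-19'):+.4f}, "
        f"best = {best_model(task)}"
    )
```

On these figures both adapted models improve on their base models for every task, with the largest AraBERT gain on fake-information detection.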
Preprocessing
from arabert.preprocess import ArabertPreprocessor
model_name = "moha/arabert_c19"
arabert_prep = ArabertPreprocessor(model_name=model_name)
text = "للوقايه من عدم انتشار كورونا عليك اولا غسل اليدين بالماء والصابون وتكون عملية الغسل دقيقه تشمل راحة اليد الأصابع التركيز على الإبهام"
preprocessed_text = arabert_prep.preprocess(text)
print(preprocessed_text)
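Once the text is preprocessed, the model can be queried through the Hugging Face `transformers` library. The sketch below is a minimal example under two assumptions: that `transformers` is installed, and that the `moha/arabert_c19` checkpoint includes a masked-language-modeling head. The helper names `load_fill_mask` and `top_predictions` are hypothetical and chosen only for illustration.

```python
MODEL_NAME = "moha/arabert_c19"


def load_fill_mask(model_name: str = MODEL_NAME):
    """Build a fill-mask pipeline; weights are downloaded on first use."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_name)


def top_predictions(fill_mask, text: str, k: int = 3):
    """Return the k highest-scoring (token, score) pairs for a single masked slot.

    `text` must contain exactly one mask token; for a real pipeline it is
    available as `fill_mask.tokenizer.mask_token`.
    """
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=k)]
```

To try it, replace one word of the preprocessed text with the tokenizer's mask token and pass the result to `top_predictions(load_fill_mask(), masked_text)`. For downstream classification tasks such as those in the results table, the same checkpoint name would instead be loaded with a classification head and fine-tuned on labeled data.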
Citation
Please cite as:
@misc{ameur2021aracovid19mfh,
title={AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset},
author={Mohamed Seghir Hadj Ameur and Hassina Aliane},
year={2021},
eprint={2105.03143},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contacts
Hadj Ameur: Github | mohamedhadjameur@gmail.com | mhadjameur@cerist.dz