
Hebrew Corpus

This corpus contains Hebrew tweets manually annotated for offensive language. It includes 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or non-offensive). The annotators are Arabic-Hebrew bilingual speakers.
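Because a tweet can carry more than one label, one common way to represent the annotations for training and evaluation is a multi-hot vector over the five classes. The sketch below assumes that representation and a fixed class order. The label strings, their order, and the `encode_labels` helper are hypothetical and not taken from the released data files, whose actual format may differ.

```python
# Hypothetical class order; the released data may name or order labels differently.
CLASSES = ["abusive", "hate", "violence", "pornographic", "non-offensive"]


def encode_labels(labels: list[str]) -> list[int]:
    """Turn one tweet's list of labels into a multi-hot vector over CLASSES."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # 1 where the class was assigned to the tweet, 0 otherwise.
    return [1 if c in labels else 0 for c in CLASSES]


# A tweet annotated as both abusive and hate:
print(encode_labels(["abusive", "hate"]))  # [1, 1, 0, 0, 0]
```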

Model Download

Hugging Face: https://huggingface.co/SinaLab/OffensiveHebrew

Models

Best model (main branch: https://huggingface.co/SinaLab/OffensiveHebrew/tree/main), with a micro-F1 score of 0.82.

AlephBERT (AlephBERT branch: https://huggingface.co/SinaLab/OffensiveHebrew/tree/AlephBERT) consists of eight models trained on eight different datasets described in the paper.

HeBERT (HeBERT branch: https://huggingface.co/SinaLab/OffensiveHebrew/tree/HeBERT) consists of eight models trained on eight different datasets described in the paper.
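Micro-F1, the metric reported for the best model, pools true positives (TP), false positives (FP), and false negatives (FN) across all labels before computing a single score, F1 = 2·TP / (2·TP + FP + FN), so frequent classes carry more weight than under macro-F1. Below is a minimal sketch for multi-hot label vectors, assuming the five-class multi-hot format. It is not the authors' evaluation script; on the same inputs it should agree with scikit-learn's `f1_score(y_true, y_pred, average="micro")`.

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1 over multi-hot label vectors (one row per tweet)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("label vectors must have the same length")
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # label annotated and predicted
            elif p:
                fp += 1  # label predicted but not annotated
            elif t:
                fn += 1  # label annotated but missed
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positive labels or predictions at all.
    return 2 * tp / denom if denom else 0.0


# Two tweets: the first is abusive + hate but only "abusive" is predicted.
y_true = [[1, 1, 0, 0, 0], [0, 0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
print(micro_f1(y_true, y_pred))  # 0.8  (TP=2, FP=0, FN=1)
```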

GitHub Repository

git clone https://github.com/SinaLab/OffensiveHebrew

You can download the data from the following GitHub link:

https://github.com/SinaLab/OffensiveHebrew/tree/main/data
