---
license: cc-by-nc-sa-4.0
language:
- he
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- code
---

## Hebrew Corpus

This corpus contains manually annotated offensive language in Hebrew. It comprises 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or non-offensive). The corpus was annotated manually by Arabic-Hebrew bilingual speakers.

For details, see https://arxiv.org/abs/2309.02724

## Models

AlephBERT (https://huggingface.co/imvladikon/sentence-transformers-alephbert)

## GitHub Repository

Clone the repository with `git clone https://github.com/SinaLab/OffensiveHebrew`

You can download the data from the following GitHub link: https://github.com/SinaLab/OffensiveHebrew/tree/main/data
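
## Label Encoding Sketch

Because each tweet can carry one or more of the five classes, a classifier usually needs the labels as a multi-hot vector. The snippet below is a minimal sketch of that encoding, not the authors' preprocessing. The class order, the `to_multi_hot` helper, and the sample records are assumptions made for illustration; the actual file layout and label spellings in the repository may differ.

```python
from collections import Counter

# The five classes named in the corpus description.
# The ordering here is an arbitrary assumption.
CLASSES = ["abusive", "hate", "violence", "pornographic", "non-offensive"]


def to_multi_hot(labels):
    """Map a collection of class names to a 0/1 vector ordered like CLASSES."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in CLASSES]


# Hypothetical annotations (placeholders, not real corpus rows).
examples = [
    {"id": 1, "labels": ["abusive", "hate"]},
    {"id": 2, "labels": ["non-offensive"]},
    {"id": 3, "labels": ["violence"]},
]

# One multi-hot vector per example, keyed by the example id.
vectors = {ex["id"]: to_multi_hot(ex["labels"]) for ex in examples}

# How often each class occurs across the examples.
label_counts = Counter(name for ex in examples for name in ex["labels"])
```

With this layout, a tweet tagged both abusive and hate becomes a vector with two positions set, which is the target shape most multi-label classification losses expect.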