
# Model Card

## Model Description

We fine-tuned this gelectra-large model over four rounds of dynamic adversarial data collection to create the GAHD dataset. In each round, annotators created examples by trying to trick the model into a misclassification, and we explored different ways of supporting annotators in finding such model-tricking examples. This is the final model (R4) from our paper. It classifies German text as "hate speech" (1) or "not hate speech" (0).
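
Below is a minimal usage sketch with the `transformers` library, assuming the model is available on the Hugging Face Hub as `jagoldz/gahd` with a standard text-classification head; the exact label names (e.g. `LABEL_0`/`LABEL_1`) depend on the model config and are an assumption here.

```python
from transformers import pipeline

# Load the fine-tuned gelectra-large classifier from the Hub.
classifier = pipeline("text-classification", model="jagoldz/gahd")

# German example input; label names are assumed to follow the default
# mapping, where 1 = hate speech and 0 = not hate speech.
result = classifier("Das ist ein Beispielsatz.")
print(result)  # e.g. [{'label': 'LABEL_0', 'score': 0.99}]
```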

Please see our paper for further details on the training procedure (Appendix C) and the evaluation (Section 4).

## Citation

When using this model or the GAHD dataset, please cite our arXiv preprint:

```bibtex
@misc{goldzycher2024improving,
      title={Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset},
      author={Janis Goldzycher and Paul Röttger and Gerold Schneider},
      year={2024},
      eprint={2403.19559},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```