HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

This model is a safety guard model that classifies the safety of LLM conversations.
It is fine-tuned from DeBERTa-v3-large and trained with the method described in "HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models".
Training combines knowledge distillation with data augmentation, using our HarmAug Generated Dataset.
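
As a rough illustration of the knowledge distillation step, the sketch below shows one common form of distillation objective for a binary safe/harmful classifier. It is a generic example under stated assumptions, not the exact loss used to train this model: the function names, the `alpha` weighting, and the use of binary cross-entropy for both terms are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(p: float, target: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability p and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(
    student_logit: float,
    teacher_prob: float,
    hard_label: float,
    alpha: float = 0.5,  # hypothetical weighting, not taken from the model card
) -> float:
    """Blend a hard-label term with a soft-label (teacher) term.

    student_logit: raw score from the small student model for "harmful".
    teacher_prob:  probability of "harmful" assigned by the large teacher.
    hard_label:    1.0 for harmful, 0.0 for safe.
    """
    p_student = sigmoid(student_logit)
    hard_term = binary_cross_entropy(p_student, hard_label)
    soft_term = binary_cross_entropy(p_student, teacher_prob)
    return alpha * hard_term + (1.0 - alpha) * soft_term


if __name__ == "__main__":
    # One (possibly augmented) example the teacher considers likely harmful.
    loss = distillation_loss(student_logit=2.0, teacher_prob=0.9, hard_label=1.0)
    print(f"distillation loss: {loss:.4f}")
```

In this sketch, augmented examples would simply be additional inputs scored by the teacher, so the student sees more varied harmful and safe conversations while being pulled toward the teacher's predictions.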

For more information, please refer to our anonymous GitHub repository.