---
language:
- en
datasets:
- Open-Orca/OpenOrca
- jackhhao/jailbreak-classification
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- jailbreak
- security
- moderation
- prompt-injection
---

# Jailbreak Classifier

Classifies prompts as jailbreaks or benign. This is a fine-tuned checkpoint of [bert-base-uncased](https://huggingface.co/bert-base-uncased), trained on the [jailbreak-classification](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.
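
As a rough illustration of how a checkpoint like this could be called, here is a minimal sketch using the `transformers` text-classification pipeline. The repository id `your-namespace/jailbreak-classifier` is a hypothetical placeholder (this card does not state the model's id), and the label name `jailbreak` is an assumption based on the dataset's classes; check the checkpoint's `id2label` mapping, since it may expose generic names such as `LABEL_0`/`LABEL_1` instead.

```python
from typing import Any, Callable, Dict, List

# Hypothetical placeholder: replace with the actual repository id of this model.
MODEL_ID = "your-namespace/jailbreak-classifier"

# Assumption: the checkpoint labels jailbreak prompts as "jailbreak".
JAILBREAK_LABEL = "jailbreak"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the given checkpoint."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def flag_jailbreaks(
    classifier: Callable[..., List[Dict[str, Any]]],
    prompts: List[str],
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each prompt classified as a jailbreak with score >= threshold."""
    # truncation=True keeps long prompts within BERT's 512-token input limit.
    results = classifier(prompts, truncation=True)
    return [
        r["label"] == JAILBREAK_LABEL and r["score"] >= threshold
        for r in results
    ]
```

With a real checkpoint, usage would look like `flag_jailbreaks(load_classifier(), ["Summarize this article for me."])`. The `threshold` parameter is illustrative; a suitable value depends on how false positives and false negatives should be traded off.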

## Training Details

### Training Data

Fine-tuned on the [jailbreak-classification](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.

### Training Procedure

#### Training Hyperparameters

First fine-tuning hyperparameters (on train (0.8) and validation (0.2) splits):

- learning_rate = 5e-5
- train_batch_size = 8
- eval_batch_size = 8
- lr_scheduler_type = linear
- num_train_epochs = 5.0

Second fine-tuning hyperparameters (on the train and test sets):

- learning_rate = 1e-5
- train_batch_size = 8
- eval_batch_size = 8
- lr_scheduler_type = linear
- num_train_epochs = 3.0
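
The two hyperparameter sets above could be expressed for the `transformers` `Trainer` roughly as follows. This is a sketch, not the original training script: it assumes single-device training, so `train_batch_size` and `eval_batch_size` are mapped to `per_device_train_batch_size` and `per_device_eval_batch_size`, and the helper name `make_training_args` and the `output_dir` argument are illustrative.

```python
# Hyperparameters for the two fine-tuning runs, keyed by TrainingArguments names.
# Assumption: single device, so the listed batch sizes are per-device values.
STAGE_1 = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5.0,
}

STAGE_2 = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def make_training_args(stage: dict, output_dir: str):
    """Build TrainingArguments for one fine-tuning run (hypothetical helper)."""
    # Imported lazily so the dictionaries above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **stage)
```

The second run differs from the first only in its lower learning rate and shorter schedule.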