jackhhao committed dc01ec4 (1 parent: 495316a)

Update model card

Files changed (1): README.md (+34 -0)
@@ -1,3 +1,37 @@
```diff
 ---
+language:
+- en
 license: apache-2.0
+datasets:
+- Open-Orca/OpenOrca
+- jackhhao/jailbreak-classification
+metrics:
+- accuracy
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- jailbreak
+- security
+- moderation
 ---
+
+# Jailbreak Classifier
+
+Classifies prompts as either jailbreaks or benign. This is a fine-tuned checkpoint of [bert-base-uncased](https://huggingface.co/bert-base-uncased), trained on the [jailbreak-classification](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.
+
+## Training Details
+
+### Training Data
+
+Fine-tuned on the [jailbreak-classification](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.
+
+### Training Procedure
+
+#### Training Hyperparameters
+
+Fine-tuning hyperparameters:
+- learning_rate = 5e-5
+- per_device_train_batch_size = 8
+- per_device_eval_batch_size = 8
+- lr_scheduler_type = linear
+- num_train_epochs = 5.0
```
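With `lr_scheduler_type = linear`, the Transformers `Trainer` decays the learning rate linearly from its peak (here 5e-5) to 0 over the run, after an optional linear warmup. The plain-Python sketch below reproduces that arithmetic so the effect of the listed hyperparameters can be inspected without training anything. It rests on assumptions the card does not state: a single device, no gradient accumulation, no warmup, and a last partial batch that still counts as a step. The training-set size in the demo (`n_train = 1000`) is hypothetical, not the size of the jailbreak-classification dataset, and the helper names are illustrative only.

```python
import math


def total_training_steps(num_examples: int, batch_size: int = 8, num_epochs: float = 5.0) -> int:
    """Optimizer steps for a run, ASSUMING one device and no gradient accumulation.

    The final partial batch of each epoch is counted as a full step.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(num_epochs * steps_per_epoch)


def linear_lr(step: int, total_steps: int, peak_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical training-set size, for illustration only.
    n_train = 1000
    total = total_training_steps(n_train)
    for step in (0, total // 2, total):
        print(f"step {step:4d}: lr = {linear_lr(step, total):.2e}")
```

Under these assumptions, 1,000 examples at batch size 8 give 125 steps per epoch and 625 steps over 5 epochs, so the learning rate reaches roughly half its peak around step 312 and 0 at the final step.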