Sharpaxis committed on
Commit 0ae3831
1 Parent(s): 47026e2

Create README.md


This model is an ethically fine-tuned version of Llama 2, trained specifically to detect and flag private or sensitive information in natural-language text. It is intended as a tool for data privacy and security, able to identify potentially sensitive data such as:

API keys
Personally Identifiable Information (PII)
Financial data
Confidential business information
Login credentials

Key Features:

Analyzes natural language input to identify sensitive content
Provides explanations for detected sensitive information
Helps prevent accidental exposure of private data
Supports responsible data handling practices

Use Cases:

Content moderation
Data loss prevention
Compliance checks for GDPR, HIPAA, etc.
Security audits of text-based communications

This model aims to enhance data protection measures and promote ethical handling of sensitive information in various applications and industries.
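
A minimal usage sketch of how such a detector might be queried (the checkpoint id and prompt wording below are placeholders rather than values taken from this card; since the base model is a Llama-2 chat model, detection is driven through text generation):

```python
from transformers import pipeline

# Placeholder repo id: substitute the actual fine-tuned checkpoint.
detector = pipeline(
    "text-generation",
    model="your-username/llama2-sensitive-info-detector",
)

text = "Contact me at jane.doe@example.com; my API key is sk-test-12345."
prompt = (
    "Identify and explain any private or sensitive information "
    f"in the following text:\n{text}"
)

print(detector(prompt, max_new_tokens=128)[0]["generated_text"])
```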

Files changed (1)
  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - synapsecai/synthetic-sensitive-information
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ ---
+
+ Model Information
+
+ model_name = "NousResearch/Llama-2-7b-chat-hf"
+
+ dataset_name = "synapsecai/synthetic-sensitive-information"
+
+
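
As a minimal sketch, the dataset and tokenizer referenced above can be loaded with the standard `datasets` and `transformers` APIs (the `train` split and the padding settings are assumptions; the quantized model itself is loaded in the BitsAndBytes sketch further down):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

model_name = "NousResearch/Llama-2-7b-chat-hf"
dataset_name = "synapsecai/synthetic-sensitive-information"

# Load the synthetic sensitive-information dataset (assumes a "train" split).
dataset = load_dataset(dataset_name, split="train")

# Llama 2 ships without a pad token; reuse EOS and pad on the right for training.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
```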
+ QLoRA parameters
+
+ lora_r = 32
+
+ lora_alpha = 8
+
+ lora_dropout = 0.1
+
+
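
Expressed as a `peft` `LoraConfig`, these adapter settings look roughly as follows (a sketch; `bias` and `task_type` are assumptions not stated in this card):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=32,              # lora_r: rank of the low-rank adapter matrices
    lora_alpha=8,      # lora_alpha: scaling applied to the adapter output
    lora_dropout=0.1,  # lora_dropout
    bias="none",             # assumption
    task_type="CAUSAL_LM",   # assumption: Llama 2 is fine-tuned as a causal LM
)
```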
+ BitsAndBytes parameters
+
+ use_4bit = True
+
+ bnb_4bit_compute_dtype = "float16"
+
+ bnb_4bit_quant_type = "nf4"
+
+ use_nested_quant = False
+
+
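
A sketch of the matching 4-bit quantization config and model load via `transformers` (`device_map` and the `use_cache` toggle are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype = "float16"
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: let accelerate place the layers
)
model.config.use_cache = False  # typically disabled with gradient checkpointing
```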
+ Training Arguments parameters
+
+ num_train_epochs = 1
+
+ fp16 = False
+
+ bf16 = False
+
+ per_device_train_batch_size = 32
+
+ per_device_eval_batch_size = 8
+
+ gradient_accumulation_steps = 4
+
+ gradient_checkpointing = True
+
+ max_grad_norm = 0.3
+
+ learning_rate = 2e-4
+
+ weight_decay = 0.001
+
+ optim = "paged_adamw_32bit"
+
+ lr_scheduler_type = "cosine"
+
+ max_steps = -1
+
+ warmup_ratio = 0.03
+
+ group_by_length = True
+
+ save_steps = 0
+
+ logging_steps = 25
+
+
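
These values map one-to-one onto `transformers.TrainingArguments`; a sketch follows (`output_dir` and `report_to` are assumptions, as neither is given above):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",        # assumption: not specified in this card
    num_train_epochs=1,
    fp16=False,
    bf16=False,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,                  # -1: epoch count controls training length
    warmup_ratio=0.03,
    group_by_length=True,          # bucket samples of similar length together
    save_steps=0,
    logging_steps=25,
    report_to="none",              # assumption
)
```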
+ SFT parameters
+
+ max_seq_length = None
+
+ packing = False
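
Finally, a sketch of how these pieces plug into a `trl` `SFTTrainer`, reusing the objects from the earlier sketches. This assumes an older `SFTTrainer` signature (where `dataset_text_field` and `max_seq_length` are passed directly) and a `"text"` column in the dataset; neither is confirmed by this card:

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                 # 4-bit base model from the BitsAndBytes sketch
    train_dataset=dataset,       # dataset from the Model Information sketch
    peft_config=peft_config,     # adapter config from the QLoRA sketch
    dataset_text_field="text",   # assumption: column name not stated in this card
    max_seq_length=None,         # max_seq_length
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,               # packing
)

trainer.train()
trainer.model.save_pretrained("llama2-sensitive-info-detector")  # placeholder name
```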