|
Llama-3.1-8B-Instruct-Secure |
|
Repository: SanjanaCodes/Llama-3.1-8B-Instruct-Secure |
|
License: Add License Here |
|
Languages: English (or specify other supported languages) |
|
Base Model: Llama-3.1-8B (or specify if different) |
|
Library Name: transformers, PyTorch (Add library used) |
|
Pipeline Tag: text-generation |
|
|
|
Model Description |
|
The Llama-3.1-8B-Instruct-Secure is a fine-tuned variant of the Llama-3.1-8B model designed to address LLM security vulnerabilities while maintaining strong performance for instruction-based tasks. It is optimized to handle: |
|
|
|
Secure Prompt Handling: Resistant to common jailbreak and adversarial attacks. |
|
Instruction Following: Retains instruction-based generation accuracy. |
|
Safety and Robustness: Improved safeguards against harmful or unsafe outputs. |
|
Key Features: |
|
Fine-tuned for secure instruction-based generation tasks. |
|
Includes defense mechanisms against adversarial and jailbreaking prompts. |
|
Pre-trained on a mixture of secure and adversarial datasets to generalize against threats. |
|
Usage |
|
Installation |
|
bash |
|
Copy code |
|
pip install transformers torch |
|
Example |
|
python |
|
Copy code |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "SanjanaCodes/Llama-3.1-8B-Instruct-Secure" |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name) |
|
|
|
# Example Input |
|
input_text = "Explain the importance of cybersecurity in simple terms." |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
|
|
# Generate Response |
|
output = model.generate(**inputs, max_length=150) |
|
print(tokenizer.decode(output[0], skip_special_tokens=True)) |
|
Training Details |
|
Dataset |
|
Fine-tuned on a curated dataset with: |
|
Instruction-following data. |
|
Security-focused prompts. |
|
Adversarial prompts for robustness. |
|
Training Procedure |
|
Framework: PyTorch |
|
Hardware: GPU-enabled nodes |
|
Optimization Techniques: |
|
Mixed Precision Training |
|
Gradient Checkpointing |
|
Evaluation Metrics: Attack Success Rate (ASR), Robustness Score |