# Llama3 8B Fine-Tuned for Domain Generation Algorithm Detection

This model is a fine-tuned version of Meta's Llama3 8B, adapted for detecting **Domain Generation Algorithms (DGAs)**. Malware often uses DGAs to generate changing domain names for its command-and-control (C&C) servers, which makes DGA detection a significant challenge in cybersecurity.

## Model Description

- **Base Model**: Llama3 8B
- **Task**: DGA Detection
- **Fine-Tuning Approach**: Supervised Fine-Tuning (SFT) with domain-specific data.
- **Dataset**: A custom dataset comprising DGA domains from 68 malware families and legitimate domains from the Tranco dataset, with a focus on both arithmetic and word-based DGAs.
- **Performance**:
  - **Accuracy**: 94%
  - **False Positive Rate (FPR)**: 4%
  - Excels in detecting hard-to-identify word-based DGAs.

This model leverages the semantic understanding of Llama3 to classify domains as either **malicious (DGA-generated)** or **legitimate** with high precision and recall.

## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model
# ("your-username/llama3-dga-detector" is a placeholder repository ID)
tokenizer = AutoTokenizer.from_pretrained("your-username/llama3-dga-detector")
model = AutoModel.from_pretrained("your-username/llama3-dga-detector")
model.eval()

# Example domain classification
domain = "example.com"
inputs = tokenizer(domain, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Process outputs to interpret classification.
# Note: AutoModel loads the base network without a task head, so `outputs`
# holds hidden states rather than class scores.
```
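
The usage snippet stops at "Process outputs to interpret classification" without showing that step. As a minimal sketch only, and assuming the checkpoint exposes two raw class scores (logits) per domain, one way to turn them into a label is a softmax followed by a threshold. The model card does not confirm that such a head exists, and the label order in `LABELS`, the function `classify_from_logits`, and the `threshold` parameter are all hypothetical names introduced here for illustration.

```python
import math

# Hypothetical label order; the model card does not document it.
LABELS = ("legitimate", "dga")


def classify_from_logits(logits, threshold=0.5):
    """Turn two raw class scores into a (label, dga_probability) pair."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    dga_prob = probs[LABELS.index("dga")]
    label = "dga" if dga_prob >= threshold else "legitimate"
    return label, dga_prob


# Made-up scores for illustration only, not real model output.
label, prob = classify_from_logits([0.2, 2.3])
print(label, round(prob, 3))
```

Raising `threshold` above 0.5 would flag fewer domains as DGA-generated, which typically lowers the false positive rate at the cost of missing some malicious domains.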