- recall
base_model:
- meta-llama/Meta-Llama-3-8B
---

# Llama3 8B Fine-Tuned for Domain Generation Algorithm Detection

This model is a fine-tuned version of Meta's Llama3 8B, adapted to detect **Domain Generation Algorithms (DGAs)**. Malware uses DGAs to generate many candidate domain names on the fly for reaching its command-and-control (C&C) servers. Because these domains keep changing, static blocklists struggle to keep up, which makes DGA detection a critical challenge in cybersecurity.
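
To make the mechanism concrete, the sketch below implements a toy, date-seeded arithmetic DGA. It is a hypothetical illustration, not code from any of the malware families in the dataset: the function name `toy_dga`, the SHA-256 seeding, the 12-letter labels, and the `.com` suffix are all assumptions. Because the output depends only on the date, an infected host and the operator can compute the same candidate list independently, and the operator only has to register one of those domains ahead of time.

```python
import hashlib
from datetime import date


def toy_dga(seed_date: date, count: int = 5, length: int = 12, tld: str = ".com") -> list[str]:
    """Toy arithmetic DGA: derive `count` domains deterministically from a date.

    Hypothetical example for illustration only; not a real malware family.
    """
    domains = []
    for i in range(count):
        # Hash the date together with a counter so each index gives a new label
        digest = hashlib.sha256(f"{seed_date.isoformat()}-{i}".encode()).digest()
        # Map each of the first `length` bytes onto a lowercase letter
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    day = date(2024, 1, 15)
    # The infected host and the operator run the same function on the same date...
    host_view = toy_dga(day)
    operator_view = toy_dga(day)
    # ...so they agree on the candidate list without any communication
    assert host_view == operator_view
    for domain in host_view:
        print(domain)
```

Real DGAs differ in their seed source and in how they map the seed to characters; this sketch only shows the shared-seed idea that makes the domains predictable to the operator but not to a static blocklist.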

## Model Description

- **Base Model**: Llama3 8B (`meta-llama/Meta-Llama-3-8B`)
- **Task**: DGA detection (classifying each domain as DGA-generated or legitimate)
- **Fine-Tuning Approach**: Supervised fine-tuning (SFT) on domain-specific data
- **Dataset**: A custom dataset combining DGA domains from 68 malware families with legitimate domains from the Tranco list, covering both arithmetic and word-based DGAs
- **Performance**:
  - **Accuracy**: 94%
  - **False Positive Rate (FPR)**: 4%
  - Particularly strong on word-based DGAs, which build domains from dictionary words and are therefore hard to identify
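
One way to see why word-based DGAs are hard to identify: arithmetic DGAs tend to produce labels with high character-level entropy, which simple heuristics can flag, while word-based DGAs concatenate dictionary words, so their character statistics look much like those of legitimate domains. The sketch below computes Shannon entropy per character and includes a toy word-based generator. The word list, the `toy_word_dga` function, and the example labels are hypothetical and are not taken from the training data.

```python
import math
import random
from collections import Counter

# Hypothetical dictionary; real word-based DGAs embed their own word lists
WORDS = ["river", "table", "green", "market", "stone", "light", "paper", "house"]


def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy of a label, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def toy_word_dga(seed: int, count: int = 3, words_per_domain: int = 2) -> list[str]:
    """Toy word-based DGA: join dictionary words picked by a seeded PRNG."""
    rng = random.Random(seed)
    return ["".join(rng.choice(WORDS) for _ in range(words_per_domain)) for _ in range(count)]


if __name__ == "__main__":
    # A random-looking label, as an arithmetic DGA might emit (12 distinct letters)
    arithmetic_like = "xkqjvzmwpfhd"
    for label in [arithmetic_like, *toy_word_dga(seed=7)]:
        print(f"{label:<14} entropy={shannon_entropy(label):.2f} bits/char")
```

Character entropy also depends on label length, so on its own it is a weak signal even for arithmetic DGAs.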

This model draws on Llama3's pretrained language knowledge to classify each domain as either **malicious (DGA-generated)** or **legitimate** with high precision and recall.
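
For reference, the metrics mentioned on this card follow the standard confusion-matrix definitions. The sketch below treats DGA-generated domains as the positive class (an assumption, since the card does not state the convention). The `ConfusionCounts` and `detection_metrics` names are illustrative, and the counts in the example are made up; they are not this model's evaluation results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # DGA domains correctly flagged as DGA
    fp: int  # legitimate domains wrongly flagged as DGA
    tn: int  # legitimate domains correctly labelled legitimate
    fn: int  # DGA domains missed (labelled legitimate)


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator)
    return num / den if den else float("nan")


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        # FPR: fraction of legitimate domains misclassified as DGA
        "fpr": _ratio(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Made-up counts: 100 DGA domains and 100 legitimate domains
    example = ConfusionCounts(tp=90, fp=15, tn=85, fn=10)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts the script prints accuracy 0.875, precision 0.857, recall 0.900 and FPR 0.150. Note that FPR is computed only over legitimate domains, so it is the figure that matters most when the detector gates real traffic.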
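
## How to Use

The snippet below classifies a single domain with Hugging Face Transformers. `your-username/llama3-dga-detector` is a placeholder repository ID, and the code assumes the checkpoint was saved with a sequence-classification head.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model.
# Assumption: the checkpoint includes a sequence-classification head.
# If it was saved as a causal LM instead, load it with AutoModelForCausalLM
# and read the label from the generated text.
model_id = "your-username/llama3-dga-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example domain classification
domain = "example.com"
inputs = tokenizer(domain, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its label name from the model config
predicted_id = outputs.logits.argmax(dim=-1).item()
print(domain, "->", model.config.id2label[predicted_id])
```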