FredZhang7 committed
Commit • 2aa9ee7 • 1 Parent(s): d724999
Update README.md
README.md CHANGED
@@ -1,6 +1,33 @@
 ---
-license: cc-by-
+license: cc-by-nc-4.0
+dataset:
+- FredZhang7/malicious-website-features-2.4M
 wget:
-- text:
-- text:
+- text: https://chat.openai.com/
+- text: https://huggingface.co/FredZhang7/aivance-safesearch-v3
 ---
+
+
+The classification task is split into two stages:
+1. URL features model
+   - 96.5%+ accuracy on training and validation data
+   - 2,436,727 rows of labelled URLs
+2. Website features model
+   - 98.2% on training data, 98.7% accuracy on validation
+   - 911,180 rows of 11 features
+
+
+## URL Features
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("FredZhang7/malware-phisher")
+model = AutoModelForSequenceClassification.from_pretrained("FredZhang7/malware-phisher")
+```
+## Website Features
+```bash
+pip install lightgbm
+```
+```python
+import lightgbm as lgb
+lgb.Booster(model_file="malicious_features_combined.txt")
+```
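
The `## URL Features` snippet in this commit only loads the tokenizer and model. As a hedged illustration of how they might be used, the sketch below scores a single URL. It assumes the checkpoint accepts a raw URL string as its text input, that `model.config.id2label` carries usable class names, and that `torch` is installed; none of this is stated in the README. The helper names `softmax`, `top_label`, and `classify_url` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits (a list of floats) into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Fall back to the class index if the config has no name for it.
    return id2label.get(best, str(best)), probs[best]


def classify_url(url, repo_id="FredZhang7/malware-phisher"):
    """Load the checkpoint named in the README and score one URL.

    Assumption: the model takes the raw URL string as its text input.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.eval()

    inputs = tokenizer(url, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_label(logits, model.config.id2label)


# Example (downloads the checkpoint on first use):
# print(classify_url("https://chat.openai.com/"))
```

The label names returned here come from whatever the checkpoint's config defines, so they should be checked against the model card before being treated as "malicious" or "benign".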
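
The `lgb.Booster(...)` line under `## Website Features` constructs the booster without keeping a reference to it. A minimal sketch of loading and querying it follows, assuming `malicious_features_combined.txt` is in the working directory. The feature count is read from the model file rather than hard-coded, since the README mentions 11 features but this commit does not document their order or meaning. The helper names `check_rows` and `predict_rows` are hypothetical.

```python
def check_rows(rows, n_features):
    """Ensure every row has exactly n_features numeric values."""
    cleaned = []
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(
                f"row {i} has {len(row)} values, expected {n_features}"
            )
        cleaned.append([float(v) for v in row])
    return cleaned


def predict_rows(rows, model_file="malicious_features_combined.txt"):
    """Load the booster from the README's model file and score rows."""
    # Imported lazily so check_rows works without these packages.
    import lightgbm as lgb
    import numpy as np

    booster = lgb.Booster(model_file=model_file)
    n_features = booster.num_feature()  # read from the model file itself
    data = np.asarray(check_rows(rows, n_features), dtype=np.float64)
    return booster.predict(data)


# Example (needs the model file locally; the all-zeros row is a placeholder,
# not a meaningful website feature vector):
# print(predict_rows([[0.0] * 11]))
```

How to interpret the returned scores depends on the training objective, which this commit does not state.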