FredZhang7 committed on
Commit
2aa9ee7
1 Parent(s): d724999

Update README.md

Files changed (1)
  1. README.md +30 -3
README.md CHANGED
@@ -1,6 +1,33 @@
  ---
- license: cc-by-nd-4.0
+ license: cc-by-nc-4.0
+ dataset:
+ - FredZhang7/malicious-website-features-2.4M
  wget:
- - text: "https://chat.openai.com/"
- - text: "https://huggingface.co/FredZhang7/aivance-safesearch-v3"
+ - text: https://chat.openai.com/
+ - text: https://huggingface.co/FredZhang7/aivance-safesearch-v3
  ---
+
+
+ The classification task is split into two stages:
+ 1. URL features model
+ - 96.5%+ accuracy on training and validation data
+ - 2,436,727 rows of labelled URLs
+ 2. Website features model
+ - 98.2% on training data, 98.7% accuracy on validation
+ - 911,180 rows of 11 features
+
+
+ ## URL Features
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ tokenizer = AutoTokenizer.from_pretrained("FredZhang7/malware-phisher")
+ model = AutoModelForSequenceClassification.from_pretrained("FredZhang7/malware-phisher")
+ ```
+ ## Website Features
+ ```bash
+ pip install lightgbm
+ ```
+ ```python
+ import lightgbm as lgb
+ lgb.Booster(model_file="malicious_features_combined.txt")
+ ```
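
The README added in this commit shows how to load each model but not how to score anything with it. The following is a minimal sketch of what scoring might look like, not the author's method. The helper names `softmax`, `score_url`, and `score_features` are hypothetical. Label names are read from the checkpoint's `config.id2label` because the README does not document them. The order and meaning of the 11 website features are not given here either, so the caller is assumed to supply rows in the order the booster was trained on. The sketch also assumes `torch` and `numpy` are installed alongside `transformers` and `lightgbm`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_url(url, tokenizer, model):
    """Stage 1 (assumed usage): return {label: probability} for one URL."""
    import torch  # imported lazily so softmax() works without torch

    inputs = tokenizer(url, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    # Label names come from the checkpoint config; none are hard-coded
    # because the README does not list them.
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}


def score_features(rows, model_file="malicious_features_combined.txt"):
    """Stage 2 (assumed usage): score rows of website features."""
    import lightgbm as lgb
    import numpy as np

    booster = lgb.Booster(model_file=model_file)
    data = np.asarray(rows, dtype=float)
    # Check the row width against what the saved model expects.
    if data.ndim != 2 or data.shape[1] != booster.num_feature():
        raise ValueError(
            f"expected rows with {booster.num_feature()} features, "
            f"got array of shape {data.shape}"
        )
    return booster.predict(data)


# Example call, using the tokenizer and model loaded as in the README:
#   score_url("https://chat.openai.com/", tokenizer, model)
```

How the output of `booster.predict` should be interpreted depends on the training objective, which the README does not state, and the diff does not say how the two stages are combined into a final decision.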