mshenoda committed on
Commit
9e6d1e5
1 Parent(s): 100ff47

Update README.md

  1. README.md +19 -1
README.md CHANGED
@@ -2,4 +2,22 @@
2
  license: mit
3
  ---
4
  # RoBERTa based Spam Message Detection
5
- Spam messages frequently carry malicious links or phishing attempts posing significant threats to both organizations and their users. By choosing our RoBERTa-based spam message detection system, organizations can greatly enhance their security infrastructure. Our system effectively detects and filters out spam messages, adding an extra layer of security that safeguards organizations against potential financial losses, legal consequences, and reputational harm.
+ Spam messages frequently carry malicious links or phishing attempts, posing significant threats to both organizations and their users. By choosing our RoBERTa-based spam message detection system, organizations can strengthen their security infrastructure. Our system detects and filters out spam messages, adding an extra layer of security that safeguards organizations against potential financial losses, legal consequences, and reputational harm.
6
+
7
+ ## Dataset
8
+ The dataset is composed of messages labeled as ham or spam, merged from three data sources:
9
+ 1. SMS Spam Collection https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
10
+ 2. Telegram Spam Ham https://huggingface.co/datasets/thehamkercat/telegram-spam-ham/tree/main
11
+ 3. Enron Spam https://huggingface.co/datasets/SetFit/enron_spam/tree/main (only the message column and labels are used)
12
+
13
+ The preparation script for the Enron data is available at https://github.com/mshenoda/roberta-spam/tree/main/data/enron.
14
+ The data is split into 80% train, 10% validation, and 10% test sets; the scripts used to split and merge the three data sources are available at https://github.com/mshenoda/roberta-spam/tree/main/data/utils.
15
+
16
+
17
+ ## Architecture
18
+ The model is a fine-tuned RoBERTa: https://arxiv.org/abs/1907.11692
19
+
20
+ ## Code
21
+
22
+ https://github.com/mshenoda/roberta-spam
23
+
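
The merge and 80/10/10 split described in the Dataset section of this change can be sketched roughly as follows. This is only an illustration and not the author's scripts, which live in the linked repository. The function names `merge_sources` and `split_dataset`, the toy messages, and the fixed seed are all assumptions made for this example.

```python
import random

def merge_sources(*sources):
    """Concatenate several lists of (message, label) pairs into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged

def split_dataset(rows, train_pct=80, val_pct=10, seed=42):
    """Shuffle rows, then cut them into train / validation / test parts.

    Percentages are integers so the cut points are computed without
    floating-point rounding; whatever is left after train and validation
    goes to the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for a reproducible split
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * val_pct // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test_set = rows[n_train + n_val:]
    return train, val, test_set

# Hypothetical toy stand-ins for the three sources (10 rows each).
sms = [(f"sms message {i}", "ham" if i % 2 else "spam") for i in range(10)]
telegram = [(f"telegram message {i}", "ham" if i % 3 else "spam") for i in range(10)]
enron = [(f"enron message {i}", "ham" if i % 4 else "spam") for i in range(10)]

merged = merge_sources(sms, telegram, enron)
train, val, test_set = split_dataset(merged)
print(len(train), len(val), len(test_set))
```

A plain shuffle like this does not preserve the ham/spam ratio in each part; whether the original scripts stratify by label is not stated in the README.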