mshenoda committed on
Commit
45ab82d
1 Parent(s): 9e6d1e5

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -6,9 +6,10 @@ Spam messages frequently carry malicious links or phishing attempts posing signi
 
  ## Dataset
  The dataset is composed of messages labeled by ham or spam, merged from three data sources:
- 1. SMS Spam Collection https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
- 2. Telegram Spam Ham https://huggingface.co/datasets/thehamkercat/telegram-spam-ham/tree/main
- 3. Enron Spam: https://huggingface.co/datasets/SetFit/enron_spam/tree/main (only used message column and labels)
+
+ 1. SMS Spam Collection https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
+ 2. Telegram Spam Ham https://huggingface.co/datasets/thehamkercat/telegram-spam-ham/tree/main
+ 3. Enron Spam: https://huggingface.co/datasets/SetFit/enron_spam/tree/main (only used message column and labels)
 
  The prepare script for enron is available at https://github.com/mshenoda/roberta-spam/tree/main/data/enron.
  The data is split 80% train 10% validation, and 10% test sets; the scripts used to split and merge of the three data sources are available at: https://github.com/mshenoda/roberta-spam/tree/main/data/utils.
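
For readers who want a feel for the merge and 80/10/10 split that the README text describes, here is a minimal sketch. It is not the code in the linked `data/utils` directory: the function names `merge_sources` and `split_80_10_10`, the `(text, label)` tuple layout, and the fixed shuffle seed are all assumptions made for illustration.

```python
import random


def merge_sources(*sources):
    """Concatenate (text, label) pairs from several sources into one list.

    Hypothetical helper: the real merge script may also clean or filter rows.
    """
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def split_80_10_10(rows, seed=0):
    """Shuffle rows, then cut them into 80% train, 10% validation, 10% test.

    Integer arithmetic avoids float rounding; any remainder goes to the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_train = len(rows) * 8 // 10
    n_val = len(rows) // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test
```

With three toy sources of 40, 30, and 30 rows, `split_80_10_10(merge_sources(sms, telegram, enron))` returns lists of 80, 10, and 10 rows. A stratified split that preserves the ham/spam ratio in each subset would be a reasonable alternative; the README does not say which approach the actual scripts take.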