mshenoda committed on
Commit
bf36083
1 Parent(s): 3b92a10

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -46,6 +46,12 @@ The dataset is composed of messages labeled by ham or spam, merged from three da
  The prepare script for enron is available at https://github.com/mshenoda/roberta-spam/tree/main/data/enron.
  The data is split 80% train 10% validation, and 10% test sets; the scripts used to split and merge of the three data sources are available at: https://github.com/mshenoda/roberta-spam/tree/main/data/utils.
 
+ ### Dataset Class Distribution
+
+ Training 80% | Validation 10% | Testing 10%
+ :-------------------------:|:-------------------------:|:-------------------------:
+ ![](plots/train_set_distribution.jpg "Train / Validation Loss") Class Distribution | ![](plots/val_set_distribution.jpg "Class Distribution") Class Distribution | ![](plots/test_set_distribution.jpg "Class Distribution") Class Distribution
+
 
  ## Architecture
  The model is fine tuned RoBERTa