vibhorag101 committed on
Commit
28c4143
1 Parent(s): 02d93ec

Update README.md

Files changed (1)
  1. README.md +42 -11
README.md CHANGED
@@ -10,15 +10,37 @@ metrics:
10
  - f1
11
  model-index:
12
  - name: roberta-base-suicide-prediction-phr-v2
13
- results: []
14
  ---
15
 
 
16
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
  should probably proofread and complete it, then remove this comment. -->
18
 
19
  # vibhorag101/roberta-base-suicide-prediction-phr-v2
20
 
21
- This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
22
  It achieves the following results on the evaluation set:
23
  - Loss: 0.0553
24
  - Accuracy: 0.9869
@@ -27,21 +49,24 @@ It achieves the following results on the evaluation set:
27
  - F1: 0.9875
28
 
29
  ## Model description
30
-
31
- More information needed
32
-
33
- ## Intended uses & limitations
34
-
35
- More information needed
36
 
37
  ## Training and evaluation data
38
-
39
- More information needed
40
 
41
  ## Training procedure
 
42
 
43
  ### Training hyperparameters
44
-
45
  The following hyperparameters were used during training:
46
  - learning_rate: 2e-05
47
  - train_batch_size: 16
@@ -51,6 +76,12 @@ The following hyperparameters were used during training:
51
  - lr_scheduler_type: linear
52
  - lr_scheduler_warmup_ratio: 0.06
53
  - num_epochs: 3
54
 
55
  ### Training results
56
 
 
10
  - f1
11
  model-index:
12
  - name: roberta-base-suicide-prediction-phr-v2
13
+   results:
14
+   - task:
15
+       type: text-classification
16
+       name: Suicidal Tendency Prediction in text
17
+     dataset:
18
+       type: vibhorag101/phr_suicide_prediction_dataset_clean_light
19
+       name: Suicide Prediction Dataset
20
+       split: val
21
+     metrics:
22
+     - type: accuracy
23
+       value: 0.9869
24
+     - type: f1
25
+       value: 0.9875
26
+     - type: recall
27
+       value: 0.9846
28
+     - type: precision
29
+       value: 0.9904
30
+ datasets:
31
+ - vibhorag101/phr_suicide_prediction_dataset_clean_light
32
+ language:
33
+ - en
34
+ library_name: transformers
35
  ---
36
 
37
+
38
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
39
  should probably proofread and complete it, then remove this comment. -->
40
 
41
  # vibhorag101/roberta-base-suicide-prediction-phr-v2
42
 
43
+ This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [Suicide Prediction Dataset](https://huggingface.co/datasets/vibhorag101/phr_suicide_prediction_dataset_clean_light), which is sourced from Reddit.
44
  It achieves the following results on the evaluation set:
45
  - Loss: 0.0553
46
  - Accuracy: 0.9869
 
49
  - F1: 0.9875
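
As a quick consistency check, the reported F1 can be recomputed from the precision (0.9904) and recall (0.9846) listed in the card's metadata. This is only a sketch of the standard harmonic-mean formula, not the evaluation code used for the model; the variable names are illustrative.

```python
# Precision and recall as reported in the model card metadata.
precision = 0.9904
recall = 0.9846

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9875", matching the reported value
```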
50
 
51
  ## Model description
52
+ This model is a fine-tuned version of roberta-base that detects suicidal tendencies in a given text.
53
 
54
  ## Training and evaluation data
55
+ - The dataset is sourced from Reddit and is available on [Kaggle](https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).
56
+ - The dataset contains text with binary labels: suicide or non-suicide.
57
+ - The dataset was cleaned only minimally, as BERT depends on contextual information and heavier cleaning can adversely affect its performance.
58
+   - Removed numbers.
59
+   - Removed URLs, emojis, and accented characters.
60
+   - Removed extra whitespace, including runs of consecutive spaces.
61
+   - Removed any consecutive characters repeated more than 3 times.
62
+   - Removed rows with more than 512 BERT tokens, as they exceeded BERT's maximum input length.
63
+ - The cleaned dataset can be found [here](https://huggingface.co/datasets/vibhorag101/phr_suicide_prediction_dataset_clean_light).
64
+ - The training set had ~153k samples and the evaluation set had ~33k samples, i.e., a 70:15:15 (train:test:val) split.
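
The cleaning steps listed above could be sketched roughly as follows. This is not the author's preprocessing script: the regular expressions, the function name `clean_text`, and the order of the steps are assumptions. In particular, it assumes that "accented characters" are handled by stripping diacritics and dropping any remaining non-ASCII symbols (which also removes emojis), and that characters repeated more than 3 times are truncated to 3 rather than deleted. The 512-token filter is omitted because it needs the model's tokenizer.

```python
import re
import unicodedata

# Hypothetical patterns; the original preprocessing code is not shown in the card.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
REPEAT_RE = re.compile(r"(.)\1{3,}")  # the same character 4 or more times in a row
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Apply a light cleaning pass similar to the steps described in the card."""
    # Remove URLs first so their digits and punctuation do not linger.
    text = URL_RE.sub(" ", text)
    # Decompose accented letters, then drop every non-ASCII code point
    # (combining accents, emojis, other symbols).
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove numbers.
    text = DIGIT_RE.sub(" ", text)
    # Truncate runs of a repeated character to three occurrences.
    text = REPEAT_RE.sub(r"\1\1\1", text)
    # Collapse whitespace runs into single spaces and trim the ends.
    return SPACE_RE.sub(" ", text).strip()


print(clean_text("Sooooo   tired 123 see https://example.com café 😀"))
```

Punctuation and casing are left untouched here, in keeping with the card's point that heavy cleaning removes context the model can use.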
65
 
66
  ## Training procedure
67
+ - The model was trained on an NVIDIA RTX A5000 GPU.
68
 
69
  ### Training hyperparameters
 
70
  The following hyperparameters were used during training:
71
  - learning_rate: 2e-05
72
  - train_batch_size: 16
 
76
  - lr_scheduler_type: linear
77
  - lr_scheduler_warmup_ratio: 0.06
78
  - num_epochs: 3
79
+ - eval_steps: 500
80
+ - save_steps: 500
81
+ - Early Stopping:
82
+   - early_stopping_patience: 5
83
+   - early_stopping_threshold: 0.001
84
+   - monitored metric: F1 score
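
To illustrate how the early-stopping settings above interact (patience 5, threshold 0.001, monitored on F1 at each evaluation), here is a simplified sketch of the rule. It is not the `transformers` implementation or the author's training script; the function name `first_stop_index` and the F1 history are made up for illustration, and the sketch assumes an evaluation counts as an improvement only when F1 exceeds the best value so far by more than the threshold.

```python
from typing import Optional, Sequence


def first_stop_index(
    f1_history: Sequence[float],
    patience: int = 5,
    threshold: float = 0.001,
) -> Optional[int]:
    """Return the index of the evaluation at which training would stop, or None."""
    best: Optional[float] = None
    stale = 0  # consecutive evaluations without a sufficient improvement
    for i, f1 in enumerate(f1_history):
        if best is None or f1 - best > threshold:
            best = f1
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical F1 values, one per evaluation (every 500 steps in the card's setup).
history = [0.90, 0.95, 0.9504, 0.9503, 0.9501, 0.9505, 0.9502, 0.96]
print(first_stop_index(history))
```

With these values, the last sufficient improvement is at index 1, so the fifth stale evaluation occurs at index 6 and training would stop before the later jump to 0.96 is ever seen.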
85
 
86
  ### Training results
87