Suicidal-BERT

This text classification model predicts whether a sequence of words is suicidal (1) or non-suicidal (0).

Data

The model was trained on the Suicide and Depression Dataset obtained from Kaggle. The dataset was scraped from Reddit and consists of 232,074 rows, equally distributed between two classes: suicide and non-suicide.
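Because the model predicts 1 (suicidal) or 0 (non-suicidal), the dataset's two class names have to be converted to integers before training. The sketch below shows one way to do that and to check the class balance. It assumes the raw labels are the strings "suicide" and "non-suicide", as the classes are named above. The exact column values in the Kaggle file are not confirmed here, and `LABEL2ID`, `encode_labels` and `is_balanced` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical mapping from the dataset's class names to the card's labels
# (assumed raw strings; check them against the actual Kaggle file).
LABEL2ID = {"suicide": 1, "non-suicide": 0}


def encode_labels(class_names):
    """Convert raw class-name strings into integer labels (1 = suicidal, 0 = non-suicidal)."""
    return [LABEL2ID[name] for name in class_names]


def is_balanced(labels):
    """Return True if every label occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1
```

With 232,074 rows split equally, each class holds 232,074 / 2 = 116,037 examples.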

Parameters

The model was fine-tuned for 1 epoch with a batch size of 6 and a learning rate of 0.00001 (1e-5). Due to limited computing resources and time, we were unable to increase the number of epochs or the batch size.
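These settings determine how many optimizer updates the fine-tuning run performed. The minimal sketch below computes that count. It assumes one update per batch, with no gradient accumulation, and it assumes the final partial batch is kept. The card does not state the size of the training split, so `optimizer_steps` is a hypothetical helper that takes it as an input.

```python
import math

# Hyperparameters as reported in this card.
HYPERPARAMS = {"num_epochs": 1, "batch_size": 6, "learning_rate": 1e-5}


def optimizer_steps(num_train_examples: int, batch_size: int, num_epochs: int) -> int:
    """Number of optimizer updates, assuming one update per batch (no gradient
    accumulation) and that the last, partial batch is kept."""
    return math.ceil(num_train_examples / batch_size) * num_epochs
```

As an upper bound, if all 232,074 rows were used for training, `optimizer_steps(232_074, 6, 1)` gives 38,679 updates. The real number is lower if part of the data was held out for evaluation.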

Performance

The model has achieved the following results after fine-tuning on the aforementioned dataset:

Accuracy: 0.9757
Recall: 0.9669
Precision: 0.9701
F1 Score: 0.9685
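The F1 score is the harmonic mean of precision and recall, so it can be checked directly against the two values reported above. The sketch below does that and also shows the standard count-based definitions. It assumes the suicidal class (1) is treated as the positive class, which the card does not state explicitly.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model finds."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Reported precision and recall from the Performance section.
print(round(f1_score(0.9701, 0.9669), 4))  # 0.9685
```

The result matches the reported F1 score of 0.9685, which shows that the four metrics are mutually consistent.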

How to Use

Load the model via the transformers library:

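from transformers import AutoTokenizer, AutoModelForSequenceClassification
# AutoModelForSequenceClassification keeps the fine-tuned classification head;
# plain AutoModel would return hidden states only, not class logits.
tokenizer = AutoTokenizer.from_pretrained("gooohjy/suicidal-bert")
model = AutoModelForSequenceClassification.from_pretrained("gooohjy/suicidal-bert")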
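To turn the model's output into a 1/0 prediction, the two class logits are converted to probabilities and the higher one is picked. The sketch below shows this post-processing step in plain Python. It assumes the model is loaded with `AutoModelForSequenceClassification`, because `AutoModel` does not return class logits. It also assumes that logit index 1 is the suicidal class, following the 1/0 convention above. Verify this against `model.config.id2label` before relying on it. `softmax` and `predict` are hypothetical helpers.

```python
import math

# Assumed label order (index 1 = suicidal); confirm with model.config.id2label.
ID2LABEL = {0: "non-suicidal", 1: "suicidal"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label_id, probability) for the most likely class."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, probs[label_id]


# With a sequence-classification model, the logits could be obtained like this:
#   inputs = tokenizer("some text", return_tensors="pt")
#   logits = model(**inputs).logits[0].tolist()
#   label_id, prob = predict(logits)
#   print(ID2LABEL[label_id], prob)
```

Subtracting the maximum logit before calling `exp` gives the same probabilities as the naive formula, but it avoids overflow when a logit is large.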

Resources

For more resources, including the source code, please refer to the GitHub repository gohjiayi/suicidal-text-detection.