hiert committed f0c2e6f (1 parent: c8d5f85)

Upload 6 files
suicide/README.md ADDED
@@ -0,0 +1,114 @@
---
license: cc0-1.0
language:
- en
metrics:
- accuracy: 0.939432
- recall: 0.937164
- precision: 0.92822
- f1: 0.932672
tags:
- classification
- suicidality
- suicidal text detection
- suicidal sentiment
- sentiment
- suicide
- self harm
- depression
pipeline_tag: text-classification
---

# Advanced Suicidality Classifier Model

## Introduction

This project provides a machine learning model for detecting sequences of words indicative of suicidality in text. The model is built on the ELECTRA architecture and fine-tuned on a dataset combined from several sources, and it classifies text as either suicidal or non-suicidal.

## Labels

The model classifies input text into two labels:

- `LABEL_0`: Indicates that the text is non-suicidal.
- `LABEL_1`: Indicates that the text is indicative of suicidality.
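
By default the text-classification pipeline returns only the top label and its score for each input, as a list of dicts with `label` and `score` keys. Because the model has exactly two classes with softmax outputs, the probability of `LABEL_1` can be recovered from either prediction. The sketch below assumes that output format; `suicidality_probability`, `flag_for_review`, and the 0.5 threshold are hypothetical names and values for illustration, not part of the model or the Transformers API. Flagging for review rather than acting automatically follows the Ethical Considerations below.

```python
from typing import List, TypedDict


class Prediction(TypedDict):
    # One entry of the pipeline's default (top-1) output.
    label: str
    score: float


# Readable names for the two labels described above.
LABEL_NAMES = {"LABEL_0": "non-suicidal", "LABEL_1": "suicidal"}


def suicidality_probability(prediction: Prediction) -> float:
    """Return P(LABEL_1) from a top-1 prediction.

    With a two-class softmax, P(LABEL_1) = 1 - P(LABEL_0).
    """
    if prediction["label"] == "LABEL_1":
        return prediction["score"]
    if prediction["label"] == "LABEL_0":
        return 1.0 - prediction["score"]
    raise ValueError(f"unexpected label: {prediction['label']}")


def flag_for_review(predictions: List[Prediction], threshold: float = 0.5) -> List[bool]:
    """Mark predictions whose P(LABEL_1) reaches a hypothetical review threshold."""
    return [suicidality_probability(p) >= threshold for p in predictions]


# Hypothetical pipeline output for two inputs.
example: List[Prediction] = [
    {"label": "LABEL_1", "score": 0.9},
    {"label": "LABEL_0", "score": 0.8},
]
print([LABEL_NAMES[p["label"]] for p in example])  # ['suicidal', 'non-suicidal']
print(flag_for_review(example))  # [True, False]
```

The threshold is a design choice: lowering it favors recall (fewer missed cases) at the cost of more texts sent for human review.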

## Training

The model was fine-tuned from an existing ELECTRA-based suicidality classifier (see Model Credits) on a curated dataset. Text from the sources listed under Data Sources was cleaned and preprocessed to build the training set. Validation results are reported below.

## Performance

The model's performance on the validation dataset is as follows:

- Accuracy: 0.939432
- Recall: 0.937164
- Precision: 0.92822
- F1 Score: 0.932672

On the validation set, these metrics indicate that the model reliably distinguishes text indicative of suicidality from non-suicidal text.
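
F1 is the harmonic mean of precision and recall, F1 = 2 · P · R / (P + R). As a quick consistency check, the sketch below recomputes F1 from the reported precision and recall; the result agrees with the reported F1 Score up to rounding, since precision is reported to only five decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Validation metrics reported above.
precision, recall = 0.92822, 0.937164
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.5f}")  # F1 = 0.93267
```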

## Data Sources

We collected data from multiple sources to create a diverse training dataset:

- https://www.kaggle.com/datasets/thedevastator/c-ssrs-labeled-suicidality-in-500-anonymized-red
- https://www.kaggle.com/datasets/amangoyl/reddit-dataset-for-multi-task-nlp
- https://www.kaggle.com/datasets/imeshsonu/suicideal-phrases
- https://raw.githubusercontent.com/laxmimerit/twitter-suicidal-intention-dataset/master/twitter-suicidal_data.csv
- https://www.kaggle.com/datasets/mohanedmashaly/suicide-notes
- https://www.kaggle.com/datasets/natalialech/suicidal-ideation-on-twitter

The data underwent thorough cleaning and preprocessing before being used to train the model.
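
The specific cleaning steps are not described here. Purely as an illustration of what combining several labeled sources commonly involves, the sketch below normalizes whitespace, drops empty texts, removes exact duplicates, and discards texts that different sources label inconsistently. The function names and every step are assumptions, not the authors' preprocessing pipeline; it also assumes each source has already been mapped to `(text, label)` pairs using the label scheme above (0 = non-suicidal, 1 = suicidal).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Example = Tuple[str, int]  # (text, label), 0 = non-suicidal, 1 = suicidal


def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip the ends (hypothetical cleaning step)."""
    return " ".join(text.split())


def merge_sources(sources: Iterable[Iterable[Example]]) -> List[Example]:
    """Combine labeled sources, dropping empty texts, duplicates, and label conflicts."""
    labels_by_text: Dict[str, Set[int]] = defaultdict(set)
    order: List[str] = []  # first-seen order keeps the output deterministic
    for source in sources:
        for text, label in source:
            clean = normalize(text)
            if not clean:
                continue
            if clean not in labels_by_text:
                order.append(clean)
            labels_by_text[clean].add(label)
    # Keep only texts that every source labeled the same way.
    return [(t, next(iter(labels_by_text[t]))) for t in order if len(labels_by_text[t]) == 1]


# Placeholder texts standing in for rows from different sources.
source_a = [("text A", 0), ("   ", 0), ("text  B", 1)]
source_b = [("text B", 1), ("text C", 0)]
source_c = [("text C", 1)]
print(merge_sources([source_a, source_b, source_c]))  # [('text A', 0), ('text B', 1)]
```

Dropping conflicting texts rather than picking a majority label is one conservative option; it avoids training on examples whose ground truth is disputed.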

## How to Use

### Installation

To use the model, install the Transformers library together with PyTorch, which Transformers uses to run the model:

```bash
pip install transformers torch
```

### Using the Model

You can use the model for text classification in either of the following ways:

1. Using the pipeline approach:

```python
from transformers import pipeline

# "text-classification" matches the model's pipeline_tag
classifier = pipeline("text-classification", model="sentinetyd/suicidality")

result = classifier("text to classify")
print(result)
```
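
2. Using the tokenizer and model programmatically:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sentinetyd/suicidality")
model = AutoModelForSequenceClassification.from_pretrained("sentinetyd/suicidality")

# Tokenize the input and run a forward pass without tracking gradients
inputs = tokenizer("text to classify", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)  # [P(LABEL_0), P(LABEL_1)] for the input
print(model.config.id2label[int(probs.argmax(dim=-1))], probs.tolist())
```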

## Ethical Considerations

Suicidality is a sensitive and serious topic. It's important to exercise caution and consider ethical implications when using this model. Predictions made by the model should be handled with care and used to complement human judgment and intervention.

## Model Credits

We would like to acknowledge the [gooohjy/suicidal-electra](https://huggingface.co/gooohjy/suicidal-electra) model on the Hugging Face Hub. We used this model as a starting point and fine-tuned it to create our specialized suicidality detection model.

## Contributions

We welcome contributions and feedback from the community to further improve the model's performance, enhance the dataset, and ensure its responsible deployment.
suicide/config.json ADDED
@@ -0,0 +1,31 @@
{
  "_name_or_path": "gooohjy/suicidal-electra",
  "architectures": [
    "ElectraForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "summary_activation": "gelu",
  "summary_last_dropout": 0.1,
  "summary_type": "first",
  "summary_use_proj": true,
  "torch_dtype": "float32",
  "transformers_version": "4.31.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
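
This configuration describes a base-size ELECTRA encoder: 12 layers, hidden size 768, 12 attention heads, and a 30,522-token vocabulary. To show what those numbers imply, the sketch below estimates the parameter count of `ElectraForSequenceClassification` from the config values. It assumes the standard ELECTRA layout in Transformers (token, position, and token-type embeddings with a LayerNorm; no embedding projection because `embedding_size` equals `hidden_size`; identical encoder layers; and a classification head made of a dense layer plus a projection to the two labels `LABEL_0`/`LABEL_1`). It is a hand-derived estimate, not a count read from the checkpoint.

```python
# Subset of the config.json values above that determine the parameter count.
CONFIG = {
    "embedding_size": 768,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}
NUM_LABELS = 2  # LABEL_0 and LABEL_1


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def estimate_params(cfg: dict, num_labels: int) -> int:
    h, e, ff = cfg["hidden_size"], cfg["embedding_size"], cfg["intermediate_size"]
    # Word, position, and token-type embeddings, followed by a LayerNorm.
    rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    embeddings = rows * e + layer_norm(e)
    if e != h:  # assumed projection when sizes differ; not the case for this config
        embeddings += linear(e, h)
    # One encoder layer: Q, K, V, and output projections, feed-forward block, two LayerNorms.
    per_layer = 4 * linear(h, h) + linear(h, ff) + linear(ff, h) + 2 * layer_norm(h)
    # Classification head: dense layer, then projection to the labels.
    head = linear(h, h) + linear(h, num_labels)
    return embeddings + cfg["num_hidden_layers"] * per_layer + head


head_dim = CONFIG["hidden_size"] // CONFIG["num_attention_heads"]
print(f"per-head dimension: {head_dim}")  # per-head dimension: 64
print(f"estimated parameters: {estimate_params(CONFIG, NUM_LABELS):,}")
```

The estimate comes to roughly 109.5 million parameters, about a fifth of which sit in the word-embedding table.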
suicide/gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
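
These rules route large binary formats (model weights, archives, serialized arrays) through Git LFS. To see which of this commit's six files they would affect, the sketch below approximates the matching with Python's `fnmatch` against each file's base name. This is an approximation, not Git's own matcher: `fnmatch` has no `**` semantics and its `*` also matches `/`, so the sketch matches base names only and omits the directory rule `saved_model/**/*`. The pattern list is copied from the file above.

```python
from fnmatch import fnmatch
from typing import List

# Base-name patterns from the gitattributes file above (saved_model/**/* omitted).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy", "*.npz",
    "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl", "*.pt", "*.pth",
    "*.rar", "*.safetensors", "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm",
    "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def tracked_by_lfs(path: str) -> bool:
    """Approximate check: does any base-name pattern match this file?"""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in LFS_PATTERNS)


uploaded: List[str] = [
    "README.md", "config.json", "gitattributes",
    "special_tokens_map.json", "tokenizer.json", "vocab.txt",
]
print({name: tracked_by_lfs(name) for name in uploaded})
```

All six uploaded files are text formats, so none match; a weights file named, for example, `pytorch_model.bin` or `model.safetensors` would.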
suicide/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
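
These are the BERT-style special tokens used by ELECTRA's WordPiece tokenizer. A single input is wrapped as `[CLS] ... [SEP]`, and shorter inputs in a batch are filled with `[PAD]` (ID 0, matching `pad_token_id` in config.json), with an attention mask marking real tokens. `ElectraForSequenceClassification`'s head reads the hidden state at the first position, i.e. `[CLS]`. The sketch below reproduces that framing on already-tokenized input; the function `frame` and the word pieces are hypothetical, and in practice the tokenizer does this itself when called with `truncation=True` and `padding="max_length"`.

```python
from typing import List, Tuple

# Special tokens from special_tokens_map.json above.
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def frame(tokens: List[str], max_length: int) -> Tuple[List[str], List[int]]:
    """Wrap one pre-tokenized sequence as [CLS] ... [SEP], truncate, then pad.

    Returns the padded tokens and an attention mask (1 = real token, 0 = padding).
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = tokens[: max_length - 2]  # truncate so both special tokens fit
    framed = [CLS] + body + [SEP]
    padding = max_length - len(framed)
    return framed + [PAD] * padding, [1] * len(framed) + [0] * padding


# Hypothetical word pieces; real ones come from tokenizer.json / vocab.txt.
tokens, mask = frame(["text", "to", "class", "##ify"], max_length=8)
print(tokens)  # ['[CLS]', 'text', 'to', 'class', '##ify', '[SEP]', '[PAD]', '[PAD]']
print(mask)    # [1, 1, 1, 1, 1, 1, 0, 0]
```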
suicide/tokenizer.json ADDED
The diff for this file is too large to render.

suicide/vocab.txt ADDED
The diff for this file is too large to render.