---
language:
- en
license: mit
tags:
- text
- Twitter
datasets:
- CLPsych 2015
metrics:
- accuracy
- f1
- precision
- recall
- AUC
model-index:
- name: distilbert-depression-mixed
  results: []
---

# distilbert-depression-mixed

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased), trained on CLPsych 2015 plus a scraped Twitter dataset and evaluated on a dataset scraped from Twitter.
It achieves the following results on the evaluation set:
- Evaluation loss: 0.71
- Accuracy: 0.63
- F1: 0.59
- Precision: 0.66
- Recall: 0.53
- AUC: 0.63

## Intended uses & limitations

Feed the model a corpus of tweets and it labels whether each input is indicative of depression: label 1 means depression, label 0 means no depression.

Limitations: all token sequences longer than 512 tokens are automatically truncated, and the training and test data may be contaminated with mislabeled users.

### How to use

You can use this model directly with a sentiment-analysis pipeline:

```python
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
>>> model = DistilBertForSequenceClassification.from_pretrained('distilbert-depression-mixed')
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
>>> # Pass tokenizer kwargs so truncation is applied inside the pipeline
>>> tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
>>> classifier('pain peko', **tokenizer_kwargs)
[{'label': 'LABEL_1', 'score': 0.5048992037773132}]
```
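
The pipeline also accepts a list of texts, which matches the intended use of labeling a whole corpus of tweets. A minimal sketch, reusing `classifier` and `tokenizer_kwargs` from above; the example tweets are hypothetical:

```python
# Score a small corpus; LABEL_1 = depression, LABEL_0 = no depression.
tweets = ["can't sleep again, third night this week", "great day at the beach!"]
for tweet, pred in zip(tweets, classifier(tweets, **tokenizer_kwargs)):
    label = 'depression' if pred['label'] == 'LABEL_1' else 'no depression'
    print(f"{tweet!r} -> {label} (score={pred['score']:.2f})")
```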

Otherwise, download the model files and point `from_pretrained` at the folder containing `config.json`, `pytorch_model.bin`, and `training_args.bin`.
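
For example, loading from a local folder; a minimal sketch, assuming the files were downloaded to `./distilbert-depression-mixed` (a hypothetical path):

```python
# Load the fine-tuned weights from a local folder instead of the Hub.
from transformers import AutoTokenizer, DistilBertForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('./distilbert-depression-mixed')
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
```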

## Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 4.19e-05
- train_batch_size: 16
- eval_batch_size: 16
- weight_decay: 0.06
- num_epochs: 5.0
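
The training script itself is not part of this card; the following is a minimal, hypothetical sketch of how these values map onto `transformers.TrainingArguments`, not the author's actual setup. The `output_dir` and `evaluation_strategy` values are assumptions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir='distilbert-depression-mixed',  # assumption: any local folder works
    learning_rate=4.19e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.06,
    num_train_epochs=5.0,
    evaluation_strategy='epoch',  # assumption: the results table reports per-epoch metrics
)
# These arguments would be passed to transformers.Trainer together with the
# tokenized training and evaluation datasets.
```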

## Training results

| Epoch | Training Loss | Validation Loss | Accuracy | F1   | Precision | Recall | AUC  |
|:-----:|:-------------:|:---------------:|:--------:|:----:|:---------:|:------:|:----:|
| 1.0   | 0.68          | 0.66            | 0.61     | 0.54 | 0.60      | 0.50   | 0.60 |
| 2.0   | 0.65          | 0.65            | 0.63     | 0.49 | 0.70      | 0.37   | 0.62 |
| 3.0   | 0.53          | 0.63            | 0.66     | 0.58 | 0.69      | 0.50   | 0.65 |
| 4.0   | 0.39          | 0.66            | 0.67     | 0.61 | 0.69      | 0.54   | 0.67 |
| 5.0   | 0.27          | 0.72            | 0.65     | 0.61 | 0.63      | 0.60   | 0.64 |