johnhandleyd committed
Commit d4727c5 · verified · 1 Parent(s): 20d0198

Update README.md

Files changed (1)
  1. README.md +31 -11
README.md CHANGED
@@ -3,29 +3,37 @@ license: mit
  base_model: TheBloke/zephyr-7B-alpha-GPTQ
  tags:
  - generated_from_trainer
  model-index:
- - name: thesa_v1
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # thesa_v1
-
- This model is a fine-tuned version of [TheBloke/zephyr-7B-alpha-GPTQ](https://huggingface.co/TheBloke/zephyr-7B-alpha-GPTQ) on an unknown dataset.

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -33,14 +41,20 @@ More information needed

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
  - train_batch_size: 8
  - eval_batch_size: 8
- - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 10
  - mixed_precision_training: Native AMP

  ### Framework versions

@@ -48,3 +62,9 @@ The following hyperparameters were used during training:
  - Pytorch 2.1.0+cu121
  - Datasets 2.16.1
  - Tokenizers 0.15.1

  base_model: TheBloke/zephyr-7B-alpha-GPTQ
  tags:
  - generated_from_trainer
+ - gptq
+ - peft
  model-index:
+ - name: thesa
  results: []
+ datasets:
+ - loaiabdalslam/counselchat
+ language:
+ - en
+ pipeline_tag: text-generation
  ---

+ # Thesa: A Therapy Chatbot 👩🏻‍⚕️

+ Thesa is an experimental therapy chatbot trained on mental health data and fine-tuned from the Zephyr GPTQ model, which uses quantization to reduce computational and storage costs.

  ## Model description

+ - Model type: A fine-tuned version of Zephyr 7B Alpha - GPTQ on various mental health datasets
+ - Language(s): English
+ - License: MIT
+ - Fine-tuned from: [TheBloke/zephyr-7B-alpha-GPTQ](https://huggingface.co/TheBloke/zephyr-7B-alpha-GPTQ)
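
As a usage illustration (an editorial sketch, not part of this commit): assuming the fine-tune is published as a PEFT (LoRA) adapter on top of the GPTQ base linked above, under a hypothetical repo id `johnhandleyd/thesa`, loading and querying it might look like this.

```python
# Minimal inference sketch, assuming the fine-tune is a PEFT adapter on the
# GPTQ base; the adapter id "johnhandleyd/thesa" is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/zephyr-7B-alpha-GPTQ"  # quantized base named in the card
adapter_id = "johnhandleyd/thesa"          # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Loading a GPTQ checkpoint this way requires optimum and auto-gptq,
# both of which appear under Framework versions below.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Zephyr ships a chat template, so the prompt can be built from messages.
messages = [{"role": "user", "content": "I've been feeling anxious lately."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```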

  ## Intended uses & limitations

+ This model is purely experimental and should not be used as a substitute for a mental health professional.

+ ## Training evaluation

+ Training loss:
+ <img src="imgs/loss_27.2.24.png" alt="loss" width="550"/>

  ## Training procedure

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
+ - warmup_ratio: 0.1
  - train_batch_size: 8
  - eval_batch_size: 8
+ - gradient_accumulation_steps: 1
+ - seed: 35
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 10
  - mixed_precision_training: Native AMP
+ - fp16: True
+
+ Learning rate over time (a warm-up ratio was used during training):
+ <img src="imgs/lr_27.2.24.png" alt="lr" width="550"/>

  ### Framework versions

  - Pytorch 2.1.0+cu121
  - Datasets 2.16.1
  - Tokenizers 0.15.1
+ - Accelerate 0.27.2
+ - PEFT 0.8.2
+ - Auto-GPTQ 0.6.0
+ - TRL 0.7.11
+ - Optimum 1.17.1
+ - Bitsandbytes 0.42.0
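
To reproduce this environment, a small editorial check (not from the commit) that prints the installed versions of the packages listed above; the distribution names are assumed pip package names.

```python
# Report installed versions of the packages named in Framework versions.
from importlib.metadata import PackageNotFoundError, version

packages = [
    "torch", "datasets", "tokenizers", "accelerate",
    "peft", "auto-gptq", "trl", "optimum", "bitsandbytes",
]
for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```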