mlabonne committed on
Commit
f4e4979
1 Parent(s): 03887e0

Update README.md

Files changed (1)
  1. README.md +72 -54
README.md CHANGED
@@ -22,7 +22,7 @@ datasets:
22
 
23
  # NeuralHermes 2.5 - Mistral 7B - LASER
24
 
25
- This an experimental LASER version of NeuralHermes using [laserRMT](https://github.com/cognitivecomputations/laserRMT).
26
 
27
  | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
28
  |------------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
@@ -37,38 +37,82 @@ It is directly inspired by the RLHF process described by [Intel/neural-chat-7b-v
37
 
38
  The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.
39
 
40
- ### Quantized models
41
-
42
- * GGUF: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF
43
- * AWQ: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-AWQ
44
- * GPTQ: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GPTQ
45
- * EXL2:
46
- * 3.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-3.0bpw-h6-exl2
47
- * 4.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-4.0bpw-h6-exl2
48
- * 5.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-5.0bpw-h6-exl2
49
- * 6.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-6.0bpw-h6-exl2
50
- * 8.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-8.0bpw-h8-exl2
51
-
52
  ## Results
53
 
54
- **Update:** NeuralHermes-2.5 became the best Hermes-based model on the Open LLM leaderboard and one of the very best 7b models. 🎉
55
-
56
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/yWe6VBFxkHiuOlDVBXtGo.png)
57
-
58
- Teknium (author of OpenHermes-2.5-Mistral-7B) benchmarked the model ([see his tweet](https://twitter.com/Teknium1/status/1729955709377503660)).
59
-
60
- Results are improved on every benchmark: **AGIEval** (from 43.07% to 43.62%), **GPT4All** (from 73.12% to 73.25%), and **TruthfulQA**.
61
-
62
  ### AGIEval
63
- ![](https://i.imgur.com/7an3B1f.png)
64
 
65
  ### GPT4All
66
- ![](https://i.imgur.com/TLxZFi9.png)
67
 
68
  ### TruthfulQA
69
- ![](https://i.imgur.com/V380MqD.png)
70
-
71
- You can check the Weights & Biases project [here](https://wandb.ai/mlabonne/NeuralHermes-2-5-Mistral-7B/overview?workspace=user-mlabonne).
72
 
73
  ## Usage
74
 
@@ -91,7 +135,7 @@ prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, toke
91
  # Create pipeline
92
  pipeline = transformers.pipeline(
93
  "text-generation",
94
- model=new_model,
95
  tokenizer=tokenizer
96
  )
97
 
@@ -105,30 +149,4 @@ sequences = pipeline(
105
  max_length=200,
106
  )
107
  print(sequences[0]['generated_text'])
108
- ```
109
-
110
-
111
- ## Training hyperparameters
112
-
113
- **LoRA**:
114
- * r=16
115
- * lora_alpha=16
116
- * lora_dropout=0.05
117
- * bias="none"
118
- * task_type="CAUSAL_LM"
119
- * target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
120
-
121
- **Training arguments**:
122
- * per_device_train_batch_size=4
123
- * gradient_accumulation_steps=4
124
- * gradient_checkpointing=True
125
- * learning_rate=5e-5
126
- * lr_scheduler_type="cosine"
127
- * max_steps=200
128
- * optim="paged_adamw_32bit"
129
- * warmup_steps=100
130
-
131
- **DPOTrainer**:
132
- * beta=0.1
133
- * max_prompt_length=1024
134
- * max_length=1536
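The LoRA, training, and DPO settings listed above (removed from the README in this commit) fit together roughly as in the sketch below. This is an illustration, not the author's exact training script: the base model name, dataset, and output directory are assumptions, and `DPOTrainer` keyword arguments vary across `trl` versions.

```python
# Minimal DPO fine-tuning sketch using the hyperparameters listed above.
# Assumptions: base model, dataset, output_dir; trl API details may differ by version.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

base_model = "teknium/OpenHermes-2.5-Mistral-7B"  # assumed base model

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj", "q_proj", "o_proj", "down_proj"],
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",  # assumed
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference dataset with "prompt", "chosen", "rejected" columns
# (assumed dataset; preprocessing into that format is omitted here)
train_dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

# DPO trainer
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```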
 
22
 
23
  # NeuralHermes 2.5 - Mistral 7B - LASER
24
 
25
+ This is an experimental LASER version of NeuralHermes using [laserRMT](https://github.com/cognitivecomputations/laserRMT).
26
 
27
  | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
28
  |------------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
 
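For background, LASER (LAyer-SElective Rank reduction) works by replacing selected weight matrices with low-rank approximations obtained from their SVD. The snippet below is a generic illustration of that single rank-reduction step, not the laserRMT implementation; the layer choice and rank fraction are arbitrary examples.

```python
# Generic illustration of the LASER rank-reduction step (not laserRMT itself).
import torch

def low_rank_approximation(weight: torch.Tensor, rank_fraction: float = 0.1) -> torch.Tensor:
    """Keep only the top singular components of a 2-D weight matrix."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(rank_fraction * S.numel()))
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Example: apply the reduction to one MLP projection of a loaded model
# (layer index and rank fraction are illustrative, not the values used here).
# layer = model.model.layers[20].mlp.down_proj
# with torch.no_grad():
#     layer.weight.copy_(low_rank_approximation(layer.weight).to(layer.weight.dtype))
```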
37
 
38
  The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.
39
 
40
  ## Results
41
 
42
  ### AGIEval
43
+ | Task |Version| Metric |Value| |Stderr|
44
+ |------------------------------|------:|--------|----:|---|-----:|
45
+ |agieval_aqua_rat | 0|acc |21.26|± | 2.57|
46
+ | | |acc_norm|22.83|± | 2.64|
47
+ |agieval_logiqa_en | 0|acc |39.32|± | 1.92|
48
+ | | |acc_norm|40.71|± | 1.93|
49
+ |agieval_lsat_ar | 0|acc |25.65|± | 2.89|
50
+ | | |acc_norm|25.65|± | 2.89|
51
+ |agieval_lsat_lr | 0|acc |48.82|± | 2.22|
52
+ | | |acc_norm|50.00|± | 2.22|
53
+ |agieval_lsat_rc | 0|acc |58.36|± | 3.01|
54
+ | | |acc_norm|57.25|± | 3.02|
55
+ |agieval_sat_en | 0|acc |74.27|± | 3.05|
56
+ | | |acc_norm|73.30|± | 3.09|
57
+ |agieval_sat_en_without_passage| 0|acc |43.69|± | 3.46|
58
+ | | |acc_norm|42.23|± | 3.45|
59
+ |agieval_sat_math | 0|acc |37.27|± | 3.27|
60
+ | | |acc_norm|36.36|± | 3.25|
61
+
62
+ Average: 43.54%
63
 
64
  ### GPT4All
65
+ | Task |Version| Metric |Value| |Stderr|
66
+ |-------------|------:|--------|----:|---|-----:|
67
+ |arc_challenge| 0|acc |57.76|± | 1.44|
68
+ | | |acc_norm|60.32|± | 1.43|
69
+ |arc_easy | 0|acc |83.84|± | 0.76|
70
+ | | |acc_norm|81.10|± | 0.80|
71
+ |boolq | 1|acc |86.70|± | 0.59|
72
+ |hellaswag | 0|acc |63.15|± | 0.48|
73
+ | | |acc_norm|82.55|± | 0.38|
74
+ |openbookqa | 0|acc |34.40|± | 2.13|
75
+ | | |acc_norm|45.20|± | 2.23|
76
+ |piqa | 0|acc |81.94|± | 0.90|
77
+ | | |acc_norm|82.97|± | 0.88|
78
+ |winogrande | 0|acc |75.22|± | 1.21|
79
+
80
+ Average: 73.44%
81
 
82
  ### TruthfulQA
83
+ | Task |Version|Metric|Value| |Stderr|
84
+ |-------------|------:|------|----:|---|-----:|
85
+ |truthfulqa_mc| 1|mc1 |37.70|± | 1.70|
86
+ | | |mc2 |55.26|± | 1.52|
87
+
88
+ Average: 55.26%
89
+
90
+ ### Bigbench
91
+ | Task |Version| Metric |Value| |Stderr|
92
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
93
+ |bigbench_causal_judgement | 0|multiple_choice_grade|53.16|± | 3.63|
94
+ |bigbench_date_understanding | 0|multiple_choice_grade|65.31|± | 2.48|
95
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|34.11|± | 2.96|
96
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|27.02|± | 2.35|
97
+ | | |exact_str_match | 0.28|± | 0.28|
98
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|27.80|± | 2.01|
99
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.86|± | 1.51|
100
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|48.33|± | 2.89|
101
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|41.40|± | 2.20|
102
+ |bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
103
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|65.00|± | 1.07|
104
+ |bigbench_ruin_names | 0|multiple_choice_grade|46.21|± | 2.36|
105
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|27.25|± | 1.41|
106
+ |bigbench_snarks | 0|multiple_choice_grade|70.72|± | 3.39|
107
+ |bigbench_sports_understanding | 0|multiple_choice_grade|65.72|± | 1.51|
108
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|30.40|± | 1.46|
109
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.56|± | 1.18|
110
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
111
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|48.33|± | 2.89|
112
+
113
+ Average: 42.24%
114
+
115
+ Average score: 53.62%
116
 
117
  ## Usage
118
 
 
135
  # Create pipeline
136
  pipeline = transformers.pipeline(
137
  "text-generation",
138
+ model="mlabonne/NeuralHermes-2.5-Mistral-7B-laser",
139
  tokenizer=tokenizer
140
  )
141
 
 
149
  max_length=200,
150
  )
151
  print(sequences[0]['generated_text'])
152
+ ```
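Pieced together from the snippets visible in this diff, a complete usage example for the updated model would look roughly like the following. The chat messages and sampling settings outside the hunks shown above are assumptions.

```python
# End-to-end usage sketch for the LASER model, assembled from the README snippets.
import transformers
from transformers import AutoTokenizer

model_id = "mlabonne/NeuralHermes-2.5-Mistral-7B-laser"

# Format the prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},  # assumed example messages
    {"role": "user", "content": "What is a Large Language Model?"},
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,   # assumed sampling settings
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]["generated_text"])
```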