---
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

# A variation/copy of NeuralHermes 2.5 - Mistral 7B

This is a variation of NeuralHermes, a model based on teknium/OpenHermes-2.5-Mistral-7B and further fine-tuned with Direct Preference Optimization (DPO) on the mlabonne/chatml_dpo_pairs dataset. It surpasses the original model on most benchmarks (see results).

It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3.

The code used to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training required an A100 GPU for about an hour.

The following sections are copied from NeuralHermes-2.5-Mistral-7B:

## Quantized models

A sketch of running the GGUF build locally follows this list.

* **GGUF**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF
* **AWQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-AWQ
* **GPTQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GPTQ
* **EXL2**:
  * 3.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-3.0bpw-h6-exl2
  * 4.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-4.0bpw-h6-exl2
  * 5.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-5.0bpw-h6-exl2
  * 6.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-6.0bpw-h6-exl2
  * 8.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-8.0bpw-h8-exl2
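
As a rough sketch of how one of these quantized builds might be run locally with llama-cpp-python (the local GGUF filename and the generation settings here are illustrative assumptions, not part of the original card):

```python
# Minimal sketch: running a GGUF quantization with llama-cpp-python.
# The model_path is hypothetical -- point it at whichever GGUF file you
# downloaded from the TheBloke repository linked above.
from llama_cpp import Llama

llm = Llama(
    model_path="./neuralhermes-2.5-mistral-7b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,  # context window size
)

# The model was trained on ChatML-formatted data, so prompt it in ChatML.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant chatbot.<|im_end|>\n"
    "<|im_start|>user\nWhat is a Large Language Model?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = llm(prompt, max_tokens=200, temperature=0.7, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```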

## Usage

You can run this model using [LM Studio](https://lmstudio.ai/) or any other frontend.

You can also run this model using the following code:

```python
import transformers
from transformers import AutoTokenizer

# Hub repo id of the model to load. "mlabonne/NeuralHermes-2.5-Mistral-7B" is the
# original NeuralHermes repository; replace it with this model's repo id as needed.
new_model = "mlabonne/NeuralHermes-2.5-Mistral-7B"

# Format the prompt with the model's ChatML chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create a text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

## Training hyperparameters

The values below are wired together in the sketch after these lists.

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=5
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
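
As a non-authoritative sketch of how these hyperparameters might fit together with peft and trl (the actual training script is in the Colab/GitHub links above; model and dataset loading, the output directory, and the trl version behavior are assumptions, since newer trl releases move several of these arguments into DPOConfig):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Base model and DPO dataset named in this card
base_model = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

# NOTE: chatml_dpo_pairs may need reformatting into prompt/chosen/rejected
# columns first; see the linked Colab for the exact preprocessing.
dataset = load_dataset("mlabonne/chatml_dpo_pairs")["train"]

# LoRA configuration (values from the list above)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)

# Training arguments (values from the list above; output_dir is assumed)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=5,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    output_dir="./results",
)

# DPO trainer; beta controls how far the policy may drift from the
# reference model during preference optimization.
dpo_trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```

When a peft_config is supplied and no explicit ref_model is passed, trl derives the reference model by disabling the adapters, so only the LoRA weights are trained.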