haritzpuerto committed on
Commit
0355710
1 Parent(s): 8359318

Update README.md

Files changed (1)
  1. README.md +137 -3
README.md CHANGED
@@ -1,3 +1,137 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: peft
6
+ pipeline_tag: text-generation
7
+ datasets:
8
+ - allenai/ai2_arc
9
+ - tasksource/Boardgame-QA
10
+ - skrishna/coin_flip
11
+ - openai/gsm8k
12
+ - hotpotqa/hotpot_qa
13
+ - ChilleD/LastLetterConcat
14
+ - allenai/quartz
15
+ - tasksource/strategy-qa
16
+ - ConditionalQA
17
+ ---
18
+
19
+ This is the official model from the publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models" (arXiv, 2024).
20
+
21
+ > TLDR: Divergent Chain of Thought (DCoT) requires models to generate multiple CoTs before choosing an answer. Adding DCoT data to instruction tuning allows models to improve performance through self-correction.
22
+
23
+
24
+ Stay tuned for the release of the paper!
25
+
26
+
27
+ # Load the Model
28
+ ```
29
+ from peft import LoraConfig, PeftModel
30
+ from transformers import AutoModelForCausalLM, AutoTokenizer
31
+ import torch
32
+
33
+
34
+ base_model_path = "meta-llama/Llama-2-13b-hf"
35
+ model = AutoModelForCausalLM.from_pretrained(
36
+ base_model_path,
37
+ torch_dtype=torch.bfloat16,
38
+ device_map="auto",
39
+ )
40
+ peft_model_id = "haritzpuerto/LLaMA2-13B-dcot"
41
+ model.load_adapter(peft_model_id)
42
+
43
+ tokenizer = AutoTokenizer.from_pretrained(base_model_path)
44
+ ```
45
+
46
+ # Run the model
47
+
48
+ ## Prompt Template
49
+
50
+ ```
51
+ [Question] {question} [Context] {document} [Options] {answer_options} [Number of answers] {k}
52
+ ```
53
+
54
+ Note that not all commands (text in brackets) are mandatory: `[Context]` and `[Options]` are optional.
55
+ - `[Context]` refers to a paragraph that contains the answer to a question (for span-extraction QA).
56
+ - `[Options]` refers to a list of candidate answers (for multiple-choice QA). The format is `A) {answer option 1} B) {answer option 2}, ...`
57
+
58
+ The minimal template is:
59
+
60
+ ```
61
+ [Question] {question} [Number of answers] {k}
62
+ ```
63
+
64
+ Whether to include the context and options depends on your task.
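The template above can be assembled programmatically. The following is a minimal sketch, not part of the released code: the helper name `build_dcot_prompt`, its signature, and the `sep` parameter are assumptions made for illustration. It joins the commands with a single space, as in the template; the run example further down separates them with newlines and appends `[Answer 1] ` to start the first chain of thought.

```python
from typing import Optional, Sequence


def build_dcot_prompt(
    question: str,
    k: int,
    context: Optional[str] = None,
    options: Optional[Sequence[str]] = None,
    sep: str = " ",
) -> str:
    """Assemble a DCoT prompt (hypothetical helper, not from the original README)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    parts = [f"[Question] {question}"]
    if context:
        # Optional paragraph containing the answer (span-extraction QA).
        parts.append(f"[Context] {context}")
    if options:
        # Label candidates as A), B), C), ... for multiple-choice QA.
        labelled = " ".join(
            f"{chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
        )
        parts.append(f"[Options] {labelled}")
    parts.append(f"[Number of answers] {k}")
    return sep.join(parts)
```

Passing `sep="\n"` and appending `"\n[Answer 1] "` to the result would give a prompt in the same shape as the run example.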
65
+
66
+ ## Response format
67
+ The model should return text in the following format:
68
+
69
+ ```
70
+ [Answer 1]CoT_1
71
+ [Answer 2]CoT_2
72
+ ...
73
+ [Final answer] answer
74
+ ```
75
+
76
+ You should get as many answers as requested with the command `[Number of answers] {k}`.
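Because the response follows a fixed layout, the individual chains of thought and the final answer can be separated with a couple of regular expressions. This is a rough sketch under the assumption that the markers appear exactly as shown above; `parse_dcot_response` is a hypothetical helper, not part of the released code.

```python
import re
from typing import List, Optional, Tuple


def parse_dcot_response(text: str) -> Tuple[List[str], Optional[str]]:
    """Split decoded model output into CoTs and the final answer (hypothetical helper)."""
    # Drop the special tokens that tokenizer.decode may leave in the string.
    text = text.replace("<s>", "").replace("</s>", "")

    # Everything after "[Final answer]" is treated as the final answer.
    final_match = re.search(r"\[Final answer\]\s*(.*)", text, flags=re.DOTALL)
    final = final_match.group(1).strip() if final_match else None
    body = text[: final_match.start()] if final_match else text

    # Splitting on "[Answer i]" leaves the prompt in the first piece, which is discarded.
    pieces = re.split(r"\[Answer \d+\]", body)
    cots = [piece.strip() for piece in pieces[1:]]
    return cots, final
```

If the model stops before emitting `[Final answer]`, the second element of the returned tuple is `None`, so callers can detect truncated generations.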
77
+
78
+ ## Run Example
79
+
80
+ ```
81
+ prompt = "[Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?\n[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.\n[Number of answers] 2\n[Answer 1] "
82
+ inputs = tokenizer(prompt, return_tensors="pt")
83
+ output = model.generate(**inputs.to(model.device), max_length=1024)
84
+ print(tokenizer.decode(output[0]))
85
+ ```
86
+
87
+ You should get an output similar to:
88
+ ```
89
+ <s> [Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?
90
+ [Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.
91
+ [Number of answers] 2
92
+ [Answer 1] 1. Juan and LaKeisha want to see which object rolls the farthest.
93
+ 2. They have already rolled a few objects down the ramp.
94
+ 3. To repeat their investigation, they need to do something that will affect the outcome of the experiment.
95
+ 4. Putting the objects in groups will not affect the outcome of the experiment.
96
+ 5. Changing the height of the ramp may affect the outcome, but it is not the best option as it requires changing the setup of the experiment.
97
+ 6. Choosing different objects to roll may also affect the outcome, but it is not the best option as it does not address the issue of repeating the experiment.
98
+ 7. The best option is to record the details of the investigation. This includes the objects used, the height of the ramp, and any other relevant information. By recording the details, Juan and LaKeisha can repeat the experiment with the same conditions and compare the results.
99
+ [Answer 2] Step 1: Identify the problem and the question.
100
+
101
+ Problem: Juan and LaKeisha want to see which object rolls the farthest.
102
+
103
+ Question: What should they do to repeat their investigation?
104
+
105
+ Step 2: Evaluate the options.
106
+
107
+ A) Put the objects in groups. - This option does not directly relate to the question of which object rolls the farthest, so it can be eliminated.
108
+
109
+ B) Change the height of the ramp. - This option also does not directly relate to the question of which object rolls the farthest, so it can be eliminated.
110
+
111
+ C) Choose different objects to roll. - This option is a possible solution to the question, but it does not guarantee that the object will roll the farthest.
112
+
113
+ D) Record the details of the investigation. - This option is a necessary step to repeat the investigation.
114
+
115
+ Step 3: Choose the best option.
116
+
117
+ The best option to repeat the investigation is to record the details of the investigation. This will allow them to replicate the conditions of the original experiment and compare the results.
118
+
119
+ [Final answer] D) Record the details of the investigation.</s>
120
+ ```
121
+
122
+
123
+ # Training details
124
+ We train all models using LoRA with the PEFT library. The main parameters are:
125
+
126
+ | Param. name | Value |
127
+ |---------------------|:-------------------:|
128
+ | lora\_r | 64 |
129
+ | lora\_alpha | 16 |
130
+ | lora\_dropout | 0.1 |
131
+ | batch size | 4 |
132
+ | learning\_rate | 2e-4 |
133
+ | weight\_decay | 0.001 |
134
+ | optim | paged\_adamw\_32bit |
135
+ | lr\_scheduler\_type | constant |
136
+
137
+ Please check Appendix B of the paper for more details.
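For readers who want to reproduce a similar setup, the table can be written down as keyword arguments. This is only a sketch: the keys follow the parameter names of `peft.LoraConfig` and `transformers.TrainingArguments`, and mapping "batch size" to `per_device_train_batch_size` is an assumption, since the table does not say whether the value is per device or global. The original training script may differ.

```python
# LoRA hyperparameters from the table, keyed by peft.LoraConfig argument names.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
}

# Optimizer settings from the table, keyed by transformers.TrainingArguments names.
# Assumption: "batch size" is interpreted as the per-device batch size.
training_kwargs = {
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
    "weight_decay": 0.001,
    "optim": "paged_adamw_32bit",
    "lr_scheduler_type": "constant",
}

# In standard LoRA the adapter update is scaled by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

With these values the scaling factor is 16 / 64 = 0.25. The dictionaries could be unpacked as `LoraConfig(**lora_kwargs)` and `TrainingArguments(output_dir=..., **training_kwargs)` when both libraries are installed.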