---
license: other
---

## airoboros-gpt-3.5-turbo-100k-7b

This is a 7b parameter LLaMA model, fine-tuned on 100k synthetic instruction/response pairs generated by gpt-3.5-turbo using [airoboros](https://github.com/jondurbin/airoboros).

Links:

* [airoboros](https://github.com/jondurbin/airoboros)
* [instructions.jsonl](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/instructions.jsonl)
* [topics.txt](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/topics-d732f92dd90a1a5337a4a02ddeaec72b.txt)

### Prompt generation

```
airoboros generate-instructions --instruction-count 100000 --concurrency 100 --temperature 1.0
```
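
Each line of the resulting instructions.jsonl is a standalone JSON object with `instruction` and `response` keys (the conversion script below relies on exactly those two fields). A record looks roughly like this (values invented for illustration):

```
{"instruction": "Explain the difference between a list and a tuple in Python.", "response": "A list is mutable ..."}
```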

### Fine-tuning

The instructions.jsonl file was converted to the conversation style expected by the FastChat training scripts, then trained with:
```
torchrun --nproc_per_node=8 --master_port=20001 train_mem.py \
    --model_name_or_path /workspace/llama-7b-hf \
    --data_path ./as_conversations.json \
    --bf16 True \
    --output_dir /workspace/airoboros-gpt-3.5-100k-7b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_steps 1500 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```
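
As a rough sanity check on these settings (assuming all 100k instruction/response pairs survive conversion), the effective global batch size and step counts work out roughly as follows:

```
# 8 GPUs x 4 per-device batch x 4 gradient accumulation steps
effective_batch_size = 8 * 4 * 4                    # 128
steps_per_epoch = 100_000 // effective_batch_size   # ~781
total_steps = steps_per_epoch * 3                   # ~2343 over 3 epochs
print(effective_batch_size, steps_per_epoch, total_steps)
```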

Training took roughly 22 hours on 8x NVIDIA A100 80GB.

Conversion to conversation style:
```
import json
import uuid

# Each line of instructions.jsonl is a JSON object with "instruction" and "response" keys.
with open("instructions.jsonl") as infile:
    rows = [json.loads(line) for line in infile]

conversations = []
for row in rows:
    conversations.append({
        "id": str(uuid.uuid4()),
        "conversations": [
            {
                "from": "human",
                "value": row["instruction"],
            },
            {
                "from": "gpt",
                "value": row["response"],
            },
        ],
    })

with open("as_conversations.json", "w") as outfile:
    outfile.write(json.dumps(conversations, indent=2))
```
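
For reference, each entry in the resulting as_conversations.json then has this shape (the id is a random UUID; values shortened for illustration):

```
{
  "id": "4f6c0c8e-...",
  "conversations": [
    {"from": "human", "value": "Explain the difference between ..."},
    {"from": "gpt", "value": "A list is mutable ..."}
  ]
}
```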

## Evaluation

I used the same evaluation questions as WizardVicunaLM:
| instruction | gpt3.5 | wizard-vicuna-13b | vicuna-13b | wizard-7b | airoboros-gpt-3.5-turbo-100k-7b |
| --- | --- | --- | --- | --- | --- |
| "Write a compelling product launch announcement email to inform our customers of our new software solution." | 95 | 92 | 89 | 90 | 91 |
| "Draft an apology email to a customer who experienced a delay in their order, and provide reassurance that the issue has been resolved." | 94 | 96 | 90 | 89 | 91 |
| "As a pirate captain, what would you say to your crew to motivate them to search for hidden treasure?" | 95 | 90 | 80 | 70 | 85 |
| "Imagine you are a time traveler from the year 3000. What technological advancements would you tell people about?" | 95 | 92 | 90 | 88 | 85 |
| "As a space colonist on Mars, describe your daily life and the challenges you face living on another planet." | 95 | 90 | 87 | 85 | 88 |
| "How can you assess the credibility of a source of information, such as a news article or blog post, without relying solely on the reputation of the author or publisher?" | 93 | 85 | 89 | 87 | 90 |
| "How can observing the behavior of other people in a social situation provide clues about cultural norms and expectations?" | 95 | 90 | 85 | 92 | 80 |
| "How many text messages are sent globally in a minute? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 70 | 65 | 80 | 85 |
| "What are the main differences between Python and JavaScript programming languages?" | 90 | 85 | 80 | 88 | 82 |
| "What are the differences between plant-based and animal-based protein sources?" | 85 | 92 | 90 | 80 | 94 |
| "Describe a scenario where artificial intelligence could be used to improve the quality and efficiency of healthcare delivery." | 95 | 90 | 92 | 89 | 91 |
| "How do cultural, social, and economic factors influence people's food choices, and how can this knowledge be used to promote healthier diets?" | 90 | 85 | 87 | 83 | 84 |
| "How many words are spoken daily on Earth? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 70 | 80 | 75 | 65 |
| "How many lightning strikes occur on Earth each day? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 80 | 60 | 70 | 85 |

If we use gpt-3.5 as the baseline (as WizardVicunaLM/Vicuna did) and divide each score by the gpt-3.5 score for that question, we get the following relative scores (best non-gpt-3.5 result per row in bold):
| gpt3.5 | wizard-vicuna-13b | vicuna-13b | wizard-7b | airoboros-gpt-3.5-turbo-100k-7b |
| --- | --- | --- | --- | --- |
| 1.0 | __0.968421052631579__ | 0.9368421052631579 | 0.9473684210526315 | 0.9578947368421052 |
| 1.0 | __1.0212765957446808__ | 0.9574468085106383 | 0.9468085106382979 | 0.9680851063829787 |
| 1.0 | __0.9473684210526315__ | 0.8421052631578947 | 0.7368421052631579 | 0.8947368421052632 |
| 1.0 | __0.968421052631579__ | 0.9473684210526315 | 0.9263157894736842 | 0.8947368421052632 |
| 1.0 | __0.9473684210526315__ | 0.9157894736842105 | 0.8947368421052632 | 0.9263157894736842 |
| 1.0 | 0.9139784946236559 | 0.956989247311828 | 0.9354838709677419 | __0.967741935483871__ |
| 1.0 | 0.9473684210526315 | 0.8947368421052632 | __0.968421052631579__ | 0.8421052631578947 |
| 1.0 | 0.7777777777777778 | 0.7222222222222222 | 0.8888888888888888 | __0.9444444444444444__ |
| 1.0 | 0.9444444444444444 | 0.8888888888888888 | __0.9777777777777777__ | 0.9111111111111111 |
| 1.0 | 1.0823529411764705 | 1.0588235294117647 | 0.9411764705882353 | __1.1058823529411765__ |
| 1.0 | 0.9473684210526315 | __0.968421052631579__ | 0.9368421052631579 | 0.9578947368421052 |
| 1.0 | 0.9444444444444444 | __0.9666666666666667__ | 0.9222222222222223 | 0.9333333333333333 |
| 1.0 | 0.7777777777777778 | __0.8888888888888888__ | 0.8333333333333334 | 0.7222222222222222 |
| 1.0 | 0.8888888888888888 | 0.6666666666666666 | 0.7777777777777778 | __0.9444444444444444__ |
119
+ Average scores:
120
+
121
+ ```
122
+ gpt3.5 1.000000
123
+ wizard-vicuna-13b 0.934090
124
+ vicuna-13b 0.900847
125
+ wizard-7b 0.902428
126
+ airoboros-gpt-3.5-turbo-100k-7b 0.926496
127
+ ```
As you can see, the __7b__ airoboros model performs well, even compared to 13b models.
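
For reference, a minimal sketch of how the relative scores and averages above can be reproduced from the raw table (only the first two questions are included as sample data; fill in the remaining rows from the table):

```
# Raw judge scores per question, ordered:
# [gpt3.5, wizard-vicuna-13b, vicuna-13b, wizard-7b, airoboros-gpt-3.5-turbo-100k-7b]
raw_scores = [
    [95, 92, 89, 90, 91],  # product launch announcement email
    [94, 96, 90, 89, 91],  # apology email
    # ... remaining rows from the table above
]
models = ["gpt3.5", "wizard-vicuna-13b", "vicuna-13b", "wizard-7b", "airoboros-gpt-3.5-turbo-100k-7b"]

# Normalize each row by the gpt-3.5 score (column 0), then average each column.
relative = [[score / row[0] for score in row] for row in raw_scores]
averages = [sum(col) / len(col) for col in zip(*relative)]
for name, avg in zip(models, averages):
    print(f"{name}\t{avg:.6f}")
```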

## License
The model weights are subject to the LLaMA license, and the dataset is subject to OpenAI's terms of use because it was generated with ChatGPT (gpt-3.5-turbo). Everything else is free.