puneeshkhanna committed
Commit 2abf5a8 • 1 Parent(s): 4ed3ec1
Update README.md
README.md CHANGED
@@ -16,7 +16,11 @@ tags:
# TL;DR
-
+The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+
+It achieves state-of-the-art results on reasoning, language understanding, instruction following, code and mathematics tasks.
+
+It supports a context length of up to 32K tokens.

This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
@@ -32,6 +36,10 @@ This repository contains the Falcon3-7B-Instruct, the best Instruct LLM under 8B
<br>

+## Model Architecture
+Falcon3 uses grouped-query attention (GQA) for faster inference and a wider head dimension of 256.
+A high RoPE base value is used to support long-context understanding.
+
# Usage

Find below an example of how to use the model with `transformers` (make sure to have the latest `transformers` release, or one built from source):
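The usage example referred to here sits in an unchanged, collapsed part of the file, so only its last line (`print(response)`) and the closing `</details>` are visible in the next hunk. As a rough orientation, a minimal sketch of such a `transformers` chat call is shown below; the prompt, dtype, and generation settings are illustrative assumptions, not taken from this commit.

```python
# Minimal sketch (not the README's exact snippet): load Falcon3-7B-Instruct and run one chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights on a single GPU
    device_map="auto",
)

# Build the prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain grouped-query attention in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)  # illustrative generation length
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```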
@@ -80,96 +88,7 @@ print(response)
</details>
-# Training Details
-Based on `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
-
-## Supervised finetuning
-### Training Data
-1.2 million diverse, high-quality samples from Tulu-3, Open-Hermes, Numina and Apigen.
-
-| Data type                           | Ratio |
-|-------------------------------------|-------|
-| Conversations                       | 32%   |
-| STEM                                | 32%   |
-| Code                                | 12%   |
-| Safety                              | 9.1%  |
-| Multilingual                        | 8.3%  |
-| Function call                       | 3.3%  |
-| NLP (summarization, generation, QA) | 3.2%  |
-
-#### Training Hyperparameters
-
-| Optimizer / schedule |              | Value        |
-|----------------------|--------------|--------------|
-| AdamW                | β1           | 0.9          |
-|                      | β2           | 0.999        |
-|                      | weight decay | 0.01         |
-| Learning rate        | type         | linear decay |
-|                      | init lr      | 5e-6         |
-|                      | final lr     | 0            |
-|                      | warmup rate  | 0.03         |
-| Batch size           |              | 64           |
-| Epochs               |              | 2            |
-
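For orientation, the removed hyperparameters above map fairly directly onto `transformers.TrainingArguments`. The sketch below is not the Falcon team's training code; it only mirrors the numbers from the table, and the per-device batch size, gradient-accumulation split, and output directory are arbitrary assumptions.

```python
# Sketch: the removed SFT hyperparameters expressed as transformers.TrainingArguments.
# Not the actual Falcon3 training setup; values simply mirror the table above.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="falcon3-7b-sft",    # arbitrary placeholder
    num_train_epochs=2,
    per_device_train_batch_size=8,  # assumption: 8 x 8 accumulation = effective batch size 64
    gradient_accumulation_steps=8,
    learning_rate=5e-6,             # init lr
    lr_scheduler_type="linear",     # linear decay to a final lr of 0
    warmup_ratio=0.03,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
)
```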
-## Human preference alignment - DPO
-
-### Training Data
-TODO
-
-#### Training Hyperparameters
-TODO
-
-# Evaluation
+# Benchmarks
We report in the following table our internal pipeline benchmarks:
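The benchmark table itself lives in the unchanged part of the file and is not shown in this hunk. Separately, the Model Architecture note added in the second hunk (GQA, a head dimension of 256, a high RoPE value for long context) can be checked against the model's published config. Below is a minimal sketch, assuming the usual Llama-style config field names (`num_key_value_heads`, `head_dim`, `rope_theta`), which may differ in this repository.

```python
# Sketch: inspect the architecture claims (GQA, wide heads, high RoPE base) from the model config.
# Field names follow the common Llama-style layout and are assumptions, not guarantees of this repo's schema.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")

# Grouped-query attention: fewer key/value heads than query heads.
print("query heads:    ", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Wider head dimension (stated as 256 in the added section).
head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
print("head dimension: ", head_dim)

# High RoPE base value and the supported context length (up to 32K per the TL;DR).
print("rope theta:     ", config.rope_theta)
print("max positions:  ", config.max_position_embeddings)
```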