puneeshkhanna committed on
Commit 2abf5a8
Parent(s): 4ed3ec1

Update README.md

Files changed (1)
  1. README.md +10 -91
README.md CHANGED
@@ -16,7 +16,11 @@ tags:
 
 
  # TL;DR
- Falcon 3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+ Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+
+ Achieves state-of-the-art results on reasoning, language understanding, instruction following, code, and mathematics tasks.
+
+ Supports a context length of up to 32K.
 
  This repository contains the Falcon3-7B-Instruct, the best Instruct LLM under 8B at the time of release.
 
@@ -32,6 +36,10 @@ This repository contains the Falcon3-7B-Instruct, the best Instruct LLM under 8B
 
  <br>
 
+ ## Model Architecture
+ Falcon 3 uses grouped query attention (GQA) for faster inference and a wider head dimension of 256.
+ A high RoPE value is used to support long-context understanding.
+
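As a rough illustration (an editorial note, not part of the README), the sketch below loads the published configuration with `transformers.AutoConfig` and prints the fields behind the architecture notes above; the repository id `tiiuae/Falcon3-7B-Instruct` and the Llama-style field names (`num_key_value_heads`, `head_dim`, `rope_theta`) are assumptions and may differ for the actual checkpoint.

```python
# Illustrative only: inspect the GQA / head-dimension / RoPE settings mentioned above.
# Assumes Llama-style config fields; names may differ for the actual Falcon3 checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")  # assumed repo id

n_heads = config.num_attention_heads
n_kv_heads = getattr(config, "num_key_value_heads", n_heads)
# head_dim is not always stored explicitly; fall back to hidden_size / num_attention_heads.
head_dim = getattr(config, "head_dim", config.hidden_size // n_heads)

print("attention heads: ", n_heads)
print("key/value heads: ", n_kv_heads, "(GQA when fewer than attention heads)")
print("head dimension:  ", head_dim)
print("rope_theta:      ", getattr(config, "rope_theta", None))
print("max positions:   ", config.max_position_embeddings)
```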
  # Usage
 
  Find below an example of how to use the model in `transformers` (make sure to have the latest `transformers`, or the version built from source):
@@ -80,96 +88,7 @@ print(response)
 
  </details>
 
-
- # Training Details
- Based on `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
-
- ## Supervised finetuning
- ### Training Data
- 1.2 million diverse, high-quality samples from Tulu-3, Open-Hermes, Numina, and Apigen.
-
- | Data type | Ratio |
- |--------------------------------------|-------|
- | Conversations | 32% |
- | STEM | 32% |
- | Code | 12% |
- | Safety | 9.1% |
- | Multilingual | 8.3% |
- | Function call | 3.3% |
- | NLP (summarization, generation, QA) | 3.2% |
-
- #### Training Hyperparameters
-
- | Parameter | Setting | Value |
- |---------------|--------------|--------------|
- | AdamW | β1 | 0.9 |
- | | β2 | 0.999 |
- | | weight decay | 0.01 |
- | Learning rate | type | linear decay |
- | | init lr | 5e-6 |
- | | final lr | 0 |
- | | warmup ratio | 0.03 |
- | Batch size | | 64 |
- | Epochs | | 2 |
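For orientation only (this is not part of the README and not the team's training code), the SFT hyperparameters in the table above map roughly onto `transformers.TrainingArguments` as sketched below; the output directory and the per-device/accumulation split of the global batch size of 64 are illustrative assumptions.

```python
# Rough, illustrative mapping of the SFT hyperparameters listed above onto
# transformers.TrainingArguments -- NOT the original Falcon3 training code.
from transformers import TrainingArguments

sft_args = TrainingArguments(
    output_dir="falcon3-7b-sft",        # hypothetical output path
    num_train_epochs=2,                 # Epochs: 2
    per_device_train_batch_size=8,      # assumed split of the global batch size of 64,
    gradient_accumulation_steps=1,      # e.g. 8 samples x 8 devices = 64
    learning_rate=5e-6,                 # init lr
    lr_scheduler_type="linear",         # linear decay
    warmup_ratio=0.03,                  # warmup ratio
    weight_decay=0.01,                  # AdamW weight decay
    adam_beta1=0.9,                     # AdamW beta1
    adam_beta2=0.999,                   # AdamW beta2
)
```

The linear scheduler in `transformers` decays to zero after warmup, which matches the final lr of 0 in the table.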
-
- ## Human preference alignment - DPO
-
- ### Training Data
- TODO
-
- #### Training Hyperparameters
- TODO
-
-
- # Evaluation
+ # Benchmarks
  We report in the following table our internal pipeline benchmarks:
 
 