slimfrikha-tii committed on
Commit a7daa48
1 Parent(s): 2abf5a8

docs(readme): update template

Files changed (1): README.md +38 -35
README.md CHANGED
@@ -5,44 +5,31 @@ tags:
  - falcon3
  ---


- # Table of Contents

- 0. [TL;DR](#TL;DR)
- 1. [Model Details](#model-details)
- 2. [Usage](#usage)
- 3. [Training Details](#training-details)
- 4. [Evaluation](#evaluation)


- # TL;DR
- Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.

- Achieves state of art results on reasoning, language understanding, instruction following, code and mathematics tasks.
-
- Supports context length up to 32K.
-
- This repository contains the Falcon3-7B-Instruct, the best Instruct LLM under 8B at the time of release.
-
- # Model Details
-
- ## Model Description
-
- - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- - **Model type:** Causal decoder-only
- - **Architecture:** Transformer-base
- - **Language(s) (NLP):** Mainly English
- - **License:** TII Falcon-LLM License 2.0
-
- <br>
-
- ## Model Architecture
- Falcon 3 uses grouped query attention (GQA) for faster inference and a wider head dimension of 256.
- High ROPE value is used to support long context understanding.
-
- # Usage
-
- Find below an example on how to use the model in `transformers` (Make sure to have the latest transformers, or the one built from source):

 <details>
 <summary> Click to expand </summary>
@@ -88,10 +75,11 @@ print(response)

 </details>

 # Benchmarks
 We report in the following table our internal pipeline benchmarks:

-
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
 <colgroup>
 <col style="width: 10%;">
@@ -99,6 +87,7 @@ We report in the following table our internal pipeline benchmarks:
 <col style="width: 7%;">
 <col style="width: 7%;">
 <col style="width: 7%;">
 <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
 </colgroup>
 <thead>
@@ -108,6 +97,7 @@ We report in the following table our internal pipeline benchmarks:
 <th>Llama-3.1-8B-Instruct</th>
 <th>Qwen2-7B-Instruct</th>
 <th>Qwen2.5-7B-Instruct</th>
 <th>Falcon3-7B-Instruct</th>
 </tr>
 </thead>
@@ -119,6 +109,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>MMLU-PRO (5-shot)</td>
@@ -126,6 +117,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>IFEval</td>
@@ -133,6 +125,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td rowspan="2">Math</td>
@@ -141,6 +134,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>MATH(4-shot)</td>
@@ -148,6 +142,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td rowspan="4">Reasoning</td>
@@ -156,6 +151,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>GPQA (0-shot)</td>
@@ -163,6 +159,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>MUSR (0-shot)</td>
@@ -170,6 +167,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>BBH (3-shot)</td>
@@ -177,6 +175,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td rowspan="4">CommonSense Understanding</td>
@@ -185,6 +184,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>SciQ (0-shot)</td>
@@ -192,6 +192,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>Winogrande (0-shot)</td>
@@ -199,6 +200,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 <tr>
 <td>OpenbookQA (0-shot)</td>
@@ -206,13 +208,14 @@ We report in the following table our internal pipeline benchmarks:
 <td>-</td>
 <td>-</td>
 <td>-</td>
 </tr>
 </tbody>
 </table>

 # Citation
- If Falcon3 series were helpful to your work, feel free to give us a cite.

 ```
 @misc{Falcon3,
 
  - falcon3
  ---

+ # Falcon3-7B-Instruct

+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

+ This repository contains the **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

+ ## Model Details
+ - Architecture (see the config-inspection sketch after this list)
+   - Transformer-based causal decoder-only architecture
+   - 28 decoder blocks
+   - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+   - Wider head dimension: 256
+   - High RoPE value to support long context understanding: 1000042
+   - 32K context length
+   - 131K vocab size
+ - Pretrained on 14 Teratokens of datasets comprising web, code, STEM, high-quality and multilingual data using 2048 H100 GPU chips
+ - Post-trained on 1.2 million samples of STEM, conversations, code, safety and function call data
+ - Supports EN, FR, ES, PT
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
+ - License: TII Falcon-LLM License 2.0
+ - Model Release Date: December 2024

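The architecture bullets above map onto standard `transformers` configuration fields. Below is a minimal sketch of inspecting them; the `tiiuae/Falcon3-7B-Instruct` repository id and the exact attribute names (`num_hidden_layers`, `num_key_value_heads`, `rope_theta`, ...) are assumptions based on the common Llama-style config layout, not something stated in this diff.

```python
# Minimal sketch (assumptions noted above): load the model config and check
# the fields that correspond to the architecture bullets in Model Details.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")  # assumed repo id

print(config.num_hidden_layers)        # expected: 28 decoder blocks
print(config.num_attention_heads)      # expected: 12 query heads
print(config.num_key_value_heads)      # expected: 4 KV heads (GQA)
print(config.head_dim)                 # expected: 256
print(config.rope_theta)               # expected: 1000042
print(config.max_position_embeddings)  # expected: ~32K context length
print(config.vocab_size)               # expected: ~131K tokens
```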
+ ## Getting started

 <details>
 <summary> Click to expand </summary>

 </details>

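The usage code collapsed behind "Click to expand" is not shown in this diff (the hunk skips straight to the closing `print(response)` line). For orientation, here is a hedged sketch of what a typical `transformers` chat-style generation snippet for this checkpoint looks like; the repo id, prompt, and generation settings are illustrative assumptions, not the exact contents of the hidden block.

```python
# Hedged sketch of typical transformers usage for this model; the collapsed
# README code is not visible in this diff, so repo id, prompt and generation
# parameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many hours are there in one day?"},
]

# Build the prompt with the model's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```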
+ <br>
+
 # Benchmarks
 We report in the following table our internal pipeline benchmarks:

 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
 <colgroup>
 <col style="width: 10%;">

 <col style="width: 7%;">
 <col style="width: 7%;">
 <col style="width: 7%;">
+ <col style="width: 7%;">
 <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
 </colgroup>
 <thead>

 <th>Llama-3.1-8B-Instruct</th>
 <th>Qwen2-7B-Instruct</th>
 <th>Qwen2.5-7B-Instruct</th>
+ <th>gemma-2-9b-it</th>
 <th>Falcon3-7B-Instruct</th>
 </tr>
 </thead>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>MMLU-PRO (5-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>IFEval</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td rowspan="2">Math</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>MATH(4-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td rowspan="4">Reasoning</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>GPQA (0-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>MUSR (0-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>BBH (3-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td rowspan="4">CommonSense Understanding</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>SciQ (0-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>Winogrande (0-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 <tr>
 <td>OpenbookQA (0-shot)</td>

 <td>-</td>
 <td>-</td>
 <td>-</td>
+ <td>-</td>
 </tr>
 </tbody>
 </table>

 # Citation
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.

 ```
 @misc{Falcon3,