tiiuae/Falcon3-7B-Instruct

@@ -5,44 +5,31 @@ tags:
 - falcon3
 ---
-#  Table of Contents
-0. [TL;DR](#TL;DR)
-1. [Model Details](#model-details)
-2. [Usage](#usage)
-3. [Training Details](#training-details)
-4. [Evaluation](#evaluation)
-# TL;DR
-Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
-Achieves state of art results on reasoning, language understanding, instruction following, code and mathematics tasks.
-Supports context length up to 32K.
-This repository contains the Falcon3-7B-Instruct, the best Instruct LLM under 8B at the time of release.
-# Model Details
-## Model Description
-- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
-- **Model type:** Causal decoder-only
-- **Architecture:** Transformer-base
-- **Language(s) (NLP):** Mainly English
-- **License:** TII Falcon-LLM License 2.0
-<br>
-## Model Architecture
-Falcon 3 uses grouped query attention (GQA) for faster inference and a wider head dimension of 256.
-High ROPE value is used to support long context understanding.
-# Usage
-Find below an example on how to use the model in `transformers` (Make sure to have the latest transformers, or the one built from source):
 <details>
 <summary> Click to expand </summary>
@@ -88,10 +75,11 @@ print(response)
 </details>
 # Benchmarks
 We report in the following table our internal pipeline benchmarks:
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
         <col style="width: 10%;">
@@ -99,6 +87,7 @@ We report in the following table our internal pipeline benchmarks:
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
@@ -108,6 +97,7 @@ We report in the following table our internal pipeline benchmarks:
             <th>Llama-3.1-8B-Instruct</th>
             <th>Qwen2-7B-Instruct</th>
             <th>Qwen2.5-7B-Instruct</th>
             <th>Falcon3-7B-Instruct</th>
         </tr>
     </thead>
@@ -119,6 +109,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>MMLU-PRO (5-shot)</td>
@@ -126,6 +117,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>IFEval</td>
@@ -133,6 +125,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td rowspan="2">Math</td>
@@ -141,6 +134,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>MATH(4-shot)</td>
@@ -148,6 +142,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td rowspan="4">Reasoning</td>
@@ -156,6 +151,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>GPQA (0-shot)</td>
@@ -163,6 +159,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>MUSR (0-shot)</td>
@@ -170,6 +167,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>BBH (3-shot)</td>
@@ -177,6 +175,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td rowspan="4">CommonSense Understanding</td>
@@ -185,6 +184,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>SciQ (0-shot)</td>
@@ -192,6 +192,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>Winogrande (0-shot)</td>
@@ -199,6 +200,7 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
         <tr>
             <td>OpenbookQA (0-shot)</td>
@@ -206,13 +208,14 @@ We report in the following table our internal pipeline benchmarks:
             <td>-</td>
             <td>-</td>
             <td>-</td>
         </tr>
     </tbody>
 </table>
 # Citation
-If Falcon3 series were helpful to your work, feel free to give us a cite.
 ```
 @misc{Falcon3,

 - falcon3
 ---
+# Falcon3-7B-Instruct
+The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+This repository contains **Falcon3-7B-Instruct**. At the time of release, it achieved state-of-the-art results on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-7B-Instruct supports four languages (English, French, Spanish and Portuguese) and a context length of up to 32K tokens.
+## Model Details
+- Architecture
+  - Transformer-based, causal decoder-only architecture
+  - 28 decoder blocks
+  - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+  - Wider head dimension: 256
+  - High RoPE base value to support long-context understanding: 1000042
+  - 32K context length
+  - 131K vocabulary size
+- Pretrained on 14 teratokens of data comprising web, code, STEM, high-quality and multilingual sources, using 2048 H100 GPUs
+- Post-trained on 1.2 million samples of STEM, conversational, code, safety and function-call data
+- Supports EN, FR, ES, PT
+- Developed by [Technology Innovation Institute](https://www.tii.ae)
+- License: TII Falcon-LLM License 2.0
+- Model Release Date: December 2024
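The architecture figures above allow a rough estimate of one reason GQA helps at inference time: with 4 KV heads instead of 12, the key/value cache that must be stored and read at every decoding step is three times smaller. The following is a minimal back-of-the-envelope sketch, not part of the official model card. It assumes bf16 storage (2 bytes per value) and takes "32K" to mean 32,768 tokens; `kv_cache_bytes` and `kv_head_for_query_head` are hypothetical helpers, and the contiguous grouping of query heads onto KV heads is an assumption that matches the usual convention.

```python
# Back-of-the-envelope KV-cache sizing for Falcon3-7B-Instruct, using the
# architecture numbers from the model card. Storage precision and the exact
# context length are assumptions for illustration only.
NUM_LAYERS = 28
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2   # bf16 (assumption)
CONTEXT_LEN = 32_768  # "32K" context (assumption)


def kv_cache_bytes(num_tokens: int, num_kv_heads: int) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    per_token_per_layer = 2 * num_kv_heads * HEAD_DIM  # one K and one V vector per KV head
    return num_tokens * NUM_LAYERS * per_token_per_layer * BYTES_PER_VALUE


def kv_head_for_query_head(q_head: int) -> int:
    """In GQA, each contiguous group of query heads shares one KV head (assumed layout)."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 3 query heads per KV head
    return q_head // group_size


if __name__ == "__main__":
    gqa = kv_cache_bytes(CONTEXT_LEN, NUM_KV_HEADS)
    mha = kv_cache_bytes(CONTEXT_LEN, NUM_QUERY_HEADS)  # hypothetical multi-head baseline
    print(f"GQA KV cache at 32K tokens: {gqa / 2**30:.1f} GiB")
    print(f"MHA KV cache at 32K tokens: {mha / 2**30:.1f} GiB")
    print([kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)])
```

Under these assumptions the script reports about 3.5 GiB of KV cache for one 32K-token sequence with GQA, versus 10.5 GiB for a hypothetical multi-head-attention variant with 12 KV heads, and maps query heads 0-2 to KV head 0, 3-5 to KV head 1, and so on.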
+## Getting started
 <details>
 <summary> Click to expand </summary>
 </details>
+<br>
 # Benchmarks
 We report in the following table our internal pipeline benchmarks:
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
         <col style="width: 10%;">
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="width: 7%;">
+        <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
             <th>Llama-3.1-8B-Instruct</th>
             <th>Qwen2-7B-Instruct</th>
             <th>Qwen2.5-7B-Instruct</th>
+            <th>gemma-2-9b-it</th>
             <th>Falcon3-7B-Instruct</th>
         </tr>
     </thead>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>MMLU-PRO (5-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>IFEval</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td rowspan="2">Math</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>MATH(4-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td rowspan="4">Reasoning</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>GPQA (0-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>MUSR (0-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>BBH (3-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td rowspan="4">CommonSense Understanding</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>SciQ (0-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>Winogrande (0-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
         <tr>
             <td>OpenbookQA (0-shot)</td>
             <td>-</td>
             <td>-</td>
             <td>-</td>
+            <td>-</td>
         </tr>
     </tbody>
 </table>
 # Citation
+If the Falcon3 family was helpful to your work, feel free to cite us.
 ```
 @misc{Falcon3,