---
language:
- en
- es
- pt
tags:
- falcon3
license: other 
license_name: falcon-llm-license 
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---



#  Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Citation](#citation)


# TL;DR

# Model Details

⚠️ **This is a raw, pretrained model, which should be further fine-tuned for most use cases.**

## Model Description

- **Developed by:** [Technology Innovation Institute](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

<br>

# Usage

Below are some example scripts showing how to use the model with `transformers`. Make sure you have the latest release of `transformers` installed, or a build from source.

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
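# With device_map="auto", move the inputs to wherever the model's first parameters were placed
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)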

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using `torch.compile`

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

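# Compile the forward pass in place so that `generate`, which calls the
# underlying module rather than a wrapper, actually runs the compiled code
model.forward = torch.compile(model.forward)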

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>
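
### Building a few-shot prompt

Because this is a raw pretrained model rather than an instruction-tuned one, a prompt that shows a few worked examples may steer completions better than a bare question. The helper below is a minimal sketch under that assumption: `build_few_shot_prompt` and the example pairs are hypothetical and not part of the model or of `transformers`. It only reuses the `Question: ... Answer:` format from the scripts above.

```python
def build_few_shot_prompt(examples, question):
    """Join (question, answer) pairs and a final open question into one prompt.

    `examples` is a list of (question, answer) string pairs. The final
    question is left unanswered so the model completes it.
    """
    parts = [f"Question: {q} Answer: {a}" for q, a in examples]
    parts.append(f"Question: {question} Answer:")
    return "\n\n".join(parts)


# Illustrative example pairs; replace them with ones from your own task.
examples = [
    ("How many days in one week?", "7"),
    ("How many minutes in one hour?", "60"),
]
input_text = build_few_shot_prompt(examples, "How many hours in one day?")
print(input_text)
```

The resulting string can be used as `input_text` in any of the scripts above.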


# Training Details

## Training Data

Falcon3-7B is trained on 15 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources.

## Training Procedure

Falcon3-7B is trained on 256 H100 nodes (world size 2048).
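
The stated world size is consistent with 8 GPUs per node. That per-node count is an assumption here, since the text gives only the node count and the world size. A quick check:

```python
NUM_NODES = 256     # from the text above
GPUS_PER_NODE = 8   # assumption: not stated in the source


def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of ranks, assuming one process per GPU."""
    return num_nodes * gpus_per_node


print(world_size(NUM_NODES, GPUS_PER_NODE))  # 2048
```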

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                                                   |
|--------------------|------------|---------------------------------------------------------------|
| Precision          | `bfloat16` |                                                               |
| Optimizer          | AdamW      |                                                               |
| Max learning rate  | 6e-4       | Following a WSD (warmup-stable-decay) learning rate scheduler |
| Weight decay       | 1e-1       |                                                               |
| z-loss             | 1e-4       |                                                               |
| Batch size         | Variable   | Batch size was gradually increased during training            |
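
To make two of the table entries concrete, here is a minimal sketch of a WSD schedule and a z-loss term. Only the peak learning rate (6e-4) and the z-loss coefficient (1e-4) come from the table. The linear warmup and decay shapes, the phase lengths, and the reading of z-loss as a squared log-partition penalty on the output logits are all assumptions, and the function names are hypothetical. This is not the actual training code.

```python
import math

MAX_LR = 6e-4        # from the table above
Z_LOSS_COEFF = 1e-4  # from the table above


def wsd_lr(step, total_steps, warmup_steps, decay_steps, max_lr=MAX_LR, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Linear warmup and linear decay are assumptions; the source does not
    state the shape or length of either phase.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return max_lr
    # Decay: ramp linearly from max_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress


def z_loss(logits, coeff=Z_LOSS_COEFF):
    """Auxiliary penalty coeff * (log Z)^2 for one vector of logits.

    log Z is the log-sum-exp of the logits, computed stably by
    subtracting the maximum before exponentiating.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return coeff * log_z ** 2


# Hypothetical phase lengths, chosen only for illustration.
for step in (0, 500, 950, 1000):
    print(step, wsd_lr(step, total_steps=1000, warmup_steps=10, decay_steps=100))
```

In a real training loop the z-loss term would be added to the cross-entropy loss; it discourages the softmax normalizer from drifting far from 1.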

# Evaluation
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
    <colgroup>
        <col style="width: 10%;">
        <col style="width: 10%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
    </colgroup>
    <thead>
        <tr>
            <th>Category</th>
            <th>Benchmark</th>
            <th>Llama-3.2-1B</th>
            <th>Qwen2.5-1.5B</th>
            <th>SmolLM2-1.7B</th>
            <th>gemma-2-2b</th>
            <th>Falcon3-1B-Base</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">General</td>
            <td>MMLU (5-shot)</td>
            <td>31.1</td>
            <td>61.0</td>
            <td>50.2</td>
            <td>53.1</td>
            <td>42.5</td>
        </tr>
        <tr>
            <td>MMLU-Pro (5-shot)</td>
            <td>11.7</td>
            <td>28.5</td>
            <td>21.4</td>
            <td>22.1</td>
            <td>16.2</td>
        </tr>
        <tr>
            <td>IFEval</td>
            <td>14.9</td>
            <td>26.1</td>
            <td>24.2</td>
            <td>20.4</td>
            <td>25.3</td>
        </tr>
        <tr>
            <td rowspan="2">Math</td>
            <td>GSM8K (5-shot)</td>
            <td>6.6</td>
            <td>62.3</td>
            <td>31.1</td>
            <td>25.6</td>
            <td>34.3</td>
        </tr>
        <tr>
            <td>MATH (4-shot)</td>
            <td>0.3</td>
            <td>6.8</td>
            <td>1.5</td>
            <td>2.6</td>
            <td>2.2</td>
        </tr>
        <tr>
            <td rowspan="4">Reasoning</td>
            <td>ARC Challenge (25-shot)</td>
            <td>40.2</td>
            <td>54.8</td>
            <td>54.1</td>
            <td>53.7</td>
            <td>48.2</td>
        </tr>
        <tr>
            <td>GPQA (0-shot)</td>
            <td>24.3</td>
            <td>28.2</td>
            <td>28.9</td>
            <td>25.5</td>
            <td>28.1</td>
        </tr>
        <tr>
            <td>MuSR (0-shot)</td>
            <td>34.5</td>
            <td>35.5</td>
            <td>34.8</td>
            <td>42.8</td>
            <td>41.9</td>
        </tr>
        <tr>
            <td>BBH (3-shot)</td>
            <td>31.2</td>
            <td>41.1</td>
            <td>34.3</td>
            <td>36.8</td>
            <td>36.1</td>
        </tr>
        <tr>
            <td rowspan="4">Commonsense Understanding</td>
            <td>PIQA (0-shot)</td>
            <td>74.6</td>
            <td>76.0</td>
            <td>77.5</td>
            <td>79.2</td>
            <td>74.5</td>
        </tr>
        <tr>
            <td>SciQ (0-shot)</td>
            <td>88.5</td>
            <td>93.1</td>
            <td>90.8</td>
            <td>95.7</td>
            <td>91.1</td>
        </tr>
        <tr>
            <td>Winogrande (0-shot)</td>
            <td>60.4</td>
            <td>63.0</td>
            <td>66.1</td>
            <td>68.6</td>
            <td>61.2</td>
        </tr>
        <tr>
            <td>OpenbookQA (0-shot)</td>
            <td>37.4</td>
            <td>40.4</td>
            <td>44.0</td>
            <td>41.8</td>
            <td>41.0</td>
        </tr>
    </tbody>
</table>




# Citation