---
language:
- en
tags:
- falcon3
---

# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

# Usage

Below are example scripts showing how to use the model with `transformers`. Make sure you have the latest `transformers` release installed, or a build from source.

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model; without further arguments the weights stay on the CPU.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

# Tokenize the prompt into PyTorch tensors.
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text.
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU


```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" lets accelerate place the weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

# Tokenize the prompt and move the input tensor to the GPU.
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate a continuation and decode it back to text.
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using `torch.compile`


```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the weights in bfloat16 and move the model to GPU 0.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the model; the first call is slower while compilation runs.
model = torch.compile(model)

# Tokenize the prompt and move the input tensor to the GPU.
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate a continuation and decode it back to text.
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
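The snippets above call `generate` without an explicit length limit and print the prompt together with the continuation. The following is a minimal sketch, not part of the original card, of two small helpers that build a prompt in the same `Question: ... Answer: ` style and return only the newly generated text. The names `build_prompt` and `generate_answer`, the few-shot layout, and the default of 32 new tokens are illustrative assumptions; the `transformers` calls used (`generate` with `max_new_tokens`, `decode` with `skip_special_tokens`) are standard.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a prompt in the "Question: ... Answer: " style used above.

    `examples` is an optional sequence of (question, answer) pairs placed
    before the real question as few-shot demonstrations. This layout is a
    hypothetical convention, not one prescribed by the model card.
    """
    lines = [f"Question: {q} Answer: {a}" for q, a in examples]
    lines.append(f"Question: {question} Answer: ")
    return "\n".join(lines)


def generate_answer(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a continuation and return only the newly generated text."""
    # Move the tokenized inputs to whichever device the model lives on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_new_tokens bounds the continuation length (32 is an arbitrary default).
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the continuation is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in any of the snippets above, `generate_answer(model, tokenizer, build_prompt("How many hours in one day?"))` returns the model's continuation without the prompt. Because this is a base (non-instruct) checkpoint, passing a few solved examples to `build_prompt` may help it follow the format.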

# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                                                  |
|--------------------|------------|--------------------------------------------------------------|
| Precision          | `bfloat16` |                                                              |
| Optimizer          | AdamW      |                                                              |
| Max learning rate  |            | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |            |                                                              |
| Batch size         |            |                                                              |

# Evaluation

| Metrics  | Llama3.1-8B | Falcon3-7B-Base |
|----------|-------------|-----------------|
| MUSR     |             | 18.70           |
| BBH      |             | 32.68           |
| MMLU_PRO |             | 32.43           |
| IF_EVAL  |             | 34.27           |
| GPQA     |             | 13.97           |
| MATH     |             | 18.02           |
| AVG      |             | 24.85           |

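| Category | Benchmark | Llama3.1-8B | Qwen2-7B | Qwen2.5-7B | Falcon3-7B-Base | Gemma2-9B | Yi1.5-9B | Mistral-NeMo-12B | Falcon3-10B-Base |
|----------|-----------|-------------|----------|------------|-----------------|-----------|----------|------------------|------------------|
| General | MMLU (5-shot) | 65.2 | 70.4 | 74.2 | 67.5 | 0 | 69.6 | 68.8 | 73.1 |
| | MMLU-PRO (5-shot) | 32.7 | 42.1 | 43.5 | 39.2 | 0 | 39.3 | 34.7 | 42.5 |
| | IFEval | 12.0 | 30.6 | 33.9 | 34.3 | 0 | 29.1 | 16.1 | 36.4 |
| Math | GSM8K (5-shot) | 49.4 | 77.9 | 82.9 | 76.2 | 69.1 | 63.8 | 55.3 | 81.4 |
| | MATH (4-shot) | 4.1 | 17.5 | 15.5 | 18.0 | 0 | 9.2 | 4.9 | 22.9 |
| Reasoning | Arc Challenge (25-shot) | 53.4 | 57.4 | 59.0 | 59.6 | 63.7 | 58.2 | 60.6 | 62.6 |
| | GPQA (0-shot) | 31.0 | 31.9 | 33.0 | 35.5 | 0 | 36.6 | 28.8 | 34.1 |
| | MUSR (0-shot) | 38.0 | 44.1 | 44.2 | 47.3 | 0 | 43.3 | 39.2 | 44.2 |
| | BBH (3-shot) | 46.5 | 53.3 | 54.0 | 51.0 | 0 | 51.3 | 50.2 | 59.7 |
| CommonSense Understanding | PIQA (0-shot) | 80.3 | 79.8 | 78.7 | 77.7 | 81.4 | 79.8 | 81.4 | 79.1 |
| | SciQ (0-shot) | 96.3 | 95.9 | 96.6 | 95.3 | 97.2 | 95.8 | 96.4 | 96.0 |
| | Winogrande (0-shot) | 74.0 | 72.1 | 72.9 | 71.0 | 74.2 | 72.7 | 73.2 | 73.6 |
| | OpenbookQA (0-shot) | 33.4 | 35.2 | 33.6 | 31.4 | 34.0 | 35.4 | 36.4 | 34.0 |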

# Citation