---
language:
- en
tags:
- falcon3
---

# Table of Contents

0. [TL;DR](#TL;DR)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0
# Usage

The example scripts below show how to use the model with `transformers`. Make sure you have the latest `transformers` release, or a build from source.

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (weights stay on the CPU by default)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
### Running the model on a GPU
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" lets accelerate place the weights on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the input ids to the GPU before generating
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
### Running the model on a GPU using `torch.compile`
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# Load the weights in bfloat16 and move the model to GPU 0
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the model; the first call is slower while compilation runs
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
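The three snippets above differ only in how the model is loaded and where the inputs are placed. As a minimal sketch, the shared tokenize, generate, and decode steps could be factored into one helper. The names `build_prompt` and `generate_answer` and the `max_new_tokens=32` default are illustrative assumptions, not part of this model card; `max_new_tokens` and `skip_special_tokens` are standard `transformers` arguments.

```python
def build_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer: ' style used above."""
    return f"Question: {question.strip()} Answer: "


def generate_answer(model, tokenizer, question: str, max_new_tokens: int = 32) -> str:
    """Generate a completion with an already-loaded model and tokenizer.

    Hypothetical helper: the inputs are moved to the model's device, so the
    same call works with the CPU and GPU loading variants.
    """
    prompt = build_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens and drop special tokens.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in any of the examples above, `generate_answer(model, tokenizer, "How many hours in one day?")` returns only the generated continuation rather than the prompt plus continuation.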
# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Max learning rate  |            | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |            |                                           |
| Batch size         |            |                                           |

# Evaluation
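| Category | Benchmark | Llama3.1-8B | Qwen2-7B | Qwen2.5-7B | Falcon3-7B-Base | Gemma2-9B | Yi1.5-9B | Mistral-NeMo-12B | Falcon3-10B-Base |
|----------|-----------|-------------|----------|------------|-----------------|-----------|----------|------------------|------------------|
| General | MMLU (5-shot) | 65.2 | 70.4 | 74.2 | 67.5 | 0 | 69.6 | 68.8 | 73.1 |
| | MMLU-PRO (5-shot) | 32.7 | 42.1 | 43.5 | 39.2 | 0 | 39.3 | 34.7 | 42.5 |
| | IFEval | 12.0 | 30.6 | 33.9 | 34.3 | 0 | 29.1 | 16.1 | 36.4 |
| Math | GSM8K (5-shot) | 49.4 | 77.9 | 82.9 | 76.2 | 69.1 | 63.8 | 55.3 | 81.4 |
| | MATH (4-shot) | 4.1 | 17.5 | 15.5 | 18.0 | 0 | 9.2 | 4.9 | 22.9 |
| Reasoning | Arc Challenge (25-shot) | 53.4 | 57.4 | 59.0 | 59.6 | 63.7 | 58.2 | 60.6 | 62.6 |
| | GPQA (0-shot) | 31.0 | 31.9 | 33.0 | 35.5 | 0 | 36.6 | 28.8 | 34.1 |
| | MUSR (0-shot) | 38.0 | 44.1 | 44.2 | 47.3 | 0 | 43.3 | 39.2 | 44.2 |
| | BBH (3-shot) | 46.5 | 53.3 | 54.0 | 51.0 | 0 | 51.3 | 50.2 | 59.7 |
| CommonSense Understanding | PIQA (0-shot) | 80.3 | 79.8 | 78.7 | 77.7 | 81.4 | 79.8 | 81.4 | 79.1 |
| | SciQ (0-shot) | 96.3 | 95.9 | 96.6 | 95.3 | 97.2 | 95.8 | 96.4 | 96.0 |
| | Winogrande (0-shot) | 74.0 | 72.1 | 72.9 | 71.0 | 74.2 | 72.7 | 73.2 | 73.6 |
| | OpenbookQA (0-shot) | 33.4 | 35.2 | 33.6 | 31.4 | 34.0 | 35.4 | 36.4 | 34.0 |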
# Citation