metadata
language:
- en
tags:
- falcon3
Table of Contents
TL;DR
Model Details
Model Description
- Developed by: Technology Innovation Institute (https://www.tii.ae)
- Model type: Causal decoder-only
- Architecture: Transformer-based
- Language(s) (NLP): Mainly English
- License: TII Falcon-Mamba License 2.0
Usage
The example scripts below show how to use the model with transformers.
Make sure you have the latest transformers release installed, or build it from source.
Using the PyTorch model with 🤗 transformers
Running the model on a CPU
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
Running the model on a GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")
input_text = "Question: How many hours in one day? Answer: "
# With device_map="auto", move the inputs to wherever the model was placed
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
Running the model on a GPU using torch.compile
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)
model = torch.compile(model)
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
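When choosing between the CPU and GPU scripts above, a rough estimate of the weight memory can help. The sketch below is only back-of-the-envelope arithmetic: it assumes a nominal 7 billion parameters (read off the "7B" in the model name, not an exact count) and counts weights only, ignoring activations and the KV cache. The helper name `weight_memory_gib` is hypothetical and not part of transformers.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory (in GiB) needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: nominal parameter count taken from the "7B" in the model name.
nominal_params = 7_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gib(nominal_params, dtype):.1f} GiB of weights")
```

Loading in bfloat16, as the torch.compile script does, halves the weight footprint relative to float32.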
Training Details
Training Data
Training Procedure
Training Hyperparameters
Hyperparameter | Value | Comment |
---|---|---|
Precision | bfloat16 | |
Optimizer | AdamW | |
Max learning rate | Following a WSD (warmup-stable-decay) learning rate schedule | |
Weight decay | | |
Batch size | | |
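The table mentions a WSD (warmup-stable-decay) learning rate schedule. As an illustration of the general shape only, here is a minimal sketch. The function name `wsd_lr`, the linear warmup, the linear decay, and every number in the usage lines are assumptions made for the example; the actual peak learning rate, phase lengths, and decay form used in training are not stated here.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a generic warmup-stable-decay schedule.

    Hypothetical helper: linear warmup and linear decay are assumptions.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate at max_lr.
        return max_lr
    # Decay: go linearly from max_lr to min_lr, then stay at min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress


# Illustrative values only, not the training configuration.
for step in (0, 5, 10, 29, 35, 40):
    lr = wsd_lr(step, max_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10)
    print(step, lr)
```

Unlike a cosine schedule, the long constant phase means the decay can be started from any checkpoint in the stable phase without fixing the total number of steps in advance.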