---
language:
- en
tags:
- falcon3
---

# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Citation](#citation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-Mamba License 2.0

<br>

# Usage

The example scripts below show how to run the model with `transformers`. Make sure you have the latest `transformers` release installed, or a build from source.

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights (on CPU by default)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

# Tokenize the prompt into PyTorch tensors
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" lets accelerate place the weights on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

# Move the tokenized prompt to the device that holds the model's first layers
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using `torch.compile`

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model in bfloat16 and move it to the first CUDA device
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the model; the first call is slower while compilation runs
model = torch.compile(model)

# Tokenize the prompt and move it to the GPU
input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                                                  |
|--------------------|------------|--------------------------------------------------------------|
| Precision          | `bfloat16` |                                                              |
| Optimizer          | AdamW      |                                                              |
| Max learning rate  |            | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |            |                                                              |
| Batch size         |            |                                                              |
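
For readers unfamiliar with the WSD (warmup-stable-decay) schedule named in the table, the sketch below illustrates its general shape: a warmup ramp, a plateau at the maximum learning rate, then a decay phase. It is not the training code used for this model. The function name `wsd_lr`, the linear warmup and linear decay shapes, and every numeric value are assumptions chosen for illustration, since this card does not state the actual learning rate, step counts, or decay form.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for an illustrative warmup-stable-decay schedule.

    Assumptions (not taken from the model card): linear warmup, linear decay.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to max_lr over the first warmup_steps steps.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate at its maximum.
        return max_lr
    # Decay: move linearly from max_lr to min_lr, then stay at min_lr.
    decay_step = min(step - warmup_steps - stable_steps, decay_steps)
    return max_lr + (min_lr - max_lr) * decay_step / decay_steps


# Hypothetical values, for illustration only.
schedule = [wsd_lr(s, max_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10)
            for s in range(41)]
```

With these hypothetical values, the rate rises over the first 10 steps, stays flat for 20 steps, and falls to `min_lr` over the last 10. Keeping the plateau separate from the decay is what distinguishes WSD from a cosine schedule, where the rate starts falling right after warmup.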

# Evaluation

# Citation