# Model: FalconMasr

This model is based on Falcon-7B, quantized to 4-bit for efficient memory usage and fine-tuned with LoRA (Low-Rank Adaptation) for Arabic causal language modeling, with the goal of improving the quality of the model's Arabic responses.

## Model Configuration

- **Base Model**: `ybelkada/falcon-7b-sharded-bf16`
- **Quantization**: 4-bit, `nf4` quantization type, `float16` compute dtype
- **LoRA Configuration**: `lora_alpha=16`, `lora_dropout=0`, `r=64`
- **Task Type**: Causal Language Modeling
- **Target Modules**: `query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`

A sketch of how this configuration maps onto `peft` and `bitsandbytes` objects appears at the end of this card.

## Training

The model was fine-tuned on a custom Arabic text dataset. The training loss trended downward from about 1.46 at step 10 to 0.75 at step 400, with the usual step-to-step noise, as shown in the table below:

| Step | Training Loss |
|------|---------------|
| 10   | 1.459100      |
| 20   | 1.335000      |
| 30   | 1.295600      |
| 40   | 1.177000      |
| 50   | 1.144900      |
| 60   | 1.132900      |
| 70   | 1.074500      |
| 80   | 1.078600      |
| 90   | 1.121100      |
| 100  | 0.936000      |
| 110  | 1.151500      |
| 120  | 1.068000      |
| 130  | 1.056700      |
| 140  | 0.976900      |
| 150  | 0.867300      |
| 160  | 1.151100      |
| 170  | 1.023200      |
| 180  | 1.074300      |
| 190  | 1.036800      |
| 200  | 0.930700      |
| 210  | 0.960800      |
| 220  | 1.098800      |
| 230  | 0.967400      |
| 240  | 0.961700      |
| 250  | 0.871100      |
| 260  | 0.869400      |
| 270  | 0.939500      |
| 280  | 1.087600      |
| 290  | 1.080700      |
| 300  | 0.906800      |
| 310  | 0.901600      |
| 320  | 0.943200      |
| 330  | 0.968900      |
| 340  | 0.986600      |
| 350  | 1.014200      |
| 360  | 1.191700      |
| 370  | 0.992500      |
| 380  | 0.963600      |
| 390  | 0.888800      |
| 400  | 0.746000      |

## Usage

To use the model, load it with the same 4-bit configuration it was trained with:

```python
import warnings

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

warnings.filterwarnings("ignore", category=FutureWarning)

# Model and quantization configuration (matches the card's settings)
model_name = "MahmoudIbrahim/FalconMasr"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False  # KV cache disabled, as configured for this model

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Falcon has no dedicated pad token

# Example prompt (Arabic): "How does American Express's integrated payments
# platform differ from bank card networks?"
input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"

# Tokenize the prompt and move it to the same device as the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Falcon's forward pass does not accept token_type_ids; drop them if present
inputs.pop("token_type_ids", None)

# Generate and decode the output
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
```
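Note that 4-bit loading through `bitsandbytes` generally requires a CUDA-capable GPU; the CPU fallback in the snippet applies only to the input tensors, and loading the quantized model on a CPU-only machine will typically fail. Generation is also run with the KV cache disabled, matching the model's configuration; this is slower than cached generation but faithful to the original setup.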
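## Fine-Tuning Sketch

The training script is not published with this card. As a point of reference, the settings listed under "Model Configuration" map onto `peft` and `bitsandbytes` objects roughly as follows. This is a minimal sketch reconstructed from the card's stated parameters, not the exact code used to train the model; in particular, `bias="none"` is an assumption, as the card does not state it.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization settings from the card
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the sharded Falcon-7B base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
base_model.config.use_cache = False  # KV cache is incompatible with training
base_model = prepare_model_for_kbit_training(base_model)

tokenizer = AutoTokenizer.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16", trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the card; bias="none" is assumed, not stated on the card
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
```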
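A QLoRA fine-tune of this kind is commonly driven with `trl`'s `SFTTrainer`; whether that is what was used here is an assumption. In the sketch below, the dataset path, sequence length, batch size, and learning rate are illustrative placeholders, while `max_steps=400` and `logging_steps=10` match the loss table above. Exact argument names vary between `trl` versions; this uses the older 0.x-style API.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical dataset file with a "text" column; the actual training data is
# a custom Arabic corpus that has not been published.
dataset = load_dataset("json", data_files="arabic_corpus.jsonl", split="train")

training_args = TrainingArguments(
    output_dir="./falconmasr-checkpoints",  # hypothetical output path
    per_device_train_batch_size=4,          # illustrative
    gradient_accumulation_steps=4,          # illustrative
    learning_rate=2e-4,                     # illustrative
    max_steps=400,                          # matches the 400 logged steps
    logging_steps=10,                       # matches the 10-step log interval
    fp16=True,
)

trainer = SFTTrainer(
    model=model,              # PEFT-wrapped model from the previous sketch
    train_dataset=dataset,
    peft_config=peft_config,  # LoraConfig from the previous sketch
    dataset_text_field="text",
    max_seq_length=512,       # illustrative
    tokenizer=tokenizer,      # tokenizer from the previous sketch
    args=training_args,
)
trainer.train()
```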