# Model: FalconMasr

This model is based on Falcon-7B, quantized to 4-bit for efficient memory usage and fine-tuned with LoRA (Low-Rank Adaptation) for Arabic causal language modeling, with the goal of improving the quality of the model's Arabic responses.

## Model Configuration

- **Base Model**: `ybelkada/falcon-7b-sharded-bf16`
- **Quantization**: 4-bit, `nf4` quantization type, `float16` compute dtype
- **LoRA Configuration**: `lora_alpha=16`, `lora_dropout=0`, `r=64`
- **Task Type**: Causal Language Modeling
- **Target Modules**: `query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`

A sketch of how this configuration maps onto `peft` and `bitsandbytes` objects appears at the end of this card.

## Training

The model was fine-tuned on a custom Arabic text dataset. The training loss trended downward from about 1.46 at step 10 to 0.75 at step 400, with the usual step-to-step noise, as shown in the table below:

| Step | Training Loss |
|------|---------------|
| 10   | 1.459100      |
| 20   | 1.335000      |
| 30   | 1.295600      |
| 40   | 1.177000      |
| 50   | 1.144900      |
| 60   | 1.132900      |
| 70   | 1.074500      |
| 80   | 1.078600      |
| 90   | 1.121100      |
| 100  | 0.936000      |
| 110  | 1.151500      |
| 120  | 1.068000      |
| 130  | 1.056700      |
| 140  | 0.976900      |
| 150  | 0.867300      |
| 160  | 1.151100      |
| 170  | 1.023200      |
| 180  | 1.074300      |
| 190  | 1.036800      |
| 200  | 0.930700      |
| 210  | 0.960800      |
| 220  | 1.098800      |
| 230  | 0.967400      |
| 240  | 0.961700      |
| 250  | 0.871100      |
| 260  | 0.869400      |
| 270  | 0.939500      |
| 280  | 1.087600      |
| 290  | 1.080700      |
| 300  | 0.906800      |
| 310  | 0.901600      |
| 320  | 0.943200      |
| 330  | 0.968900      |
| 340  | 0.986600      |
| 350  | 1.014200      |
| 360  | 1.191700      |
| 370  | 0.992500      |
| 380  | 0.963600      |
| 390  | 0.888800      |
| 400  | 0.746000      |

## Usage

To use the model, load it with the same 4-bit configuration it was trained with:

```python
import warnings

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

warnings.filterwarnings("ignore", category=FutureWarning)

# Model and quantization configuration (matches the card's settings)
model_name = "MahmoudIbrahim/FalconMasr"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False  # KV cache disabled, as configured for this model

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Falcon has no dedicated pad token

# Example prompt (Arabic): "How does American Express's integrated payments
# platform differ from bank card networks?"
input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"

# Tokenize the prompt and move it to the same device as the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Falcon's forward pass does not accept token_type_ids; drop them if present
inputs.pop("token_type_ids", None)

# Generate and decode the output
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
```
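Note that 4-bit loading through `bitsandbytes` generally requires a CUDA-capable GPU; the CPU fallback in the snippet applies only to the input tensors, and loading the quantized model on a CPU-only machine will typically fail. Generation is also run with the KV cache disabled, matching the model's configuration; this is slower than cached generation but faithful to the original setup.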
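## Fine-Tuning Sketch

The training script is not published with this card. As a point of reference, the settings listed under "Model Configuration" map onto `peft` and `bitsandbytes` objects roughly as follows. This is a minimal sketch reconstructed from the card's stated parameters, not the exact code used to train the model; in particular, `bias="none"` is an assumption, as the card does not state it.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization settings from the card
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the sharded Falcon-7B base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
base_model.config.use_cache = False  # KV cache is incompatible with training
base_model = prepare_model_for_kbit_training(base_model)

tokenizer = AutoTokenizer.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16", trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the card; bias="none" is assumed, not stated on the card
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
```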
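A QLoRA fine-tune of this kind is commonly driven with `trl`'s `SFTTrainer`; whether that is what was used here is an assumption. In the sketch below, the dataset path, sequence length, batch size, and learning rate are illustrative placeholders, while `max_steps=400` and `logging_steps=10` match the loss table above. Exact argument names vary between `trl` versions; this uses the older 0.x-style API.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical dataset file with a "text" column; the actual training data is
# a custom Arabic corpus that has not been published.
dataset = load_dataset("json", data_files="arabic_corpus.jsonl", split="train")

training_args = TrainingArguments(
    output_dir="./falconmasr-checkpoints",  # hypothetical output path
    per_device_train_batch_size=4,          # illustrative
    gradient_accumulation_steps=4,          # illustrative
    learning_rate=2e-4,                     # illustrative
    max_steps=400,                          # matches the 400 logged steps
    logging_steps=10,                       # matches the 10-step log interval
    fp16=True,
)

trainer = SFTTrainer(
    model=model,              # PEFT-wrapped model from the previous sketch
    train_dataset=dataset,
    peft_config=peft_config,  # LoraConfig from the previous sketch
    dataset_text_field="text",
    max_seq_length=512,       # illustrative
    tokenizer=tokenizer,      # tokenizer from the previous sketch
    args=training_args,
)
trainer.train()
```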