# Model Card for RawandLaouini/ArabicEchoV2
## Model Details

### Model Description
This is a fine-tuned version of the openai/whisper-medium model, adapted for Arabic Automatic Speech Recognition (ASR) using LoRA (Low-Rank Adaptation). The model was trained on the custom Whisper_Arabic_Merged_v6 dataset (1,183 audio samples) to improve transcription accuracy for Arabic speech.
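LoRA freezes the pretrained weights and learns only a low-rank update ΔW = BA to selected weight matrices, which is why the resulting adapters are so small. A minimal numerical sketch of the idea (the dimensions, rank, and scaling below are illustrative, not the actual adapter configuration of this model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 1024, 1024, 8  # layer dims and LoRA rank (illustrative values)

W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16):
    # base output plus scaled low-rank update: (W + (alpha / r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k)
# with B initialized to zero, the adapter starts as an exact no-op
assert np.allclose(lora_forward(x), W @ x)
```

Because only `A` and `B` are trained, the number of trainable parameters is a small fraction of the full weight matrix (here 2·1024·8 versus 1024·1024).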
- Developed by: Rawand Laouini
- Finetuned from model: openai/whisper-medium
- Model type: Transformer-based ASR model with LoRA
- Language(s): Arabic
- License: MIT
- Shared by: Rawand Laouini
## Uses

### Direct Use
This model can be used for transcribing Arabic speech to text, ideal for applications like voice assistants, subtitle generation, or educational tools tailored to Arabic speakers.
### Out-of-Scope Use
The model should not be used for real-time transcription without optimization, nor for languages other than Arabic without retraining.
## Bias, Risks, and Limitations
Trained on the Whisper_Arabic_Merged_v6 dataset, the model may reflect biases or gaps in that dataset's dialectal coverage and audio quality. Performance may vary across Arabic dialects and degrade in noisy conditions. Users should validate outputs before relying on them in critical applications.
### Recommendations
Test the model on your specific use case and consider expanding the dataset for better dialectal or noise robustness.
## How to Get Started with the Model
Use the following code to load and use the model:
```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
model = PeftModel.from_pretrained(model, "RawandLaouini/ArabicEchoV2")
model.eval()

# `audio` is a 1-D float array at 16 kHz (e.g. loaded with librosa or torchaudio)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
forced_ids = processor.get_decoder_prompt_ids(language="arabic", task="transcribe")
with torch.no_grad():
    predicted_ids = model.generate(input_features, forced_decoder_ids=forced_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
## Training Details

### Training Data
- Dataset: RawandLaouini/Whisper_Arabic_Merged_v6 (1,183 samples)
- Training split: 946 examples
- Validation split: 50 examples (manual evaluation)
### Training Procedure

#### Preprocessing
Audio was processed to match Whisper's requirements, with input features extracted using the Whisper processor.
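Concretely, Whisper expects 16 kHz mono audio padded or truncated to 30-second windows before log-Mel features are extracted; the WhisperProcessor handles this internally, but the fixed-length step can be sketched as:

```python
import numpy as np

SAMPLE_RATE = 16_000               # Whisper's expected sampling rate
CHUNK_SAMPLES = SAMPLE_RATE * 30   # 30-second windows

def pad_or_trim(audio: np.ndarray) -> np.ndarray:
    """Pad with silence or truncate so every clip is exactly 30 s."""
    if len(audio) >= CHUNK_SAMPLES:
        return audio[:CHUNK_SAMPLES]
    return np.pad(audio, (0, CHUNK_SAMPLES - len(audio)))

short_clip = np.ones(SAMPLE_RATE * 5)   # 5-second clip
long_clip = np.ones(SAMPLE_RATE * 45)   # 45-second clip
assert pad_or_trim(short_clip).shape == (CHUNK_SAMPLES,)
assert pad_or_trim(long_clip).shape == (CHUNK_SAMPLES,)
```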
#### Training Hyperparameters
- Batch size: 1 (per device)
- Gradient accumulation steps: 1
- Learning rate: 1e-4
- Warmup steps: 100
- Max steps: 300
- Optimizer: AdamW
- Mixed precision: FP16
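As a hedged sketch, these settings would map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows (the output directory is an illustrative placeholder, not the path used in the actual run):

```python
from transformers import Seq2SeqTrainingArguments

# hypothetical mapping of the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-arabic-lora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=100,
    max_steps=300,
    fp16=True,
)
```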
#### Speeds, Sizes, Times
- Training time: ~2.43 minutes for 300 steps
- Model size: LoRA adapter weights only, a small fraction of the ~769M-parameter base model
## Evaluation

### Testing Data
Manual evaluation on 50 examples from the validation split.
### Metrics
- Word Error Rate (WER): 0.2969
- Character Error Rate (CER): 0.0700
### Results
The model achieves a WER of 29.69% and a CER of 7.00% on the manual evaluation set. The low CER indicates strong character-level accuracy, though roughly three in ten words still contain an error, so outputs should be reviewed in accuracy-sensitive settings.
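For reference, WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words; CER is the same computation over characters. A minimal implementation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref: str, hyp: str) -> float:
    return edit_distance(list(ref), list(hyp)) / len(ref)

# one substituted word out of four reference words -> WER 0.25
print(wer("the cat sat down", "the cat sat town"))  # 0.25
```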
## Environmental Impact
- Hardware Type: NVIDIA GPU (14.74 GiB VRAM)
- Hours used: ~0.04 hours (2.43 minutes)
- Cloud Provider: Local/Colab (unspecified)
- Compute Region: Unspecified
- Carbon Emitted: Minimal (estimated < 0.01 kg CO2e using Lacoste et al., 2019)
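The estimate above follows the simple formula behind the ML CO2 calculator of Lacoste et al. (2019): emissions ≈ GPU power (kW) × hours × grid carbon intensity (kg CO2e/kWh). The power draw and grid intensity below are assumed placeholder values, not measured figures from this run:

```python
# assumed values: a ~250 W GPU and a rough global-average grid intensity
gpu_power_kw = 0.250      # kW (assumption)
hours = 0.04              # ~2.43 minutes of training
carbon_intensity = 0.475  # kg CO2e per kWh (assumed global average)

emissions = gpu_power_kw * hours * carbon_intensity
print(f"~{emissions:.4f} kg CO2e")  # comfortably under the 0.01 kg estimate
```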
## Citation

BibTeX:
```bibtex
@misc{laouini2025arabicechov2,
  author       = {Rawand Laouini},
  title        = {ArabicEchoV2: Fine-tuned Whisper-medium for Arabic ASR with LoRA},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RawandLaouini/ArabicEchoV2}}
}
```
APA:
Laouini, R. (2025). ArabicEchoV2: Fine-tuned Whisper-medium for Arabic ASR with LoRA. Retrieved from https://huggingface.co/RawandLaouini/ArabicEchoV2
## Model Card Authors
- Rawand Laouini
## Model Card Contact
- Email: awini.rawand21@gmail.com