ArTST-v2 Dialects – Arabic Dialect ASR
ArTST-v2 Dialects is an automatic speech recognition (ASR) model for transcribing Arabic speech across multiple Arabic dialects.
The model is based on ArTST, an Arabic Text and Speech Transformer model developed by the Speech Lab at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).
This checkpoint was fine-tuned for ASR on speech from multiple Arabic dialects.
For more details about the original ArTST model, please refer to the official repository:
https://github.com/mbzuai-nlp/ArTST
Model Description
This model is designed to transcribe spoken Arabic into text, with a focus on dialectal Arabic speech. It primarily supports the Arabic language and has been fine-tuned to improve recognition performance across multiple Arabic dialects.
Model Usage
```python
import torch
import soundfile as sf
from transformers import (
    SpeechT5ForSpeechToText,
    SpeechT5Processor,
    SpeechT5Tokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Mohammed01/ArTST-v2-Dialects"

# Load the tokenizer, processor, and fine-tuned ASR model
tokenizer = SpeechT5Tokenizer.from_pretrained(model_id)
processor = SpeechT5Processor.from_pretrained(model_id, tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained(model_id).to(device)

# Read the waveform; the model expects 16 kHz mono audio
audio, sampling_rate = sf.read("audio.wav")

inputs = processor(
    audio=audio,
    sampling_rate=sampling_rate,
    return_tensors="pt",
)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Generate token IDs, then decode them into text
predicted_ids = model.generate(
    **inputs,
    max_length=250,
)
transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True,
)
print(transcription[0])
```
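SpeechT5-based checkpoints generally expect 16 kHz mono input, while files read with `soundfile` may use other rates (e.g. 44.1 kHz). A minimal sketch of resampling before calling the processor, using `scipy.signal.resample_poly`; the helper name `resample_to_16k` is illustrative, not part of the model's API:

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000

def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform to 16 kHz using polyphase filtering."""
    if orig_sr == TARGET_SR:
        return audio
    g = np.gcd(orig_sr, TARGET_SR)
    return resample_poly(audio, TARGET_SR // g, orig_sr // g)

# Example: one second of 44.1 kHz audio becomes 16,000 samples.
waveform = np.zeros(44_100, dtype=np.float32)
resampled = resample_to_16k(waveform, 44_100)
print(resampled.shape)  # (16000,)
```

Stereo files can be collapsed to mono first (e.g. `audio.mean(axis=1)`) before resampling.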