
Quran syllables recognition with tashkeel.

This is a fine-tuned wav2vec2 model that recognizes Quran syllables from speech.
The model was trained on the Tarteel dataset after cleaning it and converting the transcriptions into syllables.
A 5-gram language model is provided with the model.

The model transcribes speech audio into syllables.
For instance, given audio of "ู…ูู†ูŽ ุงู„ู’ุฌูู†ูŽู‘ุฉู ูˆูŽุงู„ู†ูŽู‘ุงุณู", the expected model output is "ู…ู ู†ูŽู„ู’ ุฌูู†ู’ ู†ูŽ ุชู ูˆูŽู†ู’ ู†ูŽุงู’ุณู’".
To try it out:

```
!pip install datasets transformers
!pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
```
```python
import torch
import torchaudio
import pandas as pd
from datasets import Dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

# Load the processor (which bundles the 5-gram LM) and the fine-tuned model
processor = Wav2Vec2ProcessorWithLM.from_pretrained("IbrahimSalah/Wav2vecXXl_quran_syllables")
model = Wav2Vec2ForCTC.from_pretrained("IbrahimSalah/Wav2vecXXl_quran_syllables")

# Build a one-row dataset pointing at the audio file
path = "/content/908-33.wav"  # audio path
dftest = pd.DataFrame({"audio": [path]})
dataset = Dataset.from_pandas(dftest)

def speech_file_to_array_fn(batch):
    # Load the waveform and resample it to the 16 kHz rate wav2vec2 expects.
    # The original training data was at 48 kHz; the resampler adapts to
    # whatever rate torchaudio reports for your input.
    speech_array, sampling_rate = torchaudio.load(batch["audio"])
    resampler = torchaudio.transforms.Resample(sampling_rate, 16_000)
    batch["audio"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = dataset.map(speech_file_to_array_fn)

inputs = processor(test_dataset["audio"], sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# batch_decode runs CTC beam-search decoding with the 5-gram LM (via pyctcdecode)
transcription = processor.batch_decode(logits.numpy()).text
print("Prediction:", transcription[0])
```

Sample outputs

Transcriptions for two sample audio clips (embedded on the model page):

1- ุกููˆู’ ู„ูŽุงู’ ุกู ูƒูŽ ู„ูŽู…ู’ ูŠูŽ ูƒููˆู’ ู†ููˆู’ ู…ูุนู’ ุฌู ุฒููŠู’ ู†ูŽ ููู„ู’ ุกูŽุฑู’ ุถู ูˆูŽ ู…ูŽุงู’ ูƒูŽุงู’ ู†ูŽ ู„ูŽ ฺพูู…ู’ ู…ูู†ู’ ุกูŽูˆู’ ู„ู ูŠูŽุงู’
2- ุกูุฐู’ ู‚ูŽุงู’ ู„ูŽ ูŠููˆู’ ุณู ูู ู„ู ุกูŽูŠู’ ุจููŠู’ ฺพู ูŠูŽุงู’ ุกูŽ ุจูŽ ุชู ุกูู†ู’ ู†ููŠู’ ุฑูŽ ุกูŽูŠู’ ุชู ุกูŽ ุญูŽ ุฏูŽ ุนูŽ ุดูŽ ุฑูŽ ูƒูŽูˆู’ ูƒูŽ ุจูŽู„ู’ ูˆูŽุดู’ ุดูŽู…ู’ ุณูŽ ูˆูŽู„ู’ ู‚ูŽ ู…ูŽ ุถูŽ ุฑูŽ ุกูŽูŠู’ ุชู ฺพูู…ู’ ู„ููŠู’ ุณูŽุงู’ ุฌู ุฏููŠู’ู†ู’