Assalamu alaikum my brother <3, raw transcript repetition problem

by TheGreatQuran2026 - opened 1 day ago

TheGreatQuran2026

I tried this model, which is more accurate, masha'a Allah,
to transcribe recitation audio into segmented word-by-word timing.
But when the reciter repeats a phrase, word, or ayah, this model drops the second repetition 50-70% of the time.
Is this a deduplication feature in the ONNX model?

This is the repo, but there I am using the yazinsae model. You can replace its model.onnx with yours to run a test and compare the raw transcription.json <3

https://github.com/Iam-Muslim/QuranReciteToText

TheGreatQuran2026

about 15 hours ago

•

edited about 15 hours ago

When I tested the streaming model, it caught all the repetitions, masha'a Allah <3

Muno459

Owner about 11 hours ago

Wa alaykum as-salam wa rahmatullah. Good observation, and it is not a dedup feature. It is inherent to greedy CTC decoding: when the same tokens repeat back-to-back with no blank frame between them, CTC collapses them into one. A clear pause inserts a blank and the repeat survives; an immediate repeat often does not, so it gets dropped. The streaming model decodes in chunks, which separates the repeats, which is exactly why it catches them (masha'Allah). So for your word-by-word timing use case the streaming model is the right one to use. The next version I am working on should improve this further. Barakallah feek.
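
To make the collapse rule concrete, here is a minimal toy sketch of greedy CTC post-processing. It is not this model's actual decoder: the blank symbol `_`, the token `A`, and both function names are made up for illustration, and `chunked_collapse` is only a rough stand-in for chunked decoding (a real streaming model handles chunk boundaries in its own way).

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration only


def ctc_greedy_collapse(frames, blank=BLANK):
    """Greedy CTC post-processing: merge consecutive duplicate labels, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != blank]


def chunked_collapse(frames, chunk_size, blank=BLANK):
    """Toy stand-in for chunked decoding: collapse each chunk on its own, then concatenate."""
    out = []
    for start in range(0, len(frames), chunk_size):
        out.extend(ctc_greedy_collapse(frames[start:start + chunk_size], blank))
    return out


# Per-frame argmax labels for a token "A" that is said twice.
with_pause = ["A", "A", "_", "A", "A"]  # a blank frame separates the repeat
no_pause = ["A", "A", "A", "A"]         # immediate repeat, no blank frame

print(ctc_greedy_collapse(with_pause))  # ['A', 'A']  repeat survives
print(ctc_greedy_collapse(no_pause))    # ['A']       repeat is dropped
print(chunked_collapse(no_pause, 2))    # ['A', 'A']  chunk boundary separates the repeat
```

In this toy the repeat only survives chunking because the boundary happens to fall between the two occurrences. A boundary in the middle of one long token would duplicate it instead, so treat this as an illustration of the idea, not a recipe.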

TheGreatQuran2026

about 8 hours ago

Thanks my brother <3 <3
Your offline model's letter-level accuracy within a word is good, masha'a Allah <3
