Assalamu alaikum my brother <3, raw transcript repetition problem

#5
by TheGreatQuran2026 - opened

I tried to use this model, which is more accurate, masha'Allah,
to transcribe recitation audio into segmented word-by-word timings,
but when the reciter repeats a phrase, word, or ayah, this model drops the second repetition 50-70% of the time.
Is this a deduplication feature in the ONNX model?

This is the repo, but I am using the yazinsae model there... you can replace model.onnx with yours to run a test and compare the raw transcription.json <3

https://github.com/Iam-Muslim/QuranReciteToText

When I tested the streaming model, it caught all the repetitions, masha'Allah <3

Wa alaykum as-salam wa rahmatullah. Good observation, and it is not a dedup feature. It is inherent to greedy CTC decoding: when the same tokens repeat back-to-back with no blank frame between them, CTC collapses them into one. A clear pause inserts a blank and the repeat survives; an immediate repeat often does not, so it gets dropped. The streaming model decodes in chunks, which separates the repeats, which is exactly why it catches them (masha'Allah). So for your word-by-word timing use case the streaming model is the right one to use. The next version I am working on should improve this further. Barakallah feek.
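
To make the collapse rule concrete, here is a minimal Python sketch of the greedy CTC post-processing step. It is an illustration only, not the decoder shipped with either model: the function name `ctc_greedy_collapse`, the `_` blank symbol, and the toy frame labels are all assumptions made up for this example, and real models emit character or subword token IDs per frame.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_greedy_collapse(frames, blank=BLANK):
    """Apply the standard CTC collapse: merge runs of identical labels, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != blank]


# The same token emitted twice with a blank frame in between (a clear pause):
with_pause = ["A", "A", "_", "A", "A"]

# The same token emitted twice back-to-back, with no blank frame separating them:
no_pause = ["A", "A", "A", "A"]

print(ctc_greedy_collapse(with_pause))  # ['A', 'A']  -> the repeat survives
print(ctc_greedy_collapse(no_pause))    # ['A']       -> the repeat is merged away
```

Without a blank frame there is nothing in the per-frame output that tells the decoder where the first occurrence ends and the second begins, so the two runs become one.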

Thanks, my brother <3 <3
Your offline model's letter accuracy within a word is good, masha'Allah <3
