How to use Muno459/fastconformer-quran-streaming with NeMo:
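```python
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub and restore the model
asr_model = nemo_asr.models.ASRModel.from_pretrained("Muno459/fastconformer-quran-streaming")

# Transcribe whole audio files in one batch call
transcriptions = asr_model.transcribe(["file.wav"])
```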
As-salamu alaykum, my brother <3. You can export a mixed q4/q8 model.
Masha'Allah!
You can use this repo as a reference: https://github.com/yazinsai/cyberistic-offline-tarteel
It exports the fp32 model into an 80 MB mixed model with the same features and accuracy as the q8 model! Alhamdulillah. This could be faster on Android and take up less space.
Sorry for so many requests <3
Alhamdulillah, Allah is the Greatest <3
Allah subhanahu, my God, guided me to convert the ONNX model here into an 80 MB model, baking in the CMVN tlog with this script on Google Colab, and the raw output was 100% identical to the 128 MB q8 model. Masha'Allah <3 ALHAMDULILLAH, ALLAH IS THE GREATEST
But it is not good on noisy audio, so I removed the script....
But the 128 MB q8 model with the baked-in CMVN worked better, Alhamdulillah.
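For readers unfamiliar with the term, CMVN (cepstral mean and variance normalization) shifts each feature dimension to zero mean and scales it to unit variance. "Baking" it in, as described above, presumably means storing fixed statistics inside the exported model so the app does not have to compute them at runtime. Below is a minimal pure-Python sketch. It assumes features arrive as lists of floats, one list per frame. `compute_cmvn_stats` and `apply_cmvn` are hypothetical helpers and do not come from the linked repo or from NeMo.

```python
import math

def compute_cmvn_stats(frames):
    """Per-dimension mean and standard deviation over a list of feature frames."""
    num_frames = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dims)
    ]
    return means, stds

def apply_cmvn(frames, means, stds, eps=1e-5):
    """Normalize every frame with fixed ("baked") statistics."""
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Statistics are computed once, offline, then reused for every utterance
train_frames = [[1.0, 10.0], [3.0, 14.0], [5.0, 12.0]]
means, stds = compute_cmvn_stats(train_frames)
normalized = apply_cmvn([[3.0, 12.0]], means, stds)
print(means)       # [3.0, 12.0]
print(normalized)  # [[0.0, 0.0]]
```

Computing the statistics offline is what makes this normalization compatible with streaming. A per-utterance mean would need the whole recording before the first chunk could be normalized.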
Wa alaykum as-salam, jazak Allah khayr for the pointer. A mixed-precision export (keeping the sensitive layers at higher precision and quantizing the rest) is a solid way to get q8-level accuracy at a smaller size, and your own test confirms it: on clean audio it matched q8, while on noisy audio it degraded. For now q8 stays the safe default because of that noisy-audio drop, but I will look at a mixed q4/q8 export for the next release. Barakallah feek for testing so carefully.
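The toy sketch below illustrates the mixed-precision idea. It quantizes each layer's weights to int4, measures how much they change, and keeps any layer that degrades too much at int8. It is pure Python and uses symmetric per-tensor quantization. The layer names, the weights, the relative-error sensitivity metric, and the `tolerance` threshold are all hypothetical. Real exporters, including the script in the linked repo, may select layers differently, for example by measuring accuracy on held-out audio.

```python
def fake_quantize(weights, bits):
    """Symmetric per-tensor quantization to signed `bits`-bit ints, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    return [round(w / scale) * scale for w in weights]

def relative_error(original, approx):
    """Squared quantization error relative to the tensor's energy."""
    err = sum((a - b) ** 2 for a, b in zip(original, approx))
    energy = sum(a ** 2 for a in original) or 1.0
    return err / energy

def plan_mixed_precision(layers, tolerance):
    """Use int4 where its error is tolerable; keep sensitive layers at int8."""
    return {
        name: 4 if relative_error(w, fake_quantize(w, 4)) <= tolerance else 8
        for name, w in layers.items()
    }

def model_size_bytes(layers, plan):
    """Weight storage only (ignores scales and other metadata)."""
    return sum(len(w) * plan[name] / 8 for name, w in layers.items())

# Hypothetical layers: "ffn" sits exactly on the int4 grid, "attn_out" does not
layers = {
    "ffn": [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.7],
    "attn_out": [0.05, 0.12, -0.33, 0.61, -0.9, 1.0, -0.47, 0.26],
}
plan = plan_mixed_precision(layers, tolerance=1e-3)
print(plan)                            # {'ffn': 4, 'attn_out': 8}
print(model_size_bytes(layers, plan))  # 12.0 (vs 16.0 for all-int8)
```

Per-tensor scaling is the simplest choice, but a single large weight inflates the scale for the whole tensor. That is one reason some layers tolerate int4 worse than others. Production int4 schemes usually quantize per channel or in small groups to limit the effect.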
Thanks for the kind words, my brother <3
A small question: can you export the streaming model to accept a smaller chunk size? The 112 chunk size is not production-ready, because someone can recite a word and then stop, and in that case the ASR doesn't output it at all. Any word shorter than the 112 window gets no output unless I keep reciting continuously.
Short words like the muqaṭṭaʿāt are not output.
Words in a surah followed by silence are not output.
What is the smallest possible chunk, or is it limited to 112 for some reason?
Wa ʿalaykum as-salām, akhi, and thank you for the kind words. Sharp question. 🤝
On the chunk size: the [70, 13] preset (about a 1.12s chunk, ~1040 ms lookahead) is set deliberately. I tested smaller windows, and below this the accuracy drops too much for Qur'an. The model needs that right context to get both the word and the tajwīd right, and shrinking the lookahead trades away far more quality than it gains. So [70, 13] stays as the recommended, accuracy-first preset rather than weakening recognition for everyone.
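A bit of arithmetic confirms the numbers in the [70, 13] preset. This sketch assumes the usual FastConformer front end: a 10 ms feature hop and 8x subsampling, so one encoder frame covers 80 ms. It also assumes that a chunk is the current frame plus the right context. `context_to_ms` is a hypothetical helper, not a NeMo API.

```python
FEATURE_HOP_MS = 10  # assumed mel-spectrogram hop
SUBSAMPLING = 8      # assumed FastConformer subsampling factor

def context_to_ms(att_context_size, hop_ms=FEATURE_HOP_MS, subsampling=SUBSAMPLING):
    """Convert [left, right] attention context (in encoder frames) to milliseconds."""
    left, right = att_context_size
    frame_ms = hop_ms * subsampling  # 80 ms per encoder frame
    return {
        "left_context_ms": left * frame_ms,
        "lookahead_ms": right * frame_ms,
        "chunk_ms": (right + 1) * frame_ms,  # current frame + right context
    }

print(context_to_ms([70, 13]))
# {'left_context_ms': 5600, 'lookahead_ms': 1040, 'chunk_ms': 1120}
for preset in ([70, 6], [70, 1]):  # hypothetical smaller presets
    print(preset, context_to_ms(preset))
```

This is the trade-off described in the reply. Shrinking the right context cuts latency, but it also removes the future audio the model relies on to get the word and the tajwīd right.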
You have pointed at a real limitation of this streaming model though (short isolated utterances and the muqaṭṭaʿāt are not its strength), and I would rather solve it properly than cripple the accuracy.
You already know I have had a new model in training. This is the first time I am describing it properly, so here is what it actually is, and why it will handle this far better. It is a brand new Zipformer hybrid (RNN-T plus CTC), built from scratch:
- A genuine Arabic foundation model, pretrained on nearly 2,000 hours of broad Arabic (conversational, broadcast, dialectal, and read speech). For context, the current FastConformer was built on a roughly 1,150-hour Arabic base, so this is a much larger, from-scratch foundation, and a general-purpose Arabic ASR base model in its own right.
- Then specialized for Qur'an on a large curated recitation set, with a Qur'an-balanced tokenizer built for the orthography and vocabulary of the muṣḥaf.
- Streaming-native and multi-latency by design, trained across several lookahead profiles from the ground up, so low-latency and short-utterance cases (exactly what you are describing) are handled far more gracefully than any single fixed-chunk model can manage.
- Built to run on phones. It is engineered for efficient on-device, real-time inference and exports to Core ML (iPhone Neural Engine) and ONNX, so it runs locally with low latency and no server round trip.
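One common way to train a single model for several lookahead profiles is dynamic chunk training, often used in streaming Zipformer recipes (for example in icefall). Each training batch samples a chunk size, and the encoder receives an attention mask that restricts it to the current chunk plus a limited left context. The new model's exact training setup is not described here, so the following is only a pure-Python sketch under that assumption. `chunk_attention_mask`, `sample_chunk_size`, the chunk sizes, and the left-context value are all hypothetical.

```python
import random

def chunk_attention_mask(num_frames, chunk_size, left_chunks):
    """mask[i][j] is True if frame i may attend to frame j.

    Each frame sees its whole chunk (later frames inside the chunk are its
    lookahead) plus `left_chunks` previous chunks, and nothing from future chunks.
    """
    mask = []
    for i in range(num_frames):
        chunk_i = i // chunk_size
        row = [chunk_i - left_chunks <= j // chunk_size <= chunk_i for j in range(num_frames)]
        mask.append(row)
    return mask

def sample_chunk_size(rng, choices=(2, 4, 8)):
    """Pick a different latency profile for each training batch."""
    return rng.choice(choices)

rng = random.Random(0)
for _ in range(3):
    size = sample_chunk_size(rng)
    mask = chunk_attention_mask(num_frames=8, chunk_size=size, left_chunks=1)
    # ... run the encoder with `mask` on this batch (model code omitted)

# Fixed example: 6 frames, chunks of 2, one chunk of left context
for row in chunk_attention_mask(num_frames=6, chunk_size=2, left_chunks=1):
    print("".join("x" if allowed else "." for allowed in row))
# xx....
# xx....
# xxxx..
# xxxx..
# ..xxxx
# ..xxxx
```

Because the model sees many chunk sizes during training, the same weights can later run with a small chunk for low latency or a larger one for accuracy.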
Zipformer is a newer, more efficient encoder than FastConformer, and building the Arabic base from scratch means the Qur'an fine-tune stands on a much stronger, purpose-built foundation. It is training now. Stay tuned, and jazāk Allāhu khayran for pushing on this. 🌙
ربنا تقبل منا انك انت السميع العليم
Our Lord, accept [this] from us. Indeed, You are the All-Hearing, the All-Knowing.
That's reeaaaally awesome, my brother <3. Allah subhanahu has given you a lot of knowledge, masha'Allah <3
And you are using it for the sake of Allah. Alhamdulillah.