VisualEars FastConformer Persian ASR Full A+B

Persian/Farsi ASR checkpoint for the small/fast VisualEars model, fine-tuned from nvidia/stt_fa_fastconformer_hybrid_large on the full A+B training mix.

Main Checkpoint

  • fa_fastconformer_ab_final.nemo: final NeMo FastConformer hybrid RNNT/CTC checkpoint from the full A+B run.

Runtime Exports

Canonical runtime exports live in separate derivative model repos so Hugging Face can attach them to this fine-tune as quantized/export variants:

| Repo | Format | Validation |
|---|---|---|
| visualears-fastconformer-fa-full-ab-onnx-fp | ONNX FP fixed CTC core | 100.00% CTC argmax parity |
| visualears-fastconformer-fa-full-ab-onnx-w4 | ONNX Runtime weight-only 4-bit, asymmetric block-32 | 98.61% CTC argmax parity |
| visualears-fastconformer-fa-full-ab-coreml-fp16 | CoreML FP16 fixed CTC core | 99.85% CTC argmax parity |
| visualears-fastconformer-fa-full-ab-coreml-w4 | CoreML 4-bit k-means palettized, compressed variant | 98.06% CTC argmax parity |
| visualears-fastconformer-fa-full-ab-coreml-w4-quality | CoreML 4-bit k-means palettized, quality-first variant | 99.65% CTC argmax parity |
| visualears-fastconformer-fa-full-ab-litert-fp | LiteRT/TFLite FP fixed CTC core | 100.00% CTC argmax parity; 100.00% transcript parity on 16 calibration items |
| visualears-fastconformer-fa-full-ab-litert-w4 | LiteRT/TFLite selected fully-connected weight-only 4-bit | 98.23% frame CTC argmax parity; failed transcript parity at 37.5% on 16 calibration items |
| visualears-fastconformer-fa-full-ab-fp16 | NeMo FP16 reduced-precision checkpoint | 98.0% exact transcript match vs FP base on 200 FLEURS-fa eval clips |
| visualears-fastconformer-fa-full-ab-fp8 | NeMo FP8 via NVIDIA ModelOpt | 18.48% WER / 6.69% CER on 200 FLEURS-fa eval clips; 99.47% WER retention vs FP base |
| visualears-fastconformer-fa-full-ab-nvfp4 | NeMo NVFP4 W4A4 via NVIDIA ModelOpt | 20.33% WER / 7.38% CER on 200 FLEURS-fa eval clips |
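
The validation column reports "CTC argmax parity" without defining it. One plausible reading, assumed here, is the share of output frames whose top-scoring token is the same in the reference FP output and in the exported model's output. The sketch below computes that quantity on toy logits; the function names and the data are illustrative, not taken from the export scripts.

```python
from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in one frame (first index wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def ctc_argmax_parity(reference: Sequence[Sequence[float]],
                      candidate: Sequence[Sequence[float]]) -> float:
    """Percentage of frames whose argmax token matches between two outputs.

    Assumed definition of "CTC argmax parity"; the source does not spell it out.
    """
    if len(reference) != len(candidate):
        raise ValueError("both outputs must have the same number of frames")
    if not reference:
        raise ValueError("need at least one frame")
    matches = sum(argmax(r) == argmax(c) for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)


# Toy data: 4 frames over a 3-token vocabulary (not real model output).
fp_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 3.0], [0.2, 0.1, 2.5]]
w4_logits = [[0.2, 1.8, 0.4], [1.1, 1.3, 0.1], [0.1, 0.0, 2.7], [0.3, 0.2, 2.2]]

parity = ctc_argmax_parity(fp_logits, w4_logits)
print(f"{parity:.2f}% CTC argmax parity")
```

Frame-level parity can stay high while transcripts diverge, because one flipped frame may change a whole word. That is consistent with the litert-w4 row, where 98.23% frame parity coexists with 37.5% transcript parity.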

The export repos are fixed-frame acoustic CTC-core artifacts. They take precomputed log-mel features as processed_signal; they are not full raw-audio-to-text pipelines by themselves.
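
Because the exports stop at per-frame CTC scores, the caller has to supply both feature extraction and decoding. As a minimal sketch of the decoding half only, greedy CTC decoding takes the argmax of each frame, collapses consecutive repeats, and drops the blank token. The character vocabulary and the blank index (placed last, as is common in NeMo CTC models) are assumptions for illustration; the real model's tokenizer and blank position should be read from the checkpoint.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      blank_id: int) -> str:
    """Greedy CTC decode: per-frame argmax, collapse repeats, remove blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    collapsed = [token for token, _ in groupby(best)]
    return "".join(vocab[t] for t in collapsed if t != blank_id)


def one_hot(index: int, size: int) -> List[float]:
    """Toy stand-in for a frame of logits that peaks at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical 4-character vocabulary: س ل ا م
vocab = ["\u0633", "\u0644", "\u0627", "\u0645"]
blank_id = len(vocab)  # assumption: blank is the last index

frame_ids = [0, 0, blank_id, 1, 2, 2, blank_id, 3]
logits = [one_hot(i, len(vocab) + 1) for i in frame_ids]

text = ctc_greedy_decode(logits, vocab, blank_id)
print(text)
```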

Training Snapshot

  • Train manifest: 6,231,918 rows
  • Validation manifest: 31,424 rows
  • Final train step: 48,687
  • NeMo architecture: FastConformer hybrid RNNT/CTC

Benchmarks

External benchmark snapshot from June 10, 2026:

| Decoder | Golha gold-69 WER (%) | FLEURS fa WER (%) | FLEURS fa CER (%) |
|---|---|---|---|
| RNNT greedy | 25.29 | 15.73 | 5.25 |
| CTC + 4-gram LM, alpha=0.2 beta=-1.0 beam=50 | 25.96 | 13.60 | 5.39 |

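The LM setting was calibrated on a FLEURS-256 slice. Relative to RNNT greedy, the CTC + LM decoder lowered FLEURS WER (15.73 to 13.60) but slightly raised FLEURS CER (5.25 to 5.39), and it did not improve Golha in this snapshot (25.29 to 25.96 WER).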
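
For readers unfamiliar with the alpha and beta knobs, the sketch below shows the shallow-fusion scoring rule commonly used in CTC beam search with an n-gram LM: acoustic log-probability plus alpha times the LM log-probability plus beta times the word count. Whether the decoder used for this benchmark applies exactly this formula is an assumption, and the hypotheses and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # summed CTC log-probability (toy value)
    lm_logprob: float        # n-gram LM log-probability (toy value)


def fused_score(h: Hypothesis, alpha: float = 0.2, beta: float = -1.0) -> float:
    """Assumed shallow-fusion score: acoustic + alpha * LM + beta * word count."""
    word_count = len(h.text.split())
    return h.acoustic_logprob + alpha * h.lm_logprob + beta * word_count


# Two made-up beam candidates with placeholder words.
candidates = [
    Hypothesis("w1 w2 w3", acoustic_logprob=-4.0, lm_logprob=-10.0),
    Hypothesis("w1 w2 w3 w4", acoustic_logprob=-3.8, lm_logprob=-14.0),
]

best_acoustic = max(candidates, key=lambda h: h.acoustic_logprob)
best_fused = max(candidates, key=fused_score)

print("acoustic only:", best_acoustic.text)
print("with LM      :", best_fused.text)
```

With beta=-1.0 each extra word costs one unit of score, so the setting favors shorter hypotheses. Under this toy scoring the four-word candidate wins on acoustics alone, and the three-word candidate wins after fusion.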

Notes

This is a research checkpoint. Normalization and tokenization choices matter for reported WER/CER.
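
To illustrate why normalization matters for Persian, the sketch below scores one hypothesis twice: once as raw text and once after mapping Arabic yeh and kaf to their Persian forms and dropping the zero-width non-joiner. The normalization table is a hypothetical example, not the one used for the numbers on this card.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent, splitting on whitespace."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, counting spaces as characters."""
    return 100.0 * edit_distance(list(ref), list(hyp)) / len(ref)


# Hypothetical normalization: Arabic yeh/kaf -> Persian forms, drop ZWNJ.
NORMALIZE = {0x064A: 0x06CC, 0x0643: 0x06A9, 0x200C: None}


def normalize(text: str) -> str:
    return text.translate(NORMALIZE)


# Reference "کتاب خوبی است" with Persian keheh and yeh.
reference = "\u06a9\u062a\u0627\u0628 \u062e\u0648\u0628\u06cc \u0627\u0633\u062a"
# Same words written with Arabic kaf (U+0643) and Arabic yeh (U+064A).
hypothesis = "\u0643\u062a\u0627\u0628 \u062e\u0648\u0628\u064a \u0627\u0633\u062a"

raw_wer = wer(reference, hypothesis)
norm_wer = wer(normalize(reference), normalize(hypothesis))
raw_cer = cer(reference, hypothesis)
norm_cer = cer(normalize(reference), normalize(hypothesis))

print(f"raw:        WER {raw_wer:.2f}%  CER {raw_cer:.2f}%")
print(f"normalized: WER {norm_wer:.2f}%  CER {norm_cer:.2f}%")
```

The two strings look identical to a reader, yet two of three words count as errors until the code points are unified.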
