Dataset: addyo07/noisy-hinglish-asr
How to use addyo07/nemotron-3.5-0.6b-hinglish with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("addyo07/nemotron-3.5-0.6b-hinglish")
transcriptions = asr_model.transcribe(["file.wav"])

Fine-tuned version of nvidia/nemotron-3.5-asr-streaming-multilingual-0.6b on the Noisy Hinglish ASR dataset for improved Hindi and code-switched Hinglish speech recognition.

| Property | Value |
|---|---|
| Base Model | nvidia/nemotron-3.5-asr-streaming-multilingual-0.6b |
| Architecture | FastConformer RNNT with Prompt-Streaming |
| Parameters | 638M |
| Encoder | 24-layer Conformer, d_model=1024, 8-head attention |
| Decoder | 2-layer LSTM, pred_hidden=640 |
| Joint | RNNT Joint, joint_hidden=640 |
| Vocabulary | 13,087 BPE tokens + blank (13,088 total) |
| Language Prompts | 128 (covering 100+ languages) |
| Context Size | Left: 56, Right: 3 (320ms balanced streaming) |
| Subsampling | Factor 8 (dw_striding) |
| Preprocessor | 128 Mel filters, 16kHz, 25ms window, 10ms stride |
| Training | Full fine-tune (all params trainable), encoder frozen |
| Training Steps | 12,000 steps (best checkpoint at step ~8,124) |
| Hardware | RTX 5070 Ti (16 GB) |
| License | Apache 2.0 |

| Split | Samples | WER% | CER% | FTR% | Δ WER vs Base |
|---|---|---|---|---|---|
| Clean Hindi | 500 | 24.75 | 10.54 | 0.80 | +2.85 (regression) |
| Conversational Hinglish | 1,036 | 24.57 | 17.69 | 0.97 | -17.60 |
| Noisy Hindi | 250 | 31.67 | 32.33 | 11.60 | -7.46 |
| Negatives (noise) | 200 | 0.00 | 0.00 | 99.50 | +99.50 |
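
For readers unfamiliar with the WER and CER columns, here is a minimal sketch of how these two metrics are conventionally computed (edit distance divided by reference length, as a percentage). It is not taken from `scripts/evaluate_finetuned.py`; any text normalization that script applies, and how it scores the noise-only clips whose reference is empty, are not stated here. The example sentence is made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent; undefined for an empty reference."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent; undefined for an empty reference."""
    if not ref:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Illustrative (made-up) pair: one word differs by a single inserted character.
reference = "mujhe kal office jana hai"
hypothesis = "mujhe kal office jaana hai"
print(wer(reference, hypothesis))  # 1 substitution over 5 words
print(cer(reference, hypothesis))  # 1 insertion over 25 characters
```

Because one changed character costs a whole word in WER but only one character in CER, the two columns can move quite differently on code-switched text with spelling variation.
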
.
├── README.md                          # This file
├── config.json                        # Deployment configuration
├── tokenizer.model                    # BPE tokenizer
├── model/
│   ├── nemotron-3.5-hinglish.nemo     # NeMo checkpoint (full model)
│   ├── encoder.onnx                   # ONNX encoder for deployment
│   ├── decoder_joint.onnx             # ONNX decoder+joint for deployment
│   └── *.weight / onnx__*             # External ONNX weight files
├── scripts/
│   ├── export_onnx.py                 # ONNX export from .nemo
│   ├── evaluate_finetuned.py          # Benchmark evaluation
│   ├── finetune_frozen_encoder.py     # Training script
│   ├── run_finetune.sh                # Training launcher
│   ├── prepare_training_data.py       # Data preparation
│   └── unpack_dataset.py              # Dataset extraction
└── results/
    ├── finetuned_benchmark_results.md
    └── nemotron_baseline_results.md

import torch
from nemo.collections.asr.models import EncDecRNNTBPEModelWithPrompt
model = EncDecRNNTBPEModelWithPrompt.restore_from("model/nemotron-3.5-hinglish.nemo")
model.eval()
# Transcribe with language prompt
transcription = model.transcribe(
    ["path/to/audio.wav"],
    batch_size=1,
    target_lang="hi-IN",
    prompt_mode="langID",
)[0]
print(transcription)
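
The model table lists a 16 kHz preprocessor, so it can help to check input files before transcribing. The sketch below is a pre-flight check using only the Python standard library. The helper name `check_wav` is hypothetical, and the mono requirement is an assumption rather than something stated in this card; whether NeMo resamples or downmixes other inputs automatically is also not covered here.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # from the Preprocessor row of the model table


def check_wav(path: str, expected_rate: int = EXPECTED_RATE_HZ) -> list[str]:
    """Return a list of problems found in a PCM WAV file (empty list means OK)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        # Assumption: single-channel input is expected.
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

A call such as `check_wav("path/to/audio.wav")` returns an empty list when the file is 16 kHz mono, and a list of human-readable messages otherwise.
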
The model/ directory contains exported ONNX models suitable for production deployment:
# Export from .nemo yourself (optional)
python scripts/export_onnx.py
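
The input and output tensor names of the exported graphs are not documented in this card, so a reasonable first step is to inspect them. The following is a minimal sketch using ONNX Runtime's `InferenceSession`, `get_inputs()` and `get_outputs()`; the helper names `load_session` and `describe_io` are hypothetical, and running on the CPU provider is an assumption made for portability.

```python
def load_session(path: str):
    """Open an ONNX file with ONNX Runtime on CPU (requires `pip install onnxruntime`)."""
    import onnxruntime as ort  # imported lazily so describe_io stays dependency-free

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def describe_io(session):
    """Return (inputs, outputs), each a list of (name, shape, type) tuples."""
    inputs = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()]
    return inputs, outputs


# Usage, assuming the files from the tree above are present locally:
#   for path in ("model/encoder.onnx", "model/decoder_joint.onnx"):
#       inputs, outputs = describe_io(load_session(path))
#       print(path, inputs, outputs)
```

Keep the external weight files (`*.weight`, `onnx__*`) in the same directory as the `.onnx` files, since ONNX Runtime resolves external data relative to the model path.
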
{
  "streaming": {
    "chunkMs": 320,
    "chunkSize": 4,
    "rightContext": 3,
    "lookaheadMs": 240,
    "melFrames": 32,
    "preCacheSize": 9,
    "outputFrames": 4
  }
}
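
Several of these values appear to follow from the preprocessor stride (10 ms) and the subsampling factor (8) in the model table: one encoder frame would then span 80 ms. The sketch below works through that arithmetic. The interpretation of `chunkSize` and `rightContext` as counts of encoder frames is an assumption that happens to reproduce the configured numbers; `preCacheSize` and `outputFrames` are not derived here.

```python
MEL_STRIDE_MS = 10       # preprocessor stride, from the model table
SUBSAMPLING_FACTOR = 8   # encoder subsampling, from the model table


def streaming_numbers(chunk_size: int, right_context: int) -> dict:
    """Derive chunk duration, lookahead and mel-frame count from encoder-frame counts."""
    frame_ms = MEL_STRIDE_MS * SUBSAMPLING_FACTOR  # 80 ms per encoder frame
    return {
        "chunkMs": chunk_size * frame_ms,
        "lookaheadMs": right_context * frame_ms,
        "melFrames": chunk_size * SUBSAMPLING_FACTOR,
    }


print(streaming_numbers(chunk_size=4, right_context=3))
# {'chunkMs': 320, 'lookaheadMs': 240, 'melFrames': 32}
```

Under this reading, changing the chunk size changes `chunkMs` and `melFrames` together, so the three values should be updated as a set rather than individually.
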
The model was fine-tuned using the NVIDIA NeMo toolkit with the following setup:
@misc{nemotron-3.5-hinglish-2026,
  author    = {Aditya},
  title     = {Nemotron-3.5 Hinglish Fine-Tuned ASR},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/addyo07/nemotron-3.5-0.6b-hinglish}
}