Nemotron-3.5 Hinglish Fine-Tuned

Fine-tuned version of nvidia/nemotron-3.5-asr-streaming-multilingual-0.6b on the Noisy Hinglish ASR dataset for improved Hindi and code-switched Hinglish speech recognition.

Model Details

| Property | Value |
|---|---|
| Base Model | nvidia/nemotron-3.5-asr-streaming-multilingual-0.6b |
| Architecture | FastConformer RNNT with Prompt-Streaming |
| Parameters | 638M |
| Encoder | 24-layer Conformer, d_model=1024, 8-head attention |
| Decoder | 2-layer LSTM, pred_hidden=640 |
| Joint | RNNT Joint, joint_hidden=640 |
| Vocabulary | 13,087 BPE tokens + blank (13,088 total) |
| Language Prompts | 128 (covering 100+ languages) |
| Context Size | Left: 56, Right: 3 (320 ms balanced streaming) |
| Subsampling Factor | 8 (dw_striding) |
| Preprocessor | 128 Mel filters, 16 kHz, 25 ms window, 10 ms stride |
| Training | Encoder frozen; decoder and joint fine-tuned |
| Training Steps | 12,000 (best checkpoint at step ~8,124) |
| Hardware | RTX 5070 Ti (16 GB) |
| License | Apache 2.0 |

Performance

Benchmark Results

| Split | Samples | WER% | CER% | FTR% | Δ WER vs Base |
|---|---|---|---|---|---|
| Clean Hindi | 500 | 24.75 | 10.54 | 0.80 | +2.85 (regression) |
| Conversational Hinglish | 1,036 | 24.57 | 17.69 | 0.97 | -17.60 |
| Noisy Hindi | 250 | 31.67 | 32.33 | 11.60 | -7.46 |
| Negatives (noise) | 200 | 0.00 | 0.00 | 99.50 | +99.50 (Δ FTR) |

Key Improvements

  • Hinglish WER reduced from 42.17% to 24.57% (-17.60 pp), matching Qwen-ft V2 performance
  • Noisy Hindi WER reduced from 39.13% to 31.67% (-7.46 pp)
  • FTR (False Trigger Rate) on background noise: 99.5% of noise-only samples rejected, up from 0% for the baseline
  • Slight regression on clean Hindi (+2.85 pp), likely because the model adapted to Hinglish code-switching patterns
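
For readers who want to sanity-check numbers like these on their own transcripts, here is a minimal sketch of corpus-level WER and CER based on edit distance. It is not the logic of scripts/evaluate_finetuned.py; text normalization is omitted, and counting spaces as characters for CER is an assumption of this sketch. The example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char'), in percent."""
    split = str.split if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * errors / total


# Hypothetical reference/hypothesis pair: one substituted word out of six
refs = ["mujhe kal meeting attend karni hai"]
hyps = ["mujhe kal meeting attend karna hai"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")

# Percentage-point change against a baseline, as in the last table column
delta_pp = round(24.57 - 42.17, 2)
print(f"WER={wer:.2f}%  CER={cer:.2f}%  delta={delta_pp:+.2f} pp")
```

Errors are summed over the whole split before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give different numbers.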

Repository Structure

.
├── README.md                  # This file
├── config.json                # Deployment configuration
├── tokenizer.model            # BPE tokenizer
├── model/
│   ├── nemotron-3.5-hinglish.nemo   # NeMo checkpoint (full model)
│   ├── encoder.onnx                  # ONNX encoder for deployment
│   ├── decoder_joint.onnx            # ONNX decoder+joint for deployment
│   └── *.weight / onnx__*            # External ONNX weight files
├── scripts/
│   ├── export_onnx.py                # ONNX export from .nemo
│   ├── evaluate_finetuned.py         # Benchmark evaluation
│   ├── finetune_frozen_encoder.py    # Training script
│   ├── run_finetune.sh               # Training launcher
│   ├── prepare_training_data.py      # Data preparation
│   └── unpack_dataset.py             # Dataset extraction
└── results/
    ├── finetuned_benchmark_results.md
    └── nemotron_baseline_results.md

Usage

NeMo Inference (Python)

from nemo.collections.asr.models import EncDecRNNTBPEModelWithPrompt

# Load the fine-tuned checkpoint and switch to inference mode
model = EncDecRNNTBPEModelWithPrompt.restore_from("model/nemotron-3.5-hinglish.nemo")
model.eval()

# Transcribe with language prompt
transcription = model.transcribe(
    ["path/to/audio.wav"],
    batch_size=1,
    target_lang="hi-IN",
    prompt_mode="langID",
)[0]
print(transcription)
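
The preprocessor is configured for 16 kHz audio, so it can help to check input files before transcribing them. The sketch below uses only the standard library and reads the WAV header. Requiring mono input is an assumption of this sketch, not something stated above, and `check_wav` and `write_silence` are hypothetical helper names.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # sample rate listed for the preprocessor in Model Details


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return (ok, info); ok is True for mono audio at the expected sample rate."""
    with wave.open(path, "rb") as wav:
        info = {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    # Mono is an assumption of this sketch
    ok = info["sample_rate"] == expected_rate and info["channels"] == 1
    return ok, info


def write_silence(path, seconds, rate):
    """Demo helper: write 16-bit mono silence so the check runs without real audio."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, seconds=1.0, rate=16_000)
    write_silence(bad, seconds=1.0, rate=8_000)
    good_ok, good_info = check_wav(good)
    bad_ok, bad_info = check_wav(bad)

print(good_ok, good_info)
print(bad_ok, bad_info)
```

Files that fail the check would need resampling before being passed to `model.transcribe`.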

ONNX Deployment

The model/ directory contains exported ONNX models suitable for production deployment:

  • encoder.onnx: Streaming encoder with cache-aware inference
  • decoder_joint.onnx: RNN-T decoder + joint network

# Export from .nemo yourself (optional)
python scripts/export_onnx.py
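
The input and output tensor names of the exported graphs are not listed here, so a reasonable first step is to inspect them. The sketch below does this with ONNX Runtime, assuming the `onnxruntime` package is installed. `describe_io` and `load_and_describe` are hypothetical helper names, and the CPU provider is chosen only for portability.

```python
import os


def describe_io(session):
    """List (name, shape, dtype) for every input and output of a session."""
    def rows(nodes):
        return [(n.name, n.shape, n.type) for n in nodes]
    return {
        "inputs": rows(session.get_inputs()),
        "outputs": rows(session.get_outputs()),
    }


def load_and_describe(path):
    """Open an ONNX file on CPU and describe its inputs and outputs."""
    # Imported lazily so describe_io stays usable without onnxruntime installed
    import onnxruntime as ort
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    return describe_io(session)


if __name__ == "__main__":
    for name in ("model/encoder.onnx", "model/decoder_joint.onnx"):
        if os.path.exists(name):
            print(name, load_and_describe(name))
```

Because the weights are stored as external files, keep them in the same directory as the .onnx files; ONNX resolves external data paths relative to the model file.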

Streaming Configuration (config.json)

{
  "streaming": {
    "chunkMs": 320,
    "chunkSize": 4,
    "rightContext": 3,
    "lookaheadMs": 240,
    "melFrames": 32,
    "preCacheSize": 9,
    "outputFrames": 4
  }
}
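
Several of these values appear to follow from the 10 ms preprocessor stride and the subsampling factor of 8 listed in Model Details. The sketch below reproduces them from `chunkSize` and `rightContext`. The relationships are inferred from the numbers rather than taken from the export script, `derive_streaming` is a hypothetical helper, and `preCacheSize` is not derived here.

```python
STRIDE_MS = 10    # preprocessor stride (Model Details)
SUBSAMPLING = 8   # encoder subsampling factor (Model Details)


def derive_streaming(chunk_size, right_context,
                     stride_ms=STRIDE_MS, subsampling=SUBSAMPLING):
    """Derive timing fields from sizes given in encoder frames (inferred relations)."""
    frame_ms = stride_ms * subsampling  # one encoder frame covers 80 ms of audio
    return {
        "chunkMs": chunk_size * frame_ms,         # audio consumed per step
        "lookaheadMs": right_context * frame_ms,  # future audio the encoder sees
        "melFrames": chunk_size * subsampling,    # mel frames fed per step
        "outputFrames": chunk_size,               # encoder frames emitted per step
    }


derived = derive_streaming(chunk_size=4, right_context=3)
print(derived)
# {'chunkMs': 320, 'lookaheadMs': 240, 'melFrames': 32, 'outputFrames': 4}
```

If `chunkSize` or `rightContext` is changed, the dependent fields would need to change with it under these assumptions.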

Training Details

The model was fine-tuned using the NVIDIA NeMo toolkit with the following setup:

  • Base model: nvidia/nemotron-3.5-asr-streaming-multilingual-0.6b
  • Dataset: ~23,500 training samples, ~2,300 validation samples from Noisy Hinglish ASR
  • Optimizer: NovoGrad with learning rate 0.1, cosine schedule with warmup
  • Precision: BF16 mixed precision
  • Encoder frozen: Yes (decoder + joint trained)
  • Loss: RNNT with FastEmit lambda 0.005
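
To illustrate what a cosine schedule with warmup looks like over the 12,000 training steps, here is a minimal sketch in plain Python. The peak learning rate and step count come from the setup above. The warmup length (500 steps) and the final learning rate (0.0) are not given there and are placeholders. This is not NeMo's scheduler implementation.

```python
import math


def lr_at(step, peak_lr=0.1, total_steps=12_000, warmup_steps=500, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    warmup_steps and min_lr are hypothetical values chosen for illustration.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 6_250, 8_124, 12_000):
    print(f"step {step:>6}: lr = {lr_at(step):.5f}")
```

Under these placeholder settings, the best checkpoint near step 8,124 falls in the second half of the decay, where the learning rate is already well below its peak.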

Citation

@misc{nemotron-3.5-hinglish-2026,
  author = {Aditya},
  title = {Nemotron-3.5 Hinglish Fine-Tuned ASR},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/addyo07/nemotron-3.5-0.6b-hinglish}
}