da-asr-streaming-0.6b

A streaming Danish (da) ASR model, fine-tuned from nvidia/nemotron-3.5-asr-streaming-0.6b on LokaalHub/da-asr-cv.

Community fine-tune, not an NVIDIA model. A derivative of NVIDIA's Nemotron 3.5 ASR. NVIDIA did not produce, endorse, or review this model. "Nemotron" is a trademark of NVIDIA, used here only to identify the base model.

TL;DR

Danish (da) is a supported locale of the base model, but its out-of-the-box accuracy on Common Voice is modest (~35.83% WER). A single full fine-tune on ~10.0h of training audio brings it to ~16.01% WER, a 55.3% relative reduction. Prompt slot used during fine-tuning: da (the locale's own slot).

Results

Condition	Base	Fine-tuned	Rel. improvement
WER (offline, full-context, normalized) on `LokaalHub/da-asr-cv` test	35.83%	16.01%	55.3%

Offline (full-context) WER via NeMo transcribe_speech.py. Cache-aware streaming WER (the condition NVIDIA headlines) was not measured for this release.
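
For readers who want to sanity-check the numbers in the table, here is a minimal sketch of corpus-level WER and of the relative improvement column. The normalization shown (lowercasing, stripping punctuation) is an assumption; the exact normalizer used for this release is not documented here, and the function names are illustrative, not part of NeMo.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, split on whitespace (assumed normalization)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so æ, ø, å are kept
    return text.split()


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = normalize(ref), normalize(hyp)
        # Word-level Levenshtein distance, keeping one DP row at a time.
        prev = list(range(len(h) + 1))
        for i, rw in enumerate(r, start=1):
            curr = [i]
            for j, hw in enumerate(h, start=1):
                cost = 0 if rw == hw else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution or match
            prev = curr
        edits += prev[-1]
        ref_words += len(r)
    return edits / ref_words


def relative_reduction(base: float, tuned: float) -> float:
    """Fraction of the baseline error removed by fine-tuning."""
    return (base - tuned) / base


# Reproduces the "Rel. improvement" column from the reported WERs.
print(f"{relative_reduction(35.83, 16.01):.1%}")  # prints 55.3%
```

The relative figure is simply (35.83 − 16.01) / 35.83, so it depends only on the two reported WERs and not on the normalization sketch above.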

Usage

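import nemo.collections.asr as nemo_asr

# Load the checkpoint shipped in this repo (download model.nemo first).
model = nemo_asr.models.ASRModel.restore_from("model.nemo")
# Transcribe a list of audio files; target_lang prompt: da
outputs = model.transcribe(["audio.wav"])
print(outputs[0])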
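
The snippet above expects model.nemo to be on disk already. One way to fetch it is sketched below, assuming the checkpoint is stored under that filename in the Hub repo LokaalHub/nemotron-3.5-da; both the repo id and the helper name load_danish_asr are assumptions made for illustration.

```python
REPO_ID = "LokaalHub/nemotron-3.5-da"  # assumed Hub repo id for this model
CHECKPOINT = "model.nemo"              # filename used in the usage snippet


def load_danish_asr(repo_id: str = REPO_ID, filename: str = CHECKPOINT):
    """Download the .nemo checkpoint from the Hub and restore it with NeMo."""
    # Imports are local so this module can be imported without the heavy
    # dependencies installed; they are only needed when the function runs.
    from huggingface_hub import hf_hub_download
    import nemo.collections.asr as nemo_asr

    # hf_hub_download caches the file locally and returns its path.
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return nemo_asr.models.ASRModel.restore_from(restore_path=local_path)
```

Calling load_danish_asr() and then transcribe(["audio.wav"]) on the result should be equivalent to the usage snippet, provided the assumed repo id and filename are correct.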

Training

Single full fine-tune initialized from the base checkpoint (init_from_nemo_model), bf16 precision, NoamAnnealing learning-rate schedule. Data: LokaalHub/da-asr-cv (~10.0h train). Built and trained by the asr-loop pipeline.
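
As a rough illustration of how these three settings might be expressed, the sketch below flattens a nested dict into Hydra-style overrides. The key paths follow the usual NeMo config layout and are assumptions, since the asr-loop pipeline's actual config is not shown here. The checkpoint path is hypothetical, and hyperparameters not stated in this card (learning rate, warmup, batch size) are deliberately left out.

```python
def to_hydra_overrides(cfg: dict, prefix: str = "") -> list[str]:
    """Flatten a nested dict into key.path=value override strings."""
    out = []
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.extend(to_hydra_overrides(value, path))
        else:
            out.append(f"{path}={value}")
    return out


finetune_cfg = {
    # Hypothetical local path to the base checkpoint.
    "init_from_nemo_model": "base_model.nemo",
    "trainer": {"precision": "bf16"},
    # Assumed location of the scheduler name in the model config.
    "model": {"optim": {"sched": {"name": "NoamAnnealing"}}},
}

for line in to_hydra_overrides(finetune_cfg):
    print(line)
```

Whether a given key needs Hydra's + prefix on the command line depends on whether it already exists in the base config, so treat the printed strings as a starting point only.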

Limitations

Low-resource fine-tune on read speech (Common Voice). Evaluated on a 2.1h speaker-disjoint test subset, so the results are not directly comparable to published full-Common-Voice-test numbers.
