rasr-parakeet-v1
ATC ASR finetune of nvidia/parakeet-tdt-0.6b-v3 on a synthetic US-style ATC corpus (radiotalk-us-audio-tada-noisy) with a small real-ATC anchor (ATCO2 + ATCOSIM train splits). Trained as v1 of the rasr toolkit.
Headline
| Metric | This model | Prior public SOTA (jlvdoorn/whisper-large-v3-atco2-asr) |
|---|---|---|
| ATCO2 val WER | 0.125 | 0.157 |
| ATCO2 val CER | 0.078 | 0.088 |
| ATCO2 val numeric WER | 0.050 | 0.074 |
About 20% relative WER reduction (0.157 to 0.125) over the previous public SOTA on the ATCO2 validation benchmark, with a smaller base model (0.6B params vs 1.55B).
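The relative reductions implied by the table can be recomputed directly. The following is a minimal sketch that works from the rounded values shown above, so figures computed from unrounded scores may differ by about a point. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


# (prior public SOTA, this model), taken from the rounded table values above
metrics = {
    "WER": (0.157, 0.125),
    "CER": (0.088, 0.078),
    "numeric WER": (0.074, 0.050),
}

for name, (baseline, ours) in metrics.items():
    print(f"{name}: {relative_reduction(baseline, ours):.1%} relative reduction")
```

From the rounded values this works out to roughly 20% for WER, 11% for CER, and 32% for numeric WER.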
Quick start
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.from_pretrained("twangodev/rasr-parakeet-v1")
result = model.transcribe(["atc_clip.wav"])
print(result[0].text)
Or via the rasr eval toolkit:
pip install rasr
rasr eval run \
-m nemo:hf://twangodev/rasr-parakeet-v1 \
-d hf:jlvdoorn/atco2-asr:validation \
--language en --batch-size 16
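For reference, the WER and CER figures reported by an evaluation like the one above follow the standard edit-distance definitions. The sketch below is a generic implementation of those definitions, not the rasr toolkit's scoring code. It assumes text normalization (casing, number formatting) has already been applied, and the function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word out of eight reference words.
print(wer("butler traffic cessna eight one niner charlie mike",
          "bello traffic cessna eight one niner charlie mike"))
```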
Architecture
- Base: nvidia/parakeet-tdt-0.6b-v3 (FastConformer encoder + TDT decoder, 0.6B params)
- Tokenizer: kept from base; SentencePiece BPE, 8192 tokens, multilingual
- Sample rate: 16 kHz mono
- Max input duration: 18 seconds, matching the training cap (longer inputs may degrade; TDT joint memory)
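Clips longer than the 18-second cap need to be split before transcription. A minimal sketch of computing fixed-window chunk boundaries at 16 kHz follows. `chunk_bounds` is an illustrative helper, not part of NeMo or rasr, and fixed windows can cut through a word, so splitting at silences is likely preferable in practice.

```python
SAMPLE_RATE = 16_000  # Hz, from the model card
MAX_SECONDS = 18.0    # max input duration, from the model card


def chunk_bounds(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 max_seconds: float = MAX_SECONDS) -> list[tuple[int, int]]:
    """Return (start, end) sample indices that cover the clip in
    consecutive windows of at most `max_seconds` each."""
    window = int(sample_rate * max_seconds)
    if window <= 0:
        raise ValueError("window must be at least one sample long")
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


# A 40-second clip yields two full 18 s windows and a 4 s remainder.
for start, end in chunk_bounds(40 * SAMPLE_RATE):
    print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```

Each `(start, end)` pair can then be used to slice the waveform array before passing the pieces to the model.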
Training data
Most of the training data consists of transcripts generated by Llama 3.2 with audio synthesized via the Tada TTS pipeline, anchored by a smaller amount of real ATC audio. Specifically:
| Source | Type | Role |
|---|---|---|
| twangodev/radiotalk-us-audio-tada-noisy (200k subset) | Synthetic US ATC | Bulk training audio. Dialogue transcripts generated by Llama 3.2, audio synthesized by Tada (TTS) with VHF channel degradation pipeline. |
| jlvdoorn/atco2-asr (train split, ~446 clips) | Real European ATC | Real-data anchor; upweighted 10× to supply real-radio acoustic priors and European operator vocabulary. |
| jlvdoorn/atco2-asr-atcosim (train, ~10k clips) | Real EU ATC + simulator | Real-data anchor; upweighted 10×. |
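One common way to realize this kind of upweighting is to repeat each source's manifest entries by an integer weight before concatenation. The sketch below assumes NeMo-style JSON-lines manifests (`audio_filepath`, `duration`, `text`). Whether rasr implements the weighting by repetition or by sampling is not stated here, and the file names in the usage comment are hypothetical.

```python
import json
import random
from pathlib import Path


def read_manifest(path):
    """Load a JSON-lines manifest into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def weighted_concat(sources, seed=0):
    """Repeat each (entries, weight) source `weight` times, then shuffle.

    Assumption: upweighting is done by plain repetition of entries.
    """
    mixed = []
    for entries, weight in sources:
        if weight < 1:
            raise ValueError("weights must be positive integers")
        mixed.extend(entries * weight)
    random.Random(seed).shuffle(mixed)  # deterministic shuffle for reproducibility
    return mixed


def write_manifest(entries, path):
    """Write entries back out as one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Usage (hypothetical file names):
# mixed = weighted_concat([
#     (read_manifest("radiotalk.jsonl"), 1),
#     (read_manifest("atco2_train.jsonl"), 10),
#     (read_manifest("atco2_atcosim_train.jsonl"), 10),
# ])
# write_manifest(mixed, "mixed_train.jsonl")
```

Under the repetition assumption, the approximate clip counts in the table would make real audio roughly a third of the mixed manifest.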
Llama 3.2 attribution
This model is "Built with Llama" under the Llama 3.2 Community License. Llama 3.2 was used to generate the ATC dialogue transcripts in the radiotalk-us-audio-tada-noisy dataset — those transcripts are the supervised targets the model learned to produce. The audio itself was synthesized by Tada (not Llama).
Training recipe
Full reproducible recipe: configs/train/rtx6kpro/parakeet-mixed.yaml.
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW, β=(0.9, 0.98), weight_decay=1e-3 |
| Learning rate | 1e-4 |
| Schedule | CosineAnnealing, warmup 5000 steps, min_lr=1e-6 |
| Batch size | 32 (effective) |
| Precision | bf16-mixed |
| Max steps | 50,000 |
| Augmentation | SpecAugment (default), speed perturb 0.95-1.05 |
| Max audio duration | 18.0 s |
| Mixing | weighted manifest concat (radiotalk ×1, ATCO2 train ×10, ATCO2+ATCOSIM train ×10) |
| Hardware | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| Wall clock | ~12 hours |
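The schedule row corresponds to a linear warmup followed by cosine decay. The sketch below is the textbook form with the table's values plugged in. NeMo's `CosineAnnealing` scheduler may differ in off-by-one details, and `lr_at` is an illustrative name.

```python
import math

PEAK_LR = 1e-4         # learning rate from the table
MIN_LR = 1e-6          # min_lr from the table
WARMUP_STEPS = 5_000   # warmup from the table
MAX_STEPS = 50_000     # max steps from the table


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped at 1.0 past MAX_STEPS.
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for step in (0, 2_500, 5_000, 27_500, 50_000):
    print(step, f"{lr_at(step):.3e}")
```

The rate peaks at the end of warmup, passes through the midpoint between the peak and minimum rates halfway through the decay phase, and reaches the minimum at the final step.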
Strengths
- Structurally robust ATC output. Position-call grammar (CTAF + towered), runway IDs, headings, and altitude readbacks are recovered cleanly.
- Strong on numeric/safety-critical content. Per-utterance numeric WER 0.050 on ATCO2 val, about a third lower than the prior SOTA's 0.074 on the same axis.
- Stable on out-of-distribution audio. Zero runaway hallucinations observed on real US GA audio (TartanAviation KBTP), unlike LLM-decoder ASR models (e.g., Canary-Qwen, Granite Speech) which confabulate confidently on hard audio.
- Small footprint. 0.6B params, fits in 4 GB VRAM at inference; ~10× faster than larger Whisper-based ATC finetunes.
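As a rough check on the footprint claim, weight storage alone can be estimated from the parameter count. This back-of-the-envelope sketch treats 0.6B as exactly 600 million parameters and ignores activations, decoder state, and framework overhead, so actual usage will be higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(600_000_000, dtype), "GB")
```

Both estimates (about 2.4 GB in fp32 and 1.2 GB in bf16) leave headroom under 4 GB for runtime overhead.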
Limitations
This model was trained on a US-style synthetic corpus plus a European real-data anchor. The combination produces specific biases users should be aware of:
Operator substitution bias. The model has been observed substituting unfamiliar callsigns with familiar ones from its training distribution — e.g., emitting "Lufthansa" or "Delta" where the audio contained a less-common operator. Particularly noticeable on US general aviation (GA) traffic, where N-number tail callsigns (e.g., "Cessna Eight One Niner Charlie Mike") may be mis-substituted with major airline prefixes.
Limited US GA airport name coverage. The model has not seen most small US GA airport names during training. On real US GA audio (e.g., TartanAviation KBTP recordings), it produces phonetically-similar substitutions for the airport name ("Bravo Traffic", "Bello Traffic") instead of the correct name ("Butler Traffic").
European real-anchor contamination on US output. Training included real European ATCO2/ATCOSIM data to anchor the distribution and reach the SOTA result on ATCO2 val. This European prior is visible in US-style transcription (occasional "Swiss", "Bern Tower", "Belfast Tower" tokens that should not appear).
Sanity rate on real US GA audio: 77% (10% CLEAN + 67% PLAUSIBLE-MISHEARD across 69 TartanAviation KBTP clips). In the imperfect cases, the failure is overwhelmingly a wrong word of the right kind in the correct slot (for example, a misheard airport name where the airport name belongs), not garbling or hallucination.
Evaluation distribution. This model is benchmarked against ATCO2 (European real ATC). It has not been evaluated against a US ATC benchmark — no fully public US ATC ASR test set with annotations currently exists.
Recommended usage
- For European ATC (or audio matching ATCO2-style distribution): deploy as-is. Numbers above are the expected performance.
- For US ATC: use with inference-time hot-word biasing against a known callsign + airport-name vocabulary specific to the deployment region. NeMo's TDT decoder supports hot-word biasing via change_decoding_strategy(). Most substitution failures collapse to correct output with appropriate biasing.
- For safety-critical applications: always layer with confidence-based rejection. This model is intended as a research/development checkpoint, not as a safety-certified ATC transcription system.
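Confidence-based rejection can be as simple as thresholding an utterance-level score. The sketch below assumes per-word confidences in [0, 1] are already available from the decoder (how to obtain them is outside this sketch). It aggregates them with the minimum, so that one doubtful callsign or digit rejects the whole utterance, and the default threshold is a hypothetical value that would need tuning on held-out data.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    text: str
    accepted: bool
    score: float


def accept_or_reject(words, confidences, threshold=0.8) -> Decision:
    """Accept a transcript only if its least confident word clears `threshold`."""
    if len(words) != len(confidences):
        raise ValueError("need exactly one confidence per word")
    # An empty transcript gets score 0.0 and is therefore rejected.
    score = min(confidences) if confidences else 0.0
    return Decision(text=" ".join(words), accepted=score >= threshold, score=score)


# Hypothetical confidences: the uncertain callsign causes a rejection.
decision = accept_or_reject(["lufthansa", "four", "two"], [0.41, 0.97, 0.95])
print(decision)
```

Rejected utterances could be routed to a human or flagged in the output instead of being shown as confident text.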
Citation
If you use this model, please cite the project and the underlying components:
@software{rasr,
author = {Ding, James},
title = {rasr: ATC ASR finetuning toolkit},
url = {https://github.com/twangodev/rasr},
year = {2026}
}
And the base model:
@misc{parakeet-tdt,
author = {NVIDIA},
title = {Parakeet-TDT-0.6B-v3},
url = {https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}
And Llama 3.2 (training transcripts):
@misc{llama3.2,
author = {{Meta AI}},
title = {The Llama 3.2 Herd of Models},
year = {2024},
url = {https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/}
}
License
Released under the Llama 3.2 Community License ("Built with Llama"). This is the binding upstream license because the training transcripts were generated by Llama 3.2, and the resulting model is treated as a derivative work of Llama Materials for licensing purposes.
In addition to the Llama 3.2 terms, this model also inherits attribution and use requirements from its other parents:
- Parakeet-TDT-0.6B-v3 (CC-BY-4.0, NVIDIA) — base model
- ATCO2 corpus (CC-BY-4.0) — real-data anchor (train split)
- ATCOSIM corpus (research use; see source)
- radiotalk-us-audio-tada-noisy (Llama 3.2 Community License — transcripts generated by Llama 3.2, audio synthesized via Tada) — synthetic training audio
To redistribute or deploy:
- Include a copy of the Llama 3.2 Community License.
- Display "Built with Llama" in your product / user interface / about page.
- Comply with the Llama 3.2 Acceptable Use Policy.
- If your service exceeds 700M monthly active users, request a separate commercial license from Meta.
This is not legal advice. If you are deploying this model commercially or at scale, consult a lawyer regarding the interaction of the upstream licenses.