Parakeet CTC 0.6B - GGUF (ggml-quantised)
GGUF / ggml conversions of nvidia/parakeet-ctc-0.6b for use with the crispasr CLI from CrispStrobe/CrispASR.
Parakeet CTC 0.6B is NVIDIA's 600 M-parameter English ASR model:
- English-only, lowercase output (matches the upstream training convention)
- FastConformer encoder (24 layers, d_model=1024, 8 heads) + single CTC head: one forward pass per utterance, no autoregressive joint loop
- CC-BY-4.0 licence
- Strong WERs on the standard suite: LibriSpeech-clean 1.87 %, LibriSpeech-other 3.76 %, TEDLIUM-v3 3.78 %, GigaSpeech 10.35 %, Common Voice 7.00 %
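Because the model ends in a single CTC head, decoding reduces to a per-frame argmax, collapsing repeated labels, and dropping blanks. The following is a minimal sketch of greedy CTC decoding, not CrispASR's actual code; the function name is hypothetical, and placing the blank at id 1024 (the last of the 1025 output classes) is an assumption.

```python
def ctc_greedy_decode(logits, blank_id=1024):
    """Greedy CTC decode: per-frame argmax, collapse repeats, drop blanks.

    logits: sequence of frames, each a sequence of per-class scores.
    blank_id: index of the CTC blank (assumed here to be the last class).
    """
    tokens = []
    prev = None
    for frame in logits:
        # Index of the highest-scoring class in this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the label changes and is not the blank.
        if best != prev and best != blank_id:
            tokens.append(best)
        prev = best
    return tokens
```

The returned ids would then be mapped to text by the SentencePiece vocabulary.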
This repo provides four quantisations, all converted from the upstream .nemo checkpoint via models/convert-stt-fastconformer-ctc-to-gguf.py (the same converter used for stt_en_fastconformer_ctc_*, since parakeet-ctc-0.6b shares the FastConformer-CTC architecture) and quantised with crispasr-quantize.
Files
| File | Size | Notes |
|---|---|---|
| parakeet-ctc-0.6b.gguf | ~1.22 GB | F16, full precision |
| parakeet-ctc-0.6b-q8_0.gguf | ~720 MB | Q8_0, near-lossless |
| parakeet-ctc-0.6b-q5_0.gguf | ~520 MB | Q5_0 |
| parakeet-ctc-0.6b-q4_k.gguf | ~455 MB | Q4_K, recommended default |
All quantisations produce the same JFK 11 s transcript.
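To compare the files, it can help to express each size as effective bits per weight and as a fraction of the F16 file. This is a rough sketch using only the sizes listed above; treating the "~600 M" parameter count as exact is an assumption, and the helper names are hypothetical.

```python
# File sizes from the table above, in bytes (approximate, as listed).
FILE_SIZES = {
    "parakeet-ctc-0.6b.gguf": 1.22e9,
    "parakeet-ctc-0.6b-q8_0.gguf": 720e6,
    "parakeet-ctc-0.6b-q5_0.gguf": 520e6,
    "parakeet-ctc-0.6b-q4_k.gguf": 455e6,
}
N_PARAMS = 600e6  # assumption: the "~600 M" figure taken as exact


def bits_per_weight(size_bytes, n_params=N_PARAMS):
    """Effective bits stored per parameter for a file of the given size."""
    return size_bytes * 8 / n_params


def size_ratio(name, baseline="parakeet-ctc-0.6b.gguf"):
    """File size relative to the F16 baseline."""
    return FILE_SIZES[name] / FILE_SIZES[baseline]


for name, size in FILE_SIZES.items():
    print(f"{name}: {bits_per_weight(size):.1f} bits/weight, "
          f"{size_ratio(name):.2f}x of F16")
```

Treat the results as estimates only: the parameter count is approximate, and file sizes also include GGUF metadata.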
Quick start
# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr-cli
# 2. Run: the CLI auto-downloads Q4_K from this repo by friendly name:
./build/bin/crispasr -m parakeet-ctc-0.6b -f your-audio.wav
# Or pre-download a specific quant via huggingface_hub and point to it:
python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('cstr/parakeet-ctc-0.6b-GGUF', 'parakeet-ctc-0.6b-q8_0.gguf'))"
./build/bin/crispasr -m parakeet-ctc-0.6b-q8_0.gguf -f your-audio.wav
The crispasr CLI auto-detects the backend from the filename: parakeet-ctc-*.gguf routes to fastconformer-ctc because the architecture is identical to the stt_en_fastconformer_ctc_* family. The registry key parakeet-ctc-0.6b triggers the Q4_K auto-download.
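The filename-based routing described above can be pictured as a glob match on the model's base name. This is an illustrative sketch only, not the CrispASR implementation; the table and function names are hypothetical, and only the one pattern stated in this README is included.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical routing table; the single entry is the rule described above.
BACKEND_PATTERNS = [
    ("parakeet-ctc-*.gguf", "fastconformer-ctc"),
]


def detect_backend(model_path):
    """Return the backend name for a model file, or None if nothing matches."""
    name = Path(model_path).name  # match on the file name, not the directory
    for pattern, backend in BACKEND_PATTERNS:
        if fnmatch(name, pattern):
            return backend
    return None
```

For example, `detect_backend("models/parakeet-ctc-0.6b-q8_0.gguf")` matches the pattern regardless of which quantisation suffix the file carries.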
Model architecture
| Component | Details |
|---|---|
| Encoder | 24-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9, attention biases ON |
| Subsampling | dw_striding stack, 8× temporal (100 → 12.5 fps) |
| CTC head | Conv1d(1024 → 1025), k=1; vocab 1024 SentencePiece + 1 blank |
| Audio | 16 kHz mono, 80 mel bins, n_fft=512, hop=160, win=400 |
| Parameters | ~600 M |
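Several figures in the table follow from one another. The sketch below derives them from the listed hyperparameters: a 160-sample hop at 16 kHz gives 100 mel frames per second, and 8× subsampling brings that to 12.5 encoder frames per second. The variable and function names are hypothetical, and the frame-count helper is an approximation that ignores padding and convolution edge effects.

```python
import math

SAMPLE_RATE = 16_000   # Hz
HOP = 160              # samples between consecutive mel frames
SUBSAMPLING = 8        # temporal reduction of the dw_striding stack
D_MODEL, N_HEADS = 1024, 8
VOCAB, BLANK = 1024, 1

mel_fps = SAMPLE_RATE / HOP            # mel frames per second
encoder_fps = mel_fps / SUBSAMPLING    # encoder frames per second
head_dim = D_MODEL // N_HEADS          # per-head attention width
ctc_classes = VOCAB + BLANK            # output channels of the CTC head


def approx_encoder_frames(duration_s):
    """Rough encoder frame count; ignores padding and edge effects."""
    return math.ceil(duration_s * encoder_fps)


print(mel_fps, encoder_fps, head_dim, ctc_classes)
```

Each encoder frame therefore covers 80 ms of audio, which is the time resolution at which the CTC head emits labels.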
The mel filterbank and Hann window are baked into the GGUF (preprocessor.fb, preprocessor.window). BatchNorm in the convolution module is folded into the depthwise conv weights at load time.
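Folding BatchNorm into a preceding depthwise convolution works because both are per-channel affine operations at inference time. The sketch below shows the standard folding identity in plain Python; it is not the loader's actual code, the function names are hypothetical, and the default `eps=1e-5` is an assumption.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm into depthwise conv weights.

    weight: per-channel kernels, shape [channels][kernel]
    bias, gamma, beta, mean, var: per-channel lists of length channels
    Returns (folded_weight, folded_bias).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def depthwise_conv1d(x, weight, bias):
    """Valid (unpadded) depthwise 1-D convolution, one kernel per channel."""
    out = []
    for x_c, w_c, b_c in zip(x, weight, bias):
        k = len(w_c)
        out.append([
            sum(w_c[j] * x_c[t + j] for j in range(k)) + b_c
            for t in range(len(x_c) - k + 1)
        ])
    return out
```

Running `depthwise_conv1d` with the folded weights gives the same result as convolving with the original weights and then applying BatchNorm, while removing one operation per conformer block at inference.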
Performance (Apple M1 Metal, JFK 11 s, q8_0)
| Path | Median wallclock | RT× |
|---|---|---|
| crispasr ctypes Session, Metal | 0.46 s | 24.1× |
| onnx-asr (CPU EP, int8) | 0.72 s | 15.2× |
| onnx-asr (CoreML EP, int8) | 1.28 s | 8.6× |
(A like-for-like CTC-vs-CTC comparison at the same parameter count, against istupakov/parakeet-ctc-0.6b-onnx. See PERFORMANCE.md for the full methodology.)
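The RT× column is the audio duration divided by the median wall-clock time. A minimal sketch of that calculation follows; the function name is hypothetical, and the exact procedure used for the table is the one described in PERFORMANCE.md.

```python
from statistics import median


def realtime_factor(audio_seconds, wallclock_runs):
    """Audio duration divided by the median wall-clock time of the runs."""
    return audio_seconds / median(wallclock_runs)
```

With an 11 s clip and a median of 0.46 s this gives roughly 24×; small differences from the table come from rounding of the clip length and timings.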
Output convention
The upstream model emits lowercase, unpunctuated English. If you need cased and punctuated output, use parakeet-tdt-0.6b-v3 (cstr/parakeet-tdt-0.6b-v3-GGUF) instead, or post-process with crispasr's --punc-model (FireRedPunc / fullstop-punc).
Attribution
- Original model: nvidia/parakeet-ctc-0.6b (CC-BY-4.0). NVIDIA NeMo team.
- GGUF conversion + ggml runtime: CrispStrobe/CrispASR, FastConformer-CTC backend; see src/canary_ctc.cpp.
- Reference inference: istupakov/onnx-asr was the cross-check for the CTC head argmax / token decoding.
Related
- C++ runtime: CrispStrobe/CrispASR
- Sister TDT model: cstr/parakeet-tdt-0.6b-v3-GGUF
- Larger CTC variant: cstr/parakeet-ctc-1.1b-GGUF
- Same-size FastConformer-CTC: cstr/stt-en-fastconformer-ctc-xlarge-GGUF
License
CC-BY-4.0, inherited from the base model. Use of these GGUF files must comply with the CC-BY-4.0 licence, including its attribution requirement.