Parakeet CTC 0.6B - GGUF (ggml-quantised)

GGUF / ggml conversions of nvidia/parakeet-ctc-0.6b for use with the crispasr CLI from CrispStrobe/CrispASR.

Parakeet CTC 0.6B is NVIDIA's 600 M-parameter English ASR model:

  • English-only, lowercase output (matches the upstream training convention)
  • FastConformer encoder (24 layers, d_model=1024, 8 heads) + single CTC head: one forward pass per utterance, no autoregressive joint loop
  • CC-BY-4.0 licence
  • Strong WERs on the standard suite: LibriSpeech-clean 1.87 %, LibriSpeech-other 3.76 %, TEDLIUM-v3 3.78 %, GigaSpeech 10.35 %, Common Voice 7.00 %
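
Because the model ends in a single CTC head, decoding can be as simple as a per-frame argmax followed by collapsing. The following is a minimal sketch of greedy CTC decoding, not the crispasr implementation; it assumes the blank symbol is the last of the 1025 output classes (id 1024), which the source does not state explicitly.

```python
from itertools import groupby

BLANK_ID = 1024  # assumption: blank is the last of the 1025 output classes


def ctc_greedy_decode(frame_logits, blank_id=BLANK_ID):
    """Greedy CTC: per-frame argmax, merge consecutive repeats, drop blanks.

    frame_logits is a sequence of per-frame score lists (one score per class).
    Returns the list of surviving token ids.
    """
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_logits]
    collapsed = [token for token, _ in groupby(best_ids)]
    return [token for token in collapsed if token != blank_id]
```

The returned ids would still need to be mapped through the SentencePiece vocabulary to obtain text; that step is omitted here.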

This repo provides four files: an F16 conversion and three quantisations. All are converted from the upstream .nemo checkpoint via models/convert-stt-fastconformer-ctc-to-gguf.py (the same converter used for stt_en_fastconformer_ctc_*, since parakeet-ctc-0.6b shares the FastConformer-CTC architecture); the quantised variants are produced with crispasr-quantize.

Files

File                          Size      Notes
parakeet-ctc-0.6b.gguf        ~1.22 GB  F16, unquantised
parakeet-ctc-0.6b-q8_0.gguf   ~720 MB   Q8_0, near-lossless
parakeet-ctc-0.6b-q5_0.gguf   ~520 MB   Q5_0
parakeet-ctc-0.6b-q4_k.gguf   ~455 MB   Q4_K, recommended default

All four files produce the same transcript on the 11 s JFK test clip.
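
As a rough sanity check on the sizes, the ggml block layouts give a fixed storage cost per weight. The sketch below computes that cost and a lower-bound file size, assuming every one of the ~600 M parameters is stored in the named format. The listed files are somewhat larger than these bounds; that would be consistent with some tensors being kept at higher precision, but the source does not say which tensors are quantised.

```python
# (bytes per block, weights per block) for each ggml storage format.
BLOCK_LAYOUT = {
    "F16": (2, 1),
    "Q8_0": (34, 32),    # 32 int8 values + one fp16 scale
    "Q5_0": (22, 32),    # 16 bytes of low nibbles + 4 bytes of high bits + fp16 scale
    "Q4_K": (144, 256),  # 128 bytes of nibbles + 12 bytes of sub-block scales + two fp16 scales
}


def bits_per_weight(fmt):
    """Average number of bits stored per weight in the given format."""
    block_bytes, block_weights = BLOCK_LAYOUT[fmt]
    return 8 * block_bytes / block_weights


def lower_bound_mb(fmt, n_params=600e6):
    """Size in MB if every parameter used `fmt` (assumption: ~600 M params)."""
    return n_params * bits_per_weight(fmt) / 8 / 1e6
```

For example, `bits_per_weight("Q4_K")` evaluates to 4.5, so a fully Q4_K model of this size could not be smaller than about 338 MB.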

Quick start

# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr-cli

# 2. Run: the CLI auto-downloads Q4_K from this repo by friendly name:
./build/bin/crispasr -m parakeet-ctc-0.6b -f your-audio.wav

# Or pre-download a specific quant via huggingface_hub and point to it:
# (hf_hub_download returns the path of the cached file, so capture it)
MODEL=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('cstr/parakeet-ctc-0.6b-GGUF', 'parakeet-ctc-0.6b-q8_0.gguf'))")
./build/bin/crispasr -m "$MODEL" -f your-audio.wav

The crispasr CLI auto-detects the backend from the filename: parakeet-ctc-*.gguf routes to fastconformer-ctc because the architecture is identical to the stt_en_fastconformer_ctc_* family. The registry key parakeet-ctc-0.6b triggers the Q4_K auto-download.
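
The filename-based routing described above can be pictured as a glob match on the file's base name. This is an illustrative sketch only: `BACKEND_PATTERNS` and `detect_backend` are hypothetical names, and only the parakeet-ctc pattern and the fastconformer-ctc backend name come from the text; the real CLI's lookup may work differently.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical routing table; only this one pattern is taken from the README.
BACKEND_PATTERNS = [
    ("parakeet-ctc-*.gguf", "fastconformer-ctc"),
]


def detect_backend(model_path):
    """Return the backend name for a GGUF path, or None if no pattern matches."""
    name = Path(model_path).name.lower()
    for pattern, backend in BACKEND_PATTERNS:
        if fnmatch(name, pattern):
            return backend
    return None
```

Matching on the base name means the same file routes identically whether it sits in the working directory or in the huggingface_hub cache.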

Model architecture

Component    Details
Encoder      24-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9, attention biases ON
Subsampling  dw_striding stack, 8× temporal (100 → 12.5 fps)
CTC head     Conv1d(1024 → 1025), k=1; vocab 1024 SentencePiece + 1 blank
Audio        16 kHz mono, 80 mel bins, n_fft=512, hop=160, win=400
Parameters   ~600 M
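
The frame rates follow from the audio settings: a hop of 160 samples at 16 kHz yields 100 mel frames per second, and 8× subsampling leaves 12.5 encoder frames per second (80 ms per frame). A small sketch of that arithmetic, with an approximate output-length helper that ignores padding and edge effects (an assumption, since the exact framing is not described here):

```python
import math

SAMPLE_RATE = 16_000  # Hz, from the table
HOP = 160             # samples between successive mel frames, from the table
SUBSAMPLING = 8       # temporal reduction of the dw_striding stack


def frame_rates(sample_rate=SAMPLE_RATE, hop=HOP, subsampling=SUBSAMPLING):
    """Return (mel frames per second, encoder frames per second)."""
    mel_fps = sample_rate / hop
    encoder_fps = mel_fps / subsampling
    return mel_fps, encoder_fps


def encoder_frames(duration_s, **kwargs):
    """Approximate number of encoder frames (CTC steps) for a clip."""
    _, encoder_fps = frame_rates(**kwargs)
    return math.ceil(duration_s * encoder_fps)
```

The CTC head therefore emits one label distribution per 80 ms of audio, so a 10 s clip corresponds to roughly 125 decoding steps.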

The mel filterbank and Hann window are baked into the GGUF (preprocessor.fb, preprocessor.window). BatchNorm in the convolution module is folded into the depthwise conv weights at load time.
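
Folding BatchNorm into a preceding depthwise convolution is a standard inference-time rewrite: each channel's kernel and bias are rescaled so that the conv alone reproduces conv followed by BatchNorm. The sketch below shows the algebra in plain Python; the data layout, the function names, and the eps value are assumptions for illustration, not the loader's actual code.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BatchNorm into per-channel depthwise conv parameters.

    weights holds one kernel (list of taps) per channel; the remaining
    arguments hold one value per channel. Returns (folded_weights, folded_biases).
    """
    folded_w, folded_b = [], []
    for c, kernel in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([tap * scale for tap in kernel])
        folded_b.append((biases[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b


def depthwise_conv1d(x, weights, biases):
    """Valid-mode depthwise 1-D cross-correlation over x[channel][time]."""
    out = []
    for c, kernel in enumerate(weights):
        k = len(kernel)
        out.append([
            sum(kernel[j] * x[c][t + j] for j in range(k)) + biases[c]
            for t in range(len(x[c]) - k + 1)
        ])
    return out
```

Running `depthwise_conv1d` with the folded parameters should match conv followed by BatchNorm up to floating-point error, which removes one elementwise pass per Conformer block at inference time.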

Performance (Apple M1 Metal, JFK 11 s, q8_0)

Path                              Median wallclock  RT×
crispasr ctypes Session, Metal    0.46 s            24.1×
onnx-asr (CPU EP, int8)           0.72 s            15.2×
onnx-asr (CoreML EP, int8)        1.28 s            8.6×

(A like-for-like comparison: CTC against CTC at the same parameter count, using istupakov/parakeet-ctc-0.6b-onnx as the ONNX baseline. See PERFORMANCE.md for the full methodology.)
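
The RT× column can be reproduced with a few lines, assuming it means audio duration divided by median wall-clock time (an assumption that is consistent with the listed figures to within rounding of the timings). The helper names below are illustrative and are not taken from PERFORMANCE.md.

```python
import time
from statistics import median


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def realtime_factor(audio_seconds, wallclock_runs):
    """RT× = audio duration divided by the median wall-clock time."""
    return audio_seconds / median(wallclock_runs)
```

Using the median rather than the mean keeps a single slow run (for example a cold cache on the first call) from skewing the result.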

Output convention

The upstream model emits lowercase, unpunctuated English. If you need cased and punctuated output, use parakeet-tdt-0.6b-v3 (cstr/parakeet-tdt-0.6b-v3-GGUF) instead, or post-process via crispasr's --punc-model (FireRedPunc / fullstop-punc).

Attribution

Related

License

CC-BY-4.0, inherited from the base model. Use of these GGUF files must comply with the CC-BY-4.0 licence, including its attribution requirement.
