Parakeet CTC 1.1B: GGUF (ggml-quantised)
GGUF / ggml conversions of nvidia/parakeet-ctc-1.1b for use with the crispasr CLI from CrispStrobe/CrispASR.
Parakeet CTC 1.1B is NVIDIA's 1.1 B-parameter English ASR model, the deeper sibling of parakeet-ctc-0.6b:
- English-only, lowercase output
- FastConformer encoder: 42 layers (vs 24 in 0.6b), d_model=1024, 8 heads, plus a single CTC head; trained on 64K hours of English speech (40K private + 24K public)
- CC-BY-4.0 licence
- WERs (slightly better than 0.6b across the board): LibriSpeech-clean 1.83 %, LibriSpeech-other 3.54 %, TEDLIUM-v3 4.20 %, GigaSpeech 10.27 %, Common Voice 6.53 %
This repo provides four files (an F16 reference plus three quantisations), all converted from the upstream .nemo checkpoint via models/convert-stt-fastconformer-ctc-to-gguf.py and quantised with crispasr-quantize.
Files
| File | Size | Notes |
|---|---|---|
| parakeet-ctc-1.1b.gguf | ~2.13 GB | F16, full precision |
| parakeet-ctc-1.1b-q8_0.gguf | ~1.26 GB | Q8_0, near-lossless |
| parakeet-ctc-1.1b-q5_0.gguf | ~910 MB | Q5_0 |
| parakeet-ctc-1.1b-q4_k.gguf | ~795 MB | Q4_K, recommended default |
All four files produce the same transcript on the 11 s JFK sample clip.
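To put the sizes in the table in perspective, the sketch below converts them into an effective bits-per-weight figure and a ratio against the F16 file. It is a rough estimate only: it assumes the rounded parameter count of 1.1 B, reads the sizes as decimal units (1 GB = 10^9 bytes), and ignores GGUF metadata overhead. The helper names are illustrative and not part of CrispASR.

```python
# Approximate sizes from the Files table, read as decimal units (assumption).
FILE_SIZES_BYTES = {
    "parakeet-ctc-1.1b.gguf": 2.13e9,
    "parakeet-ctc-1.1b-q8_0.gguf": 1.26e9,
    "parakeet-ctc-1.1b-q5_0.gguf": 910e6,
    "parakeet-ctc-1.1b-q4_k.gguf": 795e6,
}

# Assumption: the rounded "~1.1 B" parameter count from the model card.
ASSUMED_PARAMS = 1.1e9


def bits_per_weight(size_bytes, n_params=ASSUMED_PARAMS):
    """Effective storage cost per parameter, in bits."""
    return size_bytes * 8 / n_params


def summarize():
    """Return (filename, bits per weight, size relative to F16) rows."""
    f16_size = FILE_SIZES_BYTES["parakeet-ctc-1.1b.gguf"]
    return [
        (name, round(bits_per_weight(size), 2), round(size / f16_size, 2))
        for name, size in FILE_SIZES_BYTES.items()
    ]


for name, bpw, ratio in summarize():
    print(f"{name:30s} {bpw:6.2f} bits/weight  {ratio:.2f}x of F16")
```

The effective figures come out higher than the nominal bit widths of the quantisation types, which is expected if some tensors are stored at higher precision than the bulk of the weights.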
Quick start
# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr-cli
# 2. Run: the CLI auto-downloads Q4_K from this repo by friendly name
./build/bin/crispasr -m parakeet-ctc-1.1b -f your-audio.wav
# Or pre-download a specific quant via huggingface_hub and point to it:
python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('cstr/parakeet-ctc-1.1b-GGUF', 'parakeet-ctc-1.1b-q8_0.gguf'))"
./build/bin/crispasr -m parakeet-ctc-1.1b-q8_0.gguf -f your-audio.wav
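The same download-then-run flow can be scripted from Python. The following is a minimal sketch, not an official wrapper: it uses only the `-m` and `-f` flags shown above and `hf_hub_download` from `huggingface_hub`. The function names are hypothetical, and it assumes the transcript is printed on standard output.

```python
import subprocess

REPO_ID = "cstr/parakeet-ctc-1.1b-GGUF"


def fetch_quant(filename, repo_id=REPO_ID):
    """Download one GGUF file from the Hub and return its local cache path."""
    # Imported lazily so the rest of the module works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id, filename)


def build_command(model_path, audio_path, binary="./build/bin/crispasr"):
    """Assemble the CLI invocation using the -m and -f flags from the quick start."""
    return [binary, "-m", str(model_path), "-f", str(audio_path)]


def transcribe(model_path, audio_path, binary="./build/bin/crispasr"):
    """Run the CLI and return whatever it wrote to stdout (assumed: the transcript)."""
    result = subprocess.run(
        build_command(model_path, audio_path, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Typical use would be `transcribe(fetch_quant("parakeet-ctc-1.1b-q8_0.gguf"), "your-audio.wav")`, run from the CrispASR checkout so the default binary path resolves.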
The crispasr CLI auto-detects the backend from the filename: parakeet-ctc-*.gguf routes to fastconformer-ctc because the architecture is identical to the stt_en_fastconformer_ctc_* family. The registry key parakeet-ctc-1.1b triggers the Q4_K auto-download.
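The routing rule described above can be pictured as a small pattern table. This is an illustrative Python sketch, not the actual CrispASR implementation (which is C++): the table contents come from this README, while the function and variable names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path

# Filename pattern -> backend. Only the rule stated in this README is listed.
BACKEND_PATTERNS = [
    ("parakeet-ctc-*.gguf", "fastconformer-ctc"),
]

# Registry key -> default file fetched for that key (Q4_K per this README).
REGISTRY_DEFAULTS = {
    "parakeet-ctc-1.1b": "parakeet-ctc-1.1b-q4_k.gguf",
}


def resolve_model(model_arg):
    """Map a registry key to its default file; pass explicit paths through."""
    return REGISTRY_DEFAULTS.get(model_arg, model_arg)


def detect_backend(model_arg):
    """Return the backend name for a model argument, or None if nothing matches."""
    filename = Path(resolve_model(model_arg)).name
    for pattern, backend in BACKEND_PATTERNS:
        if fnmatch(filename, pattern):
            return backend
    return None


print(detect_backend("parakeet-ctc-1.1b"))
print(detect_backend("/models/parakeet-ctc-1.1b-q8_0.gguf"))
```

Matching on the basename rather than the full path keeps the rule independent of where the file was downloaded to.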
Model architecture
| Component | Details |
|---|---|
| Encoder | 42-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9, attention biases ON |
| Subsampling | dw_striding stack, 8× temporal (100 → 12.5 fps) |
| CTC head | Conv1d(1024 → 1025), k=1; vocab 1024 SentencePiece + 1 blank |
| Audio | 16 kHz mono, 80 mel bins, n_fft=512, hop=160, win=400 |
| Parameters | ~1.1 B |
Identical to the 0.6b variant in every dimension except encoder depth (42 vs 24 layers). The crispasr fastconformer-ctc backend reads n_layers from the GGUF metadata and resizes per-layer state at load time, so no code changes are needed for the deeper model.
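Two back-of-the-envelope checks follow from the table. The sketch below derives frame rates from the audio settings (16 kHz, hop=160, 8× subsampling) and makes a rough encoder parameter estimate for 42 versus 24 layers. The parameter formula assumes a standard Conformer block layout (two feed-forward modules, four attention projections, and a pointwise/depthwise/pointwise convolution module) and ignores biases, norms, positional terms, subsampling and the CTC head, so treat it as an approximation rather than an exact count. Frame counts also depend on padding conventions not specified here.

```python
import math

SAMPLE_RATE = 16_000  # Hz, from the Audio row
HOP = 160             # samples between mel frames
SUBSAMPLING = 8       # temporal reduction in the encoder front-end


def mel_fps():
    """Mel frames per second before subsampling."""
    return SAMPLE_RATE / HOP


def encoder_fps():
    """Encoder output frames per second after subsampling."""
    return mel_fps() / SUBSAMPLING


def approx_encoder_frames(seconds):
    """Approximate encoder frame count for a clip (padding ignored)."""
    mel_frames = int(seconds * SAMPLE_RATE) // HOP
    return math.ceil(mel_frames / SUBSAMPLING)


def approx_encoder_params(n_layers, d_model=1024, ffn=4096, conv_kernel=9):
    """Rough weight count under the assumed Conformer block layout."""
    ffn_params = 2 * (2 * d_model * ffn)   # two feed-forward modules, up + down
    attn_params = 4 * d_model * d_model    # query, key, value, output projections
    conv_params = (
        d_model * 2 * d_model              # pointwise conv expanding to 2*d
        + d_model * conv_kernel            # depthwise conv
        + d_model * d_model                # pointwise conv back to d
    )
    return n_layers * (ffn_params + attn_params + conv_params)


print(mel_fps(), encoder_fps(), approx_encoder_frames(11))
print(f"42 layers: ~{approx_encoder_params(42) / 1e9:.2f} B weights")
print(f"24 layers: ~{approx_encoder_params(24) / 1e9:.2f} B weights")
```

Under these assumptions the estimate scales linearly with depth, which is consistent with the 1.1b and 0.6b variants differing only in layer count.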
Output convention
Lowercase, un-punctuated English. For cased, punctuated output, see cstr/parakeet-tdt-0.6b-v3-GGUF (multilingual TDT, slightly higher WER but cased and punctuated out of the box), or post-process with crispasr's --punc-model option.
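For readers unfamiliar with CTC output, the sketch below shows how per-frame class IDs are commonly turned into text with greedy decoding: collapse repeats, drop blanks, then join SentencePiece pieces. It illustrates the general technique and is not taken from the CrispASR source. The blank index (1024, the last of the 1025 classes) is an assumption, and the three-entry vocabulary is a toy stand-in for the real 1024-piece one.

```python
BLANK_ID = 1024  # assumption: blank is the last of the 1025 output classes

# Toy vocabulary for illustration only; the real model has 1024 pieces.
TOY_VOCAB = {0: "\u2581he", 1: "l", 2: "o"}


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeats, then drop blank symbols."""
    tokens = []
    previous = None
    for current in frame_ids:
        if current != previous and current != blank_id:
            tokens.append(current)
        previous = current
    return tokens


def pieces_to_text(token_ids, vocab=TOY_VOCAB):
    """Join SentencePiece pieces; U+2581 marks the start of a word."""
    joined = "".join(vocab[t] for t in token_ids)
    return joined.replace("\u2581", " ").strip()


# A blank between the two "l" frames keeps them as separate tokens.
frames = [0, 0, BLANK_ID, 1, 1, BLANK_ID, 1, 2]
print(pieces_to_text(ctc_greedy_collapse(frames)))
```

Because the vocabulary pieces themselves are lowercase and carry no punctuation, any decoder over this head yields text in the convention described above.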
Attribution
- Original model: nvidia/parakeet-ctc-1.1b (CC-BY-4.0). NVIDIA NeMo team.
- GGUF conversion + ggml runtime: CrispStrobe/CrispASR. FastConformer-CTC backend, see src/canary_ctc.cpp.
Related
- C++ runtime: CrispStrobe/CrispASR
- Smaller CTC variant: cstr/parakeet-ctc-0.6b-GGUF
- TDT counterpart (multilingual, cased+punct): cstr/parakeet-tdt-0.6b-v3-GGUF
License
CC-BY-4.0, inherited from the base model. Use of these GGUF files must comply with the CC-BY-4.0 license including attribution.