Mini-Omni2 GGUF

GGUF conversion of gpt-omni/mini-omni2 for use with CrispASR.

Architecture: Whisper-small encoder (80 mel bins, 12 layers, 768-dim) + whisperMLP adapter (SwiGLU, 768→4864→896) + Qwen2-0.5B LLM (896-dim, 24 layers, GQA with 14 query heads and 2 KV heads).
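
To give a rough sense of how small the adapter is next to the encoder and LLM, its weight count can be estimated from the dimensions above. This is a back-of-the-envelope sketch that assumes a standard SwiGLU layout with three bias-free projections (gate, up, down); the actual tensor layout in the GGUF may differ.

```shell
# Dimensions from the architecture line: 768 -> 4864 -> 896.
d_in=768
d_hidden=4864
d_out=896

# Assumption: gate and up projections (d_in x d_hidden each) plus a
# down projection (d_hidden x d_out), with no bias terms.
adapter_params=$(( 2 * d_in * d_hidden + d_hidden * d_out ))

echo "adapter weights: $adapter_params"           # adapter weights: 11829248
echo "bytes at F16:    $(( adapter_params * 2 ))" # bytes at F16:    23658496
```

Under that assumption the adapter is about 11.8M weights, or roughly 24 MB at F16.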

Supports ASR (audio→text), TTS (text→audio), and speech-to-speech (audio→audio). TTS and speech-to-speech require the companion SNAC 24 kHz codec model (cstr/snac-24khz-GGUF).

Files

File                   Quant   Size      Notes
mini-omni2-f16.gguf    F16     ~1.5 GB   Unquantized (half precision)
mini-omni2-q8_0.gguf   Q8_0    ~1.2 GB   Encoder/adapter at F16, LLM at Q8_0
mini-omni2-q4_k.gguf   Q4_K    ~1.0 GB   Encoder/adapter at F16, LLM at Q4_K
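
One possible way to fetch a single quant is the `huggingface-cli` tool from the `huggingface_hub` Python package. The sketch below assumes that tool is installed and that this repository's ID is cstr/mini-omni2-GGUF; the helper names `model_file`, `fetch_model`, and `fetch_codec` are hypothetical and only for illustration.

```shell
# Map a quant label to the file name listed in the table above.
model_file() {
    case "$1" in
        f16|q8_0|q4_k) printf 'mini-omni2-%s.gguf\n' "$1" ;;
        *) echo "unknown quant: $1" >&2; return 1 ;;
    esac
}

# Download one quant into the current directory.
# Assumption: the repository ID is cstr/mini-omni2-GGUF.
fetch_model() {
    file=$(model_file "$1") || return 1
    huggingface-cli download cstr/mini-omni2-GGUF "$file" --local-dir .
}

# Download the SNAC codec companion needed for TTS and speech-to-speech.
# The file names inside that repository are not stated here, so this
# fetches the whole repository rather than guessing a name.
fetch_codec() {
    huggingface-cli download cstr/snac-24khz-GGUF --local-dir .
}
```

For example, `fetch_model q4_k` would request mini-omni2-q4_k.gguf.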

Usage

# ASR
crispasr -m mini-omni2-q4_k.gguf -f audio.wav --backend mini-omni2

# TTS (needs SNAC codec)
crispasr -m mini-omni2-q4_k.gguf --tts "Hello world" \
    --codec-model snac-24khz.gguf --tts-output out.wav --backend mini-omni2
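
To transcribe several recordings in one go, the ASR command above can be wrapped in a loop. This is a minimal sketch that uses only the flags shown above; `transcribe_dir` is a hypothetical helper name, and where CrispASR writes its transcripts is not covered here.

```shell
# Run the ASR command on every .wav file in a directory.
# $1: path to the GGUF model, $2: directory containing .wav files.
transcribe_dir() {
    model=$1
    dir=$2
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -e "$wav" ] || continue
        crispasr -m "$model" -f "$wav" --backend mini-omni2
    done
}
```

A call such as `transcribe_dir mini-omni2-q4_k.gguf ./recordings` then runs one `crispasr` invocation per file.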