Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

πŸ€— Hugging Face | πŸ“– Github | πŸ“‘ Technical report

This is a safetensors conversion of gpt-omni/mini-omni.

Mini-Omni is an open-source multimodel large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Features

βœ… Real-time speech-to-speech conversational capabilities. No extra ASR or TTS models required.

βœ… Talking while thinking, with the ability to generate text and audio at the same time.

βœ… Streaming audio outupt capabilities.

βœ… With "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost the performance.

NOTE: please refer to https://github.com/gpt-omni/mini-omni for more details.

Downloads last month
9
Safetensors
Model size
694M params
Tensor type
F32
Β·
Inference API
Unable to determine this model's library. Check the docs .

Model tree for leafspark/mini-omni-safetensors

Base model

Qwen/Qwen2-0.5B
Finetuned
(51)
this model

Space using leafspark/mini-omni-safetensors 1