Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

πŸ€— Hugging Face | πŸ“– Github | πŸ“‘ Technical report

This is a safetensors conversion of gpt-omni/mini-omni.

Mini-Omni is an open-source multimodal large language model that can hear and talk while it thinks, featuring real-time end-to-end speech input and streaming audio output for conversation.

Features

βœ… Real-time speech-to-speech conversational capabilities. No extra ASR or TTS models required.

βœ… Talking while thinking, with the ability to generate text and audio at the same time.

βœ… Streaming audio output capabilities.

βœ… "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.

NOTE: please refer to https://github.com/gpt-omni/mini-omni for more details.

Safetensors

Model size: 694M params
Tensor type: F32

Model tree for leafspark/mini-omni-safetensors

Base model: Qwen/Qwen2-0.5B