
Qwen2-Audio

Qwen2-Audio is a state-of-the-art (SOTA) small-scale multimodal model that handles audio and text inputs, allowing you to have voice interactions without a separate ASR module. Qwen2-Audio supports English, Chinese, and major European languages, and also provides robust audio analysis for local use cases like:

  • Speaker identification and response
  • Speech translation and transcription
  • Mixed audio and noise detection
  • Music and sound analysis

We're bringing Qwen2-Audio to edge devices with Nexa SDK, offering various quantization options. Two interaction modes are supported:

  • Voice Chat: Users can freely engage in voice interactions with Qwen2-Audio without text input.
  • Audio Analysis: Users can provide both audio and text instructions for analysis during the interaction.

Demo

How to Run Locally On-Device

In the following, we demonstrate how to run Qwen2-Audio locally on your device.

Step 1: Install Nexa-SDK (local on-device inference framework)

Install Nexa-SDK

Nexa-SDK is an open-source, local on-device inference framework that supports text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS). It can be installed via the Python package or the executable installer.

Step 2: Run the following command in your terminal to launch the local Streamlit UI

nexa run qwen2audio -st

Or, to use it directly in the terminal:

nexa run qwen2audio

Usage Instructions

For terminal:

  1. Drag and drop your audio file into the terminal (or enter the file path on Linux)
  2. Add a text prompt to guide the analysis, or leave it empty for direct voice input

System Requirements

💻 RAM Requirements:

  • The default q4_K_M version requires 4.2 GB of RAM
  • Check the RAM requirements table for different quantization versions
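
As a rough pre-flight check, the sketch below compares a machine's total physical memory against the 4.2 GB figure quoted above for the default q4_K_M version. It is not part of Nexa-SDK. The helper names are hypothetical, the figure is assumed to mean decimal gigabytes, and the `os.sysconf` lookup is assumed to be available (it is on Linux, but not on Windows). It also measures installed memory, not memory that is currently free.

```python
import os

# Figure quoted above for the default q4_K_M version (assumed decimal GB).
REQUIRED_GB = 4.2


def has_enough_ram(total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if total_bytes is at least required_gb decimal gigabytes."""
    return total_bytes >= required_gb * 1e9


def total_physical_ram_bytes() -> int:
    """Total installed memory via POSIX sysconf (assumption: Unix-like OS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    try:
        total = total_physical_ram_bytes()
    except (AttributeError, ValueError, OSError):
        print("Could not read physical memory size on this platform.")
    else:
        verdict = "meets" if has_enough_ram(total) else "is below"
        print(f"Installed RAM: {total / 1e9:.1f} GB ({verdict} {REQUIRED_GB} GB)")
```

Other quantization versions have different requirements, so pass the relevant value as `required_gb` instead of relying on the default.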

🎵 Audio Format:

  • Optimal: 16 kHz .wav format
  • Other formats and sample rates are supported with automatic conversion
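
To see whether a file already matches the optimal format before passing it to the model, a minimal sketch using Python's standard `wave` module is shown here. The function names are hypothetical and not part of Nexa-SDK. The `wave` module reads only uncompressed PCM WAV files, so anything else is simply reported as not optimal, which is harmless because other formats are converted automatically.

```python
import wave

TARGET_RATE_HZ = 16_000  # the "optimal" sample rate named above


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_optimal_wav(path: str) -> bool:
    """True if path is a PCM WAV file sampled at 16 kHz, otherwise False."""
    try:
        return wav_sample_rate(path) == TARGET_RATE_HZ
    except (wave.Error, EOFError):
        # Not a WAV file that the stdlib reader understands.
        return False
```

For example, `is_optimal_wav("question.wav")` (a hypothetical file name) returns `False` for a 44.1 kHz recording, which signals that the automatic conversion path will be used.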

Use Cases

Voice Chat

  • Answer daily questions
  • Offer suggestions
  • Speaker identification and response
  • Speech translation
  • Detecting background noise and responding accordingly

Audio Analysis

  • Information extraction
  • Audio summary
  • Speech transcription and expansion
  • Mixed audio and noise detection
  • Music and sound analysis

Performance Benchmark

The results show that Qwen2-Audio significantly outperforms both previous SOTA models and Qwen-Audio across all tasks.

To learn more about Qwen2-Audio's capabilities, please refer to the official [Blog], [GitHub], and [Report].

Follow Nexa AI to run more models on-device

Website