import json
import os
import random
import subprocess
import sys
import wave
import zipfile

import gradio as gr
from gtts import gTTS
from openai import OpenAI
from pydub import AudioSegment

# Make sure the Vosk module is installed; install it with the running
# interpreter's pip if the import fails.
try:
    from vosk import Model, KaldiRecognizer
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "vosk"])
    from vosk import Model, KaldiRecognizer

# Set up the OpenAI client. The key is read from the OPENAI_API_KEY
# environment variable; a secret key must never be hard-coded in source.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


# Unzip the background music archive (first run only) and pick a random track.
def unzip_random_background_music(zip_path="background_music.mp3.zip", extract_folder="audio_files"):
    if not os.path.exists(extract_folder):
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(extract_folder)

    music_files = [f for f in os.listdir(extract_folder) if f.endswith(".mp3")]
    if not music_files:
        raise FileNotFoundError("❗ No background music file found after extraction.")

    return os.path.join(extract_folder, random.choice(music_files))


# Generate the advice text with OpenAI.
# Note: `language` is accepted for symmetry with the TTS step but is not used here.
def generate_response(input_text, language="zh"):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # System prompt kept in Chinese so replies match the "zh" TTS voice.
            # English gloss: "You are an assistant who gives advice based on
            # Nonviolent Communication theory; respond with empathy and warmth."
            {"role": "system", "content": "你是一個根據非暴力溝通理論提供建議的助理,請以同理心和溫暖的語氣給予建議。"},
            {"role": "user", "content": input_text},
        ],
    )
    return response.choices[0].message.content


# Generate a spoken response and mix in background music.
def generate_audio_with_music(input_text, language="zh"):
    try:
        # Pick the background music track.
        background_music_path = unzip_random_background_music()

        # Generate the text response.
        text_response = generate_response(input_text, language)

        # Generate the TTS audio file.
        tts = gTTS(text=text_response, lang=language)
        tts.save("response.mp3")
        response_audio = AudioSegment.from_file("response.mp3")

        # Mix speech and music. The speech is the base track, so the result is
        # exactly as long as the speech; the quieter music loops underneath.
        # (Overlaying the speech onto the music would cut the speech off
        # whenever the music track is shorter.)
        background_music = AudioSegment.from_file(background_music_path)
        background_music = background_music - 20  # lower the music volume by 20 dB
        combined_audio = response_audio.overlay(background_music, position=0, loop=True)

        # Export the final audio file.
        final_audio_path = "final_audio.mp3"
        combined_audio.export(final_audio_path, format="mp3")

        return text_response, final_audio_path
    except Exception as e:
        return f"❗ An error occurred: {e}", None


# Speech-to-text (using Vosk).
def transcribe_speech(audio_file):
    try:
        model_path = "models/vosk-model-small-zh"  # Vosk Chinese model
        if not os.path.exists(model_path):
            return "❗ Vosk model not found. Make sure it is downloaded and placed in the correct directory."

        # Convert the input to mono, 16 kHz, 16-bit PCM WAV.
        converted_audio = "converted_audio.wav"
        audio = AudioSegment.from_file(audio_file)
        audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
        audio.export(converted_audio, format="wav")

        model = Model(model_path)
        segments = []
        with wave.open(converted_audio, "rb") as wf:
            recognizer = KaldiRecognizer(model, wf.getframerate())

            # Run recognition chunk by chunk.
            while True:
                data = wf.readframes(4000)
                if len(data) == 0:
                    break
                # AcceptWaveform returns True when an utterance is complete;
                # collect it now, otherwise earlier utterances are lost.
                if recognizer.AcceptWaveform(data):
                    segments.append(json.loads(recognizer.Result()).get("text", ""))

            # Flush whatever is still buffered in the recognizer.
            segments.append(json.loads(recognizer.FinalResult()).get("text", ""))

        recognized_text = " ".join(s for s in segments if s).strip()
        if not recognized_text:
            return "❗ Could not recognize any speech. Please record again or check the audio quality."

        return recognized_text
    except Exception as e:
        return f"❗ Speech recognition error: {e}"


# Gradio interface.
def main():
    with gr.Blocks() as demo:
        gr.Markdown("### 🎙️ Real-Time Speech and Text Analysis Platform 🌟")
        gr.Markdown("Record your voice or type some text, and I will offer warm advice along with a spoken response.")

        with gr.Row():
            input_text = gr.Textbox(label="📝 Enter text (any language)")
            audio_input = gr.Audio(label="🎤 Record your voice (WAV format)", type="filepath")

        submit_button = gr.Button("💬 Generate advice and audio")

        with gr.Row():
            output_text = gr.Textbox(label="📄 Generated advice", interactive=False)
            output_audio = gr.Audio(label="🎶 Play spoken response", type="filepath")

        def process_input(text, audio):
            if audio:
                # If there is audio input, transcribe it first.
                text = transcribe_speech(audio)
                if text.startswith("❗"):
                    # Transcription failed: return the error message directly.
                    return text, None

            return generate_audio_with_music(text, language="zh")

        submit_button.click(fn=process_input, inputs=[input_text, audio_input], outputs=[output_text, output_audio])

    demo.launch()


if __name__ == "__main__":
    main()