---
title: Raynos AI Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
license: apache-2.0
models:
  - openai/whisper-base
  - google/gemma-2b-it
---
# Raynos AI - Real-time Audio Transcription & JSON Extraction

## 🎯 Overview
Raynos AI is an advanced audio transcription application that combines OpenAI's Whisper model with Google's Gemma model to provide:
- Real-time Audio Transcription: Convert speech to text using state-of-the-art Whisper models
- Structured JSON Extraction: Automatically extract key information (names, locations, dates, events) from transcriptions
- Multiple Input Methods: Support for microphone recording, file upload, and streaming
- Flexible Transcription Engines: Choose between local Whisper or cloud-based Deepgram
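At its core this is a two-stage pipeline: Whisper produces the transcript, then a language model turns it into structured JSON. The sketch below is a minimal illustration of that flow, not the app's exact code; the prompt wording and generation settings are assumptions.

```python
import whisper
from transformers import pipeline

# Minimal two-stage sketch: transcribe with Whisper, then ask Gemma for JSON.
# Model choices mirror the metadata above; the prompt itself is an assumption.
asr = whisper.load_model("base")
extractor = pipeline("text-generation", model="google/gemma-2b-it")

def transcribe_and_extract(audio_path: str) -> str:
    text = asr.transcribe(audio_path)["text"]
    prompt = (
        "Extract person names, locations, dates, and events from the text "
        "below and return them as JSON.\n\n" + text
    )
    return extractor(prompt, max_new_tokens=256)[0]["generated_text"]
```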
## 🚀 Features

### Audio Processing

- 🎤 Live Microphone Recording: Real-time audio capture and transcription (see the input sketch after this list)
- 📁 File Upload: Process pre-recorded audio files (MP3, WAV, AAC, etc.)
- 🔄 Streaming Mode: Continuous transcription for long recordings
- 📱 Mobile Support: Optimized for mobile device audio input
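A single Gradio `Audio` component can cover both the microphone and file-upload paths. A minimal sketch (the handler here is a placeholder, not the app's real callback):

```python
import gradio as gr

def handle(audio_path):
    # Placeholder handler: the real app would transcribe the file here.
    return f"Received audio at: {audio_path}"

demo = gr.Interface(
    fn=handle,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
)

if __name__ == "__main__":
    demo.launch()
```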
### Transcription Options

- Whisper Models: Choose from tiny, base, small, medium, or large models (see the sketch after this list)
- Deepgram Integration: Optional cloud-based transcription (requires API key)
- Language Support: Auto-detect or specify language
- Buffer Control: Adjustable buffer duration for optimal performance
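For the local engine, these options map directly onto Whisper's own API. A sketch, assuming the standard `openai-whisper` package:

```python
import whisper

# Size names are Whisper's own: tiny, base, small, medium, large.
model = whisper.load_model("small")

# language=None lets Whisper auto-detect; pass e.g. "en" to force a language.
result = model.transcribe("meeting.wav", language=None)
print(result["text"])
```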
### JSON Extraction

- Smart Information Extraction: Automatically identifies and structures:
  - Person names
  - Locations (cities, countries, addresses)
  - Dates and times
  - Events and activities
  - Key topics and themes
- Temporal Context: Links extracted information to timestamps (see the example output below)
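To make the output concrete, here is an illustrative extraction result written as a Python dict; the exact field names and schema are an assumption, not a guaranteed contract:

```python
# Hypothetical output shape for one transcription segment.
example_extraction = {
    "persons": ["Alice Johnson"],
    "locations": ["Berlin, Germany"],
    "dates": ["2024-03-15"],
    "events": ["quarterly planning meeting"],
    "topics": ["budget review"],
    "timestamp": "00:01:23",  # temporal context: where in the audio this was said
}
```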
## 🛠️ Configuration

### Environment Variables (Optional)

- `DEEPGRAM_API_KEY`: Enable Deepgram cloud transcription
- `CUDA_VISIBLE_DEVICES`: Control GPU usage
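A sketch of how the app might read these at startup (the fallback logic is an assumption):

```python
import os

# Use Deepgram only when an API key is present; otherwise fall back to
# local Whisper. CUDA_VISIBLE_DEVICES is read by CUDA/PyTorch directly,
# so setting it to "" before launch forces CPU-only execution.
use_deepgram = os.getenv("DEEPGRAM_API_KEY") is not None
```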
### Model Selection

The app automatically selects appropriate models based on available hardware (sketched below):

- GPU Available: Uses larger, more accurate models
- CPU Only: Falls back to smaller, faster models
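A minimal sketch of such a fallback, assuming PyTorch for the hardware check; the specific size choices are illustrative:

```python
import torch
import whisper

# Prefer a larger, more accurate model when a GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_size = "small" if device == "cuda" else "tiny"
model = whisper.load_model(model_size, device=device)
```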
## 📊 Technical Details

### Models Used

- Transcription: OpenAI Whisper (various sizes)
- Extraction: Google Gemma-2B-IT (optional, for JSON extraction)

### Audio Processing

- Sample Rate: 16 kHz
- Format: Mono channel
- Chunk Size: 1024 samples
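These specs imply a small preprocessing step before transcription. A sketch, assuming NumPy audio arrays; the naive linear-interpolation resampler keeps the example self-contained, whereas a real app would likely use `librosa` or `torchaudio`:

```python
import numpy as np

TARGET_SR = 16_000  # 16 kHz, as listed above
CHUNK_SIZE = 1024   # samples per streaming chunk, as listed above

def preprocess(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz."""
    if audio.ndim == 2:  # (samples, channels) -> mono
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:  # naive linear-interpolation resample
        n_target = int(len(audio) * TARGET_SR / sr)
        audio = np.interp(
            np.linspace(0.0, len(audio) - 1, n_target),
            np.arange(len(audio)),
            audio,
        )
    return audio.astype(np.float32)
```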
## 🎮 Usage

1. Select Input Method:
   - Desktop: Use microphone or upload file
   - Mobile: Use mobile audio streaming
2. Configure Settings:
   - Choose transcription engine (Whisper/Deepgram)
   - Select model size (accuracy vs. speed trade-off)
   - Set language (auto-detect or specific)
3. Start Transcription:
   - Click "Start Streaming" for live audio
   - Or "Process File" for uploaded audio
4. View Results:
   - Real-time transcription display
   - Structured JSON output with extracted information
## 📝 Notes
- First run may take time to download models
- GPU recommended for best performance
- Larger models provide better accuracy but require more resources
## 🤝 Contributing
This is an open-source project. Contributions are welcome!
## 📄 License
Apache License 2.0
Built with ❤️ using Gradio and Hugging Face