---
title: Raynos AI Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
# Last updated: Security fixes and CPU optimization
license: apache-2.0
models:
- openai/whisper-base
- google/gemma-2b-it
---
# Raynos AI - Real-time Audio Transcription & JSON Extraction
<div align="center">
<img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python">
<img src="https://img.shields.io/badge/Gradio-5.39.0-orange.svg" alt="Gradio">
<img src="https://img.shields.io/badge/License-Apache%202.0-green.svg" alt="License">
</div>
## 🎯 Overview
Raynos AI is an audio transcription application that pairs OpenAI's Whisper model (speech to text) with Google's Gemma model (information extraction) to provide:
- **Real-time Audio Transcription**: Convert speech to text using state-of-the-art Whisper models
- **Structured JSON Extraction**: Automatically extract key information (names, locations, dates, events) from transcriptions
- **Multiple Input Methods**: Support for microphone recording, file upload, and streaming
- **Flexible Transcription Engines**: Choose between local Whisper or cloud-based Deepgram
## Features
### Audio Processing
- 🎤 **Live Microphone Recording**: Real-time audio capture and transcription
- **File Upload**: Process pre-recorded audio files (MP3, WAV, AAC, etc.)
- **Streaming Mode**: Continuous transcription for long recordings
- 📱 **Mobile Support**: Optimized for mobile device audio input
### Transcription Options
- **Whisper Models**: Choose from tiny, base, small, medium, or large models
- **Deepgram Integration**: Optional cloud-based transcription (requires API key)
- **Language Support**: Auto-detect or specify language
- **Buffer Control**: Adjustable buffer duration for optimal performance
### JSON Extraction
- **Smart Information Extraction**: Automatically identifies and structures:
  - Person names
  - Locations (cities, countries, addresses)
  - Dates and times
  - Events and activities
  - Key topics and themes
- **Temporal Context**: Links extracted information to timestamps
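
The README does not document the output schema, so the following is only a sketch of what one structured result might look like. The class `Extraction`, the helper `to_json`, and every field name are hypothetical, chosen to mirror the categories listed above; the app's real JSON may differ.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Extraction:
    """Hypothetical container for the items extracted from one segment."""
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    text: str     # transcribed text for this segment
    names: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)


def to_json(extractions: list[Extraction]) -> str:
    """Serialize a list of extractions to an indented JSON string."""
    return json.dumps([asdict(e) for e in extractions], indent=2)


if __name__ == "__main__":
    segment = Extraction(
        start=0.0,
        end=4.2,
        text="Alice flies to Paris on Friday.",
        names=["Alice"],
        locations=["Paris"],
        dates=["Friday"],
    )
    print(to_json([segment]))
```

Keeping `start` and `end` on every record is one simple way to link extracted items back to timestamps.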
## 🛠️ Configuration
### Environment Variables (Optional)
- `DEEPGRAM_API_KEY`: Enable Deepgram cloud transcription
- `CUDA_VISIBLE_DEVICES`: Control GPU usage
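
As an illustration of how an optional key can switch engines, here is a minimal sketch. The function `pick_engine` and the engine labels `"deepgram"` and `"whisper"` are assumptions made for this example, not names taken from `app.py`.

```python
import os


def pick_engine(env=None) -> str:
    """Return "deepgram" when an API key is configured, otherwise "whisper"."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    return "deepgram" if key else "whisper"


if __name__ == "__main__":
    print(pick_engine())
```

`CUDA_VISIBLE_DEVICES` is read by the CUDA runtime itself: setting it to an empty string hides all GPUs from the process, and setting it to `0` exposes only the first one. It has to be set before any CUDA-based library initializes.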
### Model Selection
The app automatically selects appropriate models based on available hardware:
- **GPU Available**: Uses larger, more accurate models
- **CPU Only**: Falls back to smaller, faster models
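
The README does not say which model sizes map to which hardware, so the sketch below uses an assumed mapping (`openai/whisper-medium` on GPU, `openai/whisper-base` on CPU). The helpers `gpu_available` and `pick_whisper_model` are hypothetical names.

```python
def gpu_available() -> bool:
    """Best-effort GPU check; returns False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_whisper_model(has_gpu: bool) -> str:
    """Pick a Whisper checkpoint for the given hardware."""
    # Assumed mapping: the README does not state the exact sizes used.
    return "openai/whisper-medium" if has_gpu else "openai/whisper-base"


if __name__ == "__main__":
    print(pick_whisper_model(gpu_available()))
```

Passing the hardware flag in as an argument keeps the selection logic easy to test on machines without a GPU.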
## Technical Details
### Models Used
- **Transcription**: OpenAI Whisper (various sizes)
- **Extraction**: Google Gemma-2B-IT (optional, for JSON extraction)
### Audio Processing
- Sample Rate: 16 kHz
- Channels: Mono
- Chunk Size: 1024 samples
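
To make these numbers concrete, the sketch below downmixes multi-channel frames to mono and splits the result into 1024-sample chunks. The helper names and the zero-padding of the final chunk are assumptions for illustration; the README does not describe how the app handles a short last chunk.

```python
SAMPLE_RATE = 16_000  # Hz, from the list above
CHUNK_SIZE = 1024     # samples per chunk, from the list above


def to_mono(frames):
    """Average the channels of each frame: [(l, r), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def chunks(samples, size=CHUNK_SIZE):
    """Yield fixed-size chunks; the last one is zero-padded (an assumption)."""
    for i in range(0, len(samples), size):
        chunk = list(samples[i:i + size])
        yield chunk + [0.0] * (size - len(chunk))


def chunk_duration_ms(size=CHUNK_SIZE, rate=SAMPLE_RATE):
    """Duration of one chunk in milliseconds."""
    return 1000 * size / rate


if __name__ == "__main__":
    print(chunk_duration_ms())  # 64.0
```

At 16 kHz, a 1024-sample chunk covers 64 ms of audio, so roughly 15.6 chunks arrive per second of speech.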
## 🎮 Usage
1. **Select Input Method**:
   - Desktop: Use microphone or upload file
   - Mobile: Use mobile audio streaming
2. **Configure Settings**:
   - Choose transcription engine (Whisper/Deepgram)
   - Select model size (accuracy vs speed trade-off)
   - Set language (auto-detect or specific)
3. **Start Transcription**:
   - Click "Start Streaming" for live audio
   - Or "Process File" for uploaded audio
4. **View Results**:
   - Real-time transcription display
   - Structured JSON output with extracted information
## Notes
- The first run may take a while because the models must be downloaded
- A GPU is recommended for best performance
- Larger models are more accurate but need more memory and compute
## 🤝 Contributing
This is an open-source project. Contributions are welcome!
## License
Apache License 2.0
---
**Built with ❤️ using Gradio and Hugging Face**