---
title: Raynos AI Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: false
# Last updated: Security fixes and CPU optimization
license: apache-2.0
models:
  - openai/whisper-base
  - google/gemma-2b-it
---

# Raynos AI - Real-time Audio Transcription & JSON Extraction

<div align="center">
  <img src="https://img.shields.io/badge/Python-3.10+-blue.svg" alt="Python">
  <img src="https://img.shields.io/badge/Gradio-5.39.0-orange.svg" alt="Gradio">
  <img src="https://img.shields.io/badge/License-Apache%202.0-green.svg" alt="License">
</div>

## 🎯 Overview

Raynos AI is an advanced audio transcription application that combines OpenAI's Whisper model with Google's Gemma model to provide:

- **Real-time Audio Transcription**: Convert speech to text using state-of-the-art Whisper models
- **Structured JSON Extraction**: Automatically extract key information (names, locations, dates, events) from transcriptions
- **Multiple Input Methods**: Support for microphone recording, file upload, and streaming
- **Flexible Transcription Engines**: Choose between local Whisper or cloud-based Deepgram

## 🚀 Features

### Audio Processing
- 🎤 **Live Microphone Recording**: Real-time audio capture and transcription
- 📁 **File Upload**: Process pre-recorded audio files (MP3, WAV, AAC, etc.)
- 🔄 **Streaming Mode**: Continuous transcription for long recordings
- 📱 **Mobile Support**: Optimized for mobile device audio input

### Transcription Options
- **Whisper Models**: Choose from tiny, base, small, medium, or large models
- **Deepgram Integration**: Optional cloud-based transcription (requires API key)
- **Language Support**: Auto-detect or specify language
- **Buffer Control**: Adjustable buffer duration for optimal performance

### JSON Extraction
- **Smart Information Extraction**: Automatically identifies and structures:
  - Person names
  - Locations (cities, countries, addresses)
  - Dates and times
  - Events and activities
  - Key topics and themes
- **Temporal Context**: Links extracted information to timestamps
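
As a rough illustration of the extraction step, the sketch below pulls a JSON object out of raw model text and normalizes it. The field names (`names`, `locations`, `dates`, `events`, `topics`) and the helper `parse_extraction` are hypothetical, chosen to mirror the categories listed above; the app's actual schema and parsing code may differ.

```python
import json

# Hypothetical field names; the app's real output schema is not documented here.
FIELDS = ("names", "locations", "dates", "events", "topics")


def parse_extraction(model_output):
    """Parse the text between the outermost braces as JSON and normalize it."""
    start = model_output.find("{")
    end = model_output.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(model_output[start:end + 1])
        except json.JSONDecodeError:
            data = {}  # malformed JSON: fall back to an empty result
    if not isinstance(data, dict):
        data = {}
    # Keep only the expected fields and default missing ones to empty lists.
    return {field: data.get(field, []) for field in FIELDS}


raw = 'Sure! {"names": ["Ada"], "locations": ["London"]} Hope this helps.'
print(parse_extraction(raw))
```

Defaulting missing fields to empty lists keeps the output shape stable even when the model omits a category or wraps the JSON in extra text.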

## 🛠️ Configuration

### Environment Variables (Optional)
- `DEEPGRAM_API_KEY`: Enable Deepgram cloud transcription
- `CUDA_VISIBLE_DEVICES`: Control GPU usage
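
One way an app could act on the optional `DEEPGRAM_API_KEY` variable is sketched below. The function name `select_engine` and the engine labels are assumptions made for illustration, not the app's actual code.

```python
import os


def select_engine(env=None):
    """Return "deepgram" when an API key is configured, otherwise "whisper"."""
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value as "not set".
    api_key = env.get("DEEPGRAM_API_KEY", "").strip()
    return "deepgram" if api_key else "whisper"


if __name__ == "__main__":
    print(select_engine())
```

Passing a plain dictionary as `env` makes the choice easy to check without touching the real environment.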

### Model Selection
The app automatically selects appropriate models based on available hardware:
- **GPU Available**: Uses larger, more accurate models
- **CPU Only**: Falls back to smaller, faster models
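
A minimal sketch of hardware-based selection follows, assuming PyTorch is used for the GPU check. The size mapping (`medium` on GPU, `base` on CPU) is hypothetical; the sizes the app actually picks are not documented here.

```python
def has_gpu():
    """Best-effort CUDA check; returns False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_whisper_size(gpu_available):
    """Map hardware availability to a Whisper model size."""
    # Hypothetical mapping, used here only to illustrate the fallback.
    return "medium" if gpu_available else "base"


if __name__ == "__main__":
    print(pick_whisper_size(has_gpu()))
```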

## 📊 Technical Details

### Models Used
- **Transcription**: OpenAI Whisper (various sizes)
- **Extraction**: Google Gemma-2B-IT (optional, for JSON extraction)

### Audio Processing
- Sample Rate: 16 kHz
- Format: Mono channel
- Chunk Size: 1024 samples
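
To make these numbers concrete: at 16 kHz, a 1024-sample chunk covers 64 ms of audio. The sketch below downmixes frames to mono and splits them into chunks using only the standard library. The helpers `to_mono` and `chunk` are illustrative names, not the app's actual functions, and resampling to 16 kHz is left out.

```python
SAMPLE_RATE = 16_000  # Hz, as listed above
CHUNK_SIZE = 1024     # samples per chunk, as listed above


def to_mono(frames):
    """Average the channels of each frame, e.g. [(L, R), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def chunk(samples, size=CHUNK_SIZE):
    """Split samples into consecutive chunks; the last one may be shorter."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


stereo = [(0.5, -0.5), (1.0, 0.0)] * 1500  # 3000 stereo frames
mono = to_mono(stereo)
chunks = chunk(mono)
print(len(chunks), len(chunks[-1]))      # 3 952
print(1000 * CHUNK_SIZE / SAMPLE_RATE)   # 64.0 (chunk duration in ms)
```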

## 🎮 Usage

1. **Select Input Method**:
   - Desktop: Use microphone or upload file
   - Mobile: Use mobile audio streaming

2. **Configure Settings**:
   - Choose transcription engine (Whisper/Deepgram)
   - Select model size (accuracy vs speed trade-off)
   - Set language (auto-detect or specific)

3. **Start Transcription**:
   - Click "Start Streaming" for live audio
   - Or "Process File" for uploaded audio

4. **View Results**:
   - Real-time transcription display
   - Structured JSON output with extracted information

## 📝 Notes

- The first run may take a while because the models must be downloaded
- A GPU is recommended for best performance
- Larger models are more accurate but need more memory and compute

## 🤝 Contributing

This is an open-source project. Contributions are welcome!

## 📄 License

Apache License 2.0

---

**Built with ❤️ using Gradio and Hugging Face**