---
description: Check installed STT apps and suggest installations including local Whisper
tags: [ai, stt, whisper, speech-recognition, audio, project, gitignored]
---

You are helping the user set up speech-to-text applications including local Whisper.

## Process

1. **Check currently installed STT apps**
   - System packages: `dpkg -l | grep -iE "whisper|speech|voice"`
   - Python packages: `pip list | grep -iE "whisper|speech|vosk"` (case-insensitive, so names such as `SpeechRecognition` also match)
   - Check `~/programs/ai-ml/` for installed apps
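
   The checks above only cover packages, so it can also help to see which command-line tools are actually on `PATH`. A minimal sketch, assuming a hypothetical helper name `check_stt_tools` and an illustrative list of command names (adjust both as needed):

   ```bash
   # Hypothetical helper: report which of the given commands are on PATH.
   check_stt_tools() {
     local tool
     for tool in "$@"; do
       if command -v "$tool" >/dev/null 2>&1; then
         echo "installed: $tool"
       else
         echo "missing: $tool"
       fi
     done
   }

   # Illustrative names; this only detects tools that ship a CLI entry point.
   check_stt_tools whisper whisperx ffmpeg
   ```

   Libraries without a CLI (for example `faster-whisper`) will not show up here, so the `pip list` check is still needed.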

2. **Suggest STT installation candidates**

   **Whisper (OpenAI) - Recommended:**
   - Best quality, local inference
   - Multiple model sizes available
   - Multilingual support

   **Other options:**
   - Vosk - Lightweight, offline
   - Coqui STT - Fork of Mozilla DeepSpeech (no longer actively maintained)
   - Speech Note - Simple GUI
   - Subtitle Edit - Video subtitling
   - Subtitld - Automatic subtitles

3. **Install Whisper (local)**

   **Method 1: Using pip (simple)**
   ```bash
   pip install openai-whisper
   ```

   **Method 2: Using conda (recommended)**
   ```bash
   conda create -n whisper python=3.11 -y
   conda activate whisper
   pip install openai-whisper
   ```

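   **Install dependencies:**
   ```bash
   # ffmpeg is required: Whisper uses it to decode audio files
   sudo apt install ffmpeg
   # Only needed if pip has to build tiktoken from source (no prebuilt wheel for your platform)
   pip install setuptools-rust
   ```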

4. **Install faster-whisper (optimized)**
   ```bash
   pip install faster-whisper
   ```
   - Uses CTranslate2 for faster inference
   - Lower VRAM usage

5. **Install WhisperX (advanced)**
   ```bash
   pip install whisperx
   ```
   - Includes alignment and diarization (diarization needs a Hugging Face access token for the pyannote models)
   - Better timestamps

6. **Download Whisper models**
   - Models are downloaded automatically on first use
   - Sizes: tiny, base, small, medium, large
   - Suggest based on VRAM (approximate requirements in parentheses):
     - < 4GB: tiny or base (~1GB each)
     - 4-10GB: small (~2GB) or medium (~5GB)
     - 10GB+: large (~10GB)
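
   The VRAM-based suggestion can be scripted. A minimal sketch, assuming a hypothetical helper name `suggest_model` and a deliberately conservative mapping: it reserves `large` for roughly 10 GB and up, since the openai-whisper README lists about 10 GB for that model. The `nvidia-smi` query flags are real, but the fallback behaviour is an assumption:

   ```bash
   # Hypothetical helper: map total VRAM in MiB to a Whisper model size.
   suggest_model() {
     local vram_mib="$1"
     # Treat empty or non-numeric input as "no usable GPU information".
     case "$vram_mib" in
       ''|*[!0-9]*) echo "base"; return ;;
     esac
     if [ "$vram_mib" -lt 4096 ]; then
       echo "base"
     elif [ "$vram_mib" -lt 8192 ]; then
       echo "small"
     elif [ "$vram_mib" -lt 10240 ]; then
       echo "medium"
     else
       echo "large"
     fi
   }

   # Query the first NVIDIA GPU if nvidia-smi is available.
   if command -v nvidia-smi >/dev/null 2>&1; then
     vram="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -n 1)"
     echo "Suggested model: $(suggest_model "$vram")"
   else
     echo "nvidia-smi not found; suggested model: $(suggest_model "")"
   fi
   ```

   On machines without an NVIDIA GPU this falls back to `base`, which is a reasonable CPU default but slower than GPU inference.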

7. **Test installation**
   ```bash
   whisper audio.mp3 --model base --language en
   ```
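
   To report GPU acceleration status alongside the test run, one option is to ask PyTorch (installed as a Whisper dependency) whether it can see a CUDA device. A minimal sketch, assuming a hypothetical helper name `gpu_status` and that `python3` points at the environment where Whisper was installed:

   ```bash
   # Hypothetical helper: print whether PyTorch can use a CUDA GPU.
   gpu_status() {
     if ! command -v python3 >/dev/null 2>&1; then
       echo "unknown (python3 not found)"
     elif python3 -c "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)" 2>/dev/null; then
       echo "cuda available"
     else
       echo "cpu only (or torch not installed in this environment)"
     fi
   }

   gpu_status
   ```

   Run it inside the activated conda environment if Whisper was installed with Method 2, otherwise the check may inspect a different Python.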

8. **Install GUI options**

   **Whisper Desktop:**
   - Check if available as AppImage or Flatpak

   **Subtitle Editor** (GTK app packaged as `subtitleeditor`; a different project from Subtitle Edit):
   ```bash
   sudo apt install subtitleeditor
   ```

   **Custom GUI:**
   - Suggest installing a Gradio-based Whisper web UI

9. **Create helper script**
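   - Offer to create `~/scripts/transcribe.sh` and make it executable with `chmod +x`:
     ```bash
     #!/bin/bash
     # Transcribe one audio file to plain text; fail with a usage hint if no file is given
     set -euo pipefail
     whisper "${1:?usage: transcribe.sh <audio-file>}" --model medium --language en --output_format txt
     ```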

10. **Suggest workflows**
    - Real-time transcription
    - Batch processing
    - Video subtitling
    - Meeting transcription
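
For the batch processing and video subtitling workflows, a loop over a directory is often enough. A minimal sketch, assuming a hypothetical function name `transcribe_dir`, a hypothetical `WHISPER_BIN` override variable (handy for dry runs), and an illustrative set of file extensions; `--output_format` and `--output_dir` are options of the `whisper` CLI:

```bash
# Hypothetical helper: transcribe every matching media file in a directory.
# Usage: transcribe_dir <input-dir> [output-format]   (e.g. txt or srt)
transcribe_dir() {
  local dir="$1"
  local format="${2:-txt}"
  local file
  for file in "$dir"/*.mp3 "$dir"/*.wav "$dir"/*.mp4; do
    [ -e "$file" ] || continue   # skip patterns that matched nothing
    "${WHISPER_BIN:-whisper}" "$file" --model medium \
      --output_format "$format" --output_dir "$dir/transcripts"
  done
}
```

Calling `transcribe_dir ~/recordings srt` would write subtitle files to `~/recordings/transcripts`, while `WHISPER_BIN=echo transcribe_dir ~/recordings` prints the commands without running them.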

## Output

Provide a summary showing:
- Currently installed STT applications
- Whisper installation status and model sizes
- GPU acceleration status
- Suggested models based on hardware
- Example commands for transcription
- GUI options available
- Helper scripts created