metadata
description: Set up conda environment for speech-to-text fine-tuning
tags:
  - python
  - conda
  - stt
  - whisper
  - speech
  - ai
  - fine-tuning
  - project
  - gitignored

You are helping the user set up a conda environment for speech-to-text (STT) fine-tuning.

Process

  1. Create base environment

    conda create -n stt-finetune python=3.11 -y
    conda activate stt-finetune
    
  2. Install PyTorch with ROCm

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
    
  3. Install Whisper and related libraries

    pip install openai-whisper
    pip install faster-whisper  # Optimized inference
    pip install whisperx        # Advanced features
    
  4. Install Hugging Face libraries

    pip install transformers
    pip install datasets
    pip install accelerate
    pip install evaluate
    pip install peft           # For LoRA fine-tuning
    
  5. Install audio processing libraries

    pip install librosa         # Audio analysis
    pip install soundfile       # Audio I/O
    pip install pydub           # Audio manipulation
    pip install sox             # Python wrapper for SoX; requires the sox binary on PATH
    conda install -c conda-forge ffmpeg -y  # Audio conversion
    
  6. Install speech-specific tools

    pip install jiwer          # Word Error Rate calculation
    pip install speechbrain    # Speech toolkit
    pip install pyannote.audio # Speaker diarization
    
  7. Install data processing tools

    pip install pandas
    pip install numpy
    pip install scipy
    pip install matplotlib
    pip install seaborn        # Visualization
    
  8. Install monitoring and experimentation

    pip install wandb          # Experiment tracking
    pip install tensorboard
    
  9. Install Jupyter for interactive work

    conda install -c conda-forge jupyter jupyterlab ipywidgets -y
    
  10. Test installation

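    import torch
    import whisper
    import librosa
    from transformers import WhisperProcessor, WhisperForConditionalGeneration
    
    # ROCm builds of PyTorch report the GPU through the torch.cuda API
    print(f"PyTorch: {torch.__version__}")
    print(f"ROCm/HIP: {torch.version.hip}")  # None on non-ROCm builds
    print(f"GPU available: {torch.cuda.is_available()}")
    print("All libraries imported successfully!")
    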
  11. Suggest common datasets

      • Common Voice (Mozilla)
      • LibriSpeech
      • TED-LIUM
      • Custom datasets

  12. Create example script

      • Offer to create ~/scripts/whisper-finetune-example.py with basic setup
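
The import test in the Process steps stops at the first missing package. A minimal sketch of a check that reports every package at once is shown here, using only the standard library. The names `PACKAGES`, `check_packages`, and `format_report` are illustrative, and the import names listed are assumptions based on the install steps; adjust them to what was actually installed.

```python
import importlib.util

# Import names assumed from the install steps (e.g. openai-whisper imports as "whisper").
PACKAGES = [
    "torch", "torchaudio", "whisper", "faster_whisper",
    "transformers", "datasets", "accelerate", "evaluate", "peft",
    "librosa", "soundfile", "pydub", "jiwer", "speechbrain", "pyannote.audio",
]


def check_packages(names):
    """Return {import_name: bool} showing which packages can be located."""
    status = {}
    for name in names:
        try:
            # find_spec locates a top-level package without importing it;
            # for dotted names it imports the parent package first.
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when the parent of a dotted name (e.g. "pyannote") is missing.
            status[name] = False
    return status


def format_report(status):
    """Render one line per package, sorted by name."""
    lines = []
    for name, ok in sorted(status.items()):
        lines.append(f"{'OK     ' if ok else 'MISSING'}  {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(check_packages(PACKAGES)))
```

Run it inside the activated `stt-finetune` environment; any line marked MISSING points at an install step to repeat.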

Output

Provide a summary showing:

  • Environment name and setup status
  • Installed libraries grouped by purpose
  • GPU detection status
  • Available VRAM for training
  • Suggested datasets for fine-tuning
  • Example commands for testing
  • Links to documentation/tutorials
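
The GPU and VRAM lines of the summary could be gathered along these lines. This is a sketch under stated assumptions: `gpu_summary` and `format_summary` are hypothetical helper names, only device 0 is inspected, and the figure reported is total device memory rather than memory currently free.

```python
def gpu_summary():
    """Return GPU name and total VRAM, or a record explaining why no GPU was found."""
    try:
        import torch
    except ImportError:
        return {"available": False, "reason": "torch is not installed"}
    # ROCm builds also answer through the torch.cuda API.
    if not torch.cuda.is_available():
        return {"available": False, "reason": "no GPU visible to PyTorch"}
    props = torch.cuda.get_device_properties(0)  # assumption: first device only
    return {
        "available": True,
        "name": props.name,
        "total_gib": round(props.total_memory / 1024**3, 1),
    }


def format_summary(env_name, info):
    """Render the environment and GPU lines of the summary."""
    lines = [f"Environment: {env_name}"]
    if info["available"]:
        lines.append(f"GPU: {info['name']} ({info['total_gib']} GiB VRAM)")
    else:
        lines.append(f"GPU: not detected ({info['reason']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_summary("stt-finetune", gpu_summary()))
```

Total VRAM is only an upper bound on what training can use, since other processes and the desktop session may already hold part of it.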