
HuggingFace Spaces Deployment Guide

Quick Start

1. Create Space on HuggingFace

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Select:
    • Space name: tiny-scribe (or your preferred name)
    • SDK: Docker
    • Space hardware: CPU (Free Tier - 2 vCPUs)
  4. Click "Create Space"

2. Upload Files

Upload these files to your Space:

  • app.py - Main Gradio application
  • Dockerfile - Container configuration
  • requirements.txt - Python dependencies
  • README.md - Space documentation
  • transcripts/ - Example files (optional)

Using Git:

git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push

IMPORTANT: Always use git push - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".

3. Wait for Build

The Space will automatically:

  1. Build the Docker container (~2-5 minutes)
  2. Install dependencies (llama-cpp-python wheel is prebuilt)
  3. Start the Gradio app

4. Access Your App

Once built, visit: https://your-username-tiny-scribe.hf.space

Configuration

Model Selection

The default model (unsloth/Qwen3-0.6B-GGUF Q4_K_M) is optimized for CPU:

  • Small: 0.6B parameters
  • Fast: ~2-5 seconds for short texts
  • Efficient: Uses ~400MB RAM

To change models, edit app.py:

DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"  # Lower quantization for speed
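The constants above feed into model loading. A minimal sketch of that step, assuming llama-cpp-python's `Llama.from_pretrained` helper (which downloads the matching GGUF file via huggingface_hub); the lazy import keeps app startup fast:

```python
DEFAULT_MODEL = "unsloth/Qwen3-0.6B-GGUF"
DEFAULT_FILENAME = "*Q4_K_M.gguf"  # glob pattern matched against repo files

def load_llm(repo_id: str = DEFAULT_MODEL, filename: str = DEFAULT_FILENAME):
    # Imported lazily so the Gradio UI can come up before the model loads.
    from llama_cpp import Llama
    return Llama.from_pretrained(repo_id=repo_id, filename=filename, n_ctx=4096)
```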

Performance Tuning

For Free Tier (2 vCPUs):

  • Keep n_ctx=4096 (context window)
  • Use max_tokens=512 (output length)
  • Set temperature=0.6 (balance creativity/coherence)
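One practical consequence of these limits: input must fit in the 4096-token context alongside the 512-token output. A sketch of an input guard under that budget (the 4-characters-per-token ratio is a rough heuristic, not an exact tokenizer count; `truncate_for_context` is illustrative, not a function in `app.py`):

```python
N_CTX = 4096
MAX_TOKENS = 512
CHARS_PER_TOKEN = 4  # heuristic; real token counts vary by language

def truncate_for_context(text: str) -> str:
    # Reserve room for the output, then cap the prompt length.
    budget_chars = (N_CTX - MAX_TOKENS) * CHARS_PER_TOKEN
    return text[:budget_chars]
```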

Environment Variables

Optional settings in Space Settings:

MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
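Reading these settings in `app.py` is a one-liner per variable with `os.getenv`, falling back to the documented defaults (a sketch; the actual variable handling in `app.py` may differ):

```python
import os

MODEL_REPO = os.getenv("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = os.getenv("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "512"))       # cast: env vars are strings
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.6"))
```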

Features

  1. File Upload: Drag & drop .txt files
  2. Live Streaming: Real-time token output
  3. Traditional Chinese: Auto-conversion to zh-TW
  4. Progressive Loading: Model downloads on first use (~30-60s)
  5. Responsive UI: Works on mobile and desktop
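The Traditional Chinese feature uses OpenCC's `s2twp` profile (Simplified to Traditional with Taiwan phrasing). A hedged sketch of the conversion wrapper, with a pass-through fallback so the code also runs where opencc is not installed (`to_traditional` is an illustrative name, not necessarily the one in `app.py`):

```python
try:
    from opencc import OpenCC
    _cc = OpenCC("s2twp")  # Simplified -> Traditional (zh-TW phrasing)

    def to_traditional(text: str) -> str:
        return _cc.convert(text)
except ImportError:
    def to_traditional(text: str) -> str:
        return text  # no-op fallback when opencc is unavailable
```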

Troubleshooting

Build Fails

  • Check Docker Hub status
  • Verify requirements.txt syntax
  • Ensure no large files in repo

Out of Memory

  • Reduce n_ctx (context window)
  • Use smaller model (Q2_K quantization)
  • Limit input file size

Slow Inference

  • Normal for CPU-only Free Tier
  • First request downloads model (~400MB)
  • Subsequent requests are faster

Architecture

User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                    ↓
                              OpenCC (s2twp)
                                    ↓
                         Streaming Output β†’ User
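The streaming leg of this pipeline is a plain Python generator: each step yields the transcript so far, and Gradio renders each yielded value incrementally. A minimal sketch, with `fake_tokens` standing in for the llama-cpp token stream:

```python
def stream_response(tokens):
    # Yield the accumulated text after each token, not just the token,
    # so the UI can replace its display with the latest full transcript.
    text = ""
    for tok in tokens:
        text += tok
        yield text

fake_tokens = ["Hel", "lo", "!"]
# list(stream_response(fake_tokens)) -> ["Hel", "Hello", "Hello!"]
```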

Deployment Workflow

Recommended: Use the Deployment Script

The deploy.sh script ensures meaningful commit messages:

# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"

The script will:

  1. Check for uncommitted changes
  2. Prompt for commit message if not provided
  3. Warn about generic/short messages
  4. Show commits to be pushed
  5. Confirm before pushing
  6. Verify commit message was preserved on remote

Manual Deployment

If deploying manually:

# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe

Avoiding Generic Commit Messages

❌ DON'T:

  • Edit files directly on huggingface.co
  • Use the "Upload files" button in HF web UI
  • Use single-word commit messages ("fix", "update")

✅ DO:

  • Always use git push from command line
  • Write descriptive commit messages
  • Test locally before pushing

Git Hook

A pre-push hook is installed in .git/hooks/pre-push that:

  • Validates commit messages before pushing
  • Warns about very short messages
  • Ensures you're not accidentally pushing generic commits
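The hook's message check can be sketched in a few lines: reject one-word messages and the generic "Upload ... with huggingface_hub" messages the web UI produces (this is an illustrative reimplementation, not the actual hook script):

```python
def is_generic_message(msg: str) -> bool:
    msg = msg.strip()
    # One-word messages like "fix" or "update" are too vague.
    if len(msg.split()) <= 1:
        return True
    # The HF web UI's auto-generated upload messages.
    if msg.lower().startswith("upload ") and "huggingface_hub" in msg:
        return True
    return False
```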

Local Testing

Before deploying to HF Spaces:

pip install -r requirements.txt
python app.py

Then open: http://localhost:7860

License

MIT - See LICENSE file for details.