
HuggingFace Spaces Deployment Guide

Quick Start

1. Create Space on HuggingFace

  1. Go to huggingface.co/spaces
  2. Click "Create new Space"
  3. Select:
    • Space name: tiny-scribe (or your preferred name)
    • SDK: Docker
    • Space hardware: CPU (Free Tier - 2 vCPUs)
  4. Click "Create Space"

2. Upload Files

Upload these files to your Space:

  • app.py - Main Gradio application
  • Dockerfile - Container configuration
  • requirements.txt - Python dependencies
  • README.md - Space documentation
  • transcripts/ - Example files (optional)

Using Git:

git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push

IMPORTANT: Always use git push - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".

3. Wait for Build

The Space will automatically:

  1. Build the Docker container (~2-5 minutes)
  2. Install dependencies (llama-cpp-python wheel is prebuilt)
  3. Start the Gradio app

4. Access Your App

Once built, visit: https://your-username-tiny-scribe.hf.space

Configuration

Model Selection

The default model (unsloth/Qwen3-0.6B-GGUF Q4_K_M) is optimized for CPU:

  • Small: 0.6B parameters
  • Fast: ~2-5 seconds for short texts
  • Efficient: Uses ~400MB RAM

To change models, edit app.py:

DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"  # Lower quantization for speed
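The constants above feed into model loading. A minimal sketch of that step, assuming llama-cpp-python's `Llama.from_pretrained` helper (which downloads the matching GGUF file via huggingface_hub); the lazy import keeps app startup fast:

```python
DEFAULT_MODEL = "unsloth/Qwen3-0.6B-GGUF"
DEFAULT_FILENAME = "*Q4_K_M.gguf"  # glob pattern matched against repo files

def load_llm(repo_id: str = DEFAULT_MODEL, filename: str = DEFAULT_FILENAME):
    # Imported lazily so the Gradio UI can come up before the model loads.
    from llama_cpp import Llama
    return Llama.from_pretrained(repo_id=repo_id, filename=filename, n_ctx=4096)
```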

Performance Tuning

For Free Tier (2 vCPUs):

  • Keep n_ctx=4096 (context window)
  • Use max_tokens=512 (output length)
  • Set temperature=0.6 (balance creativity/coherence)
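One practical consequence of these limits: input must fit in the 4096-token context alongside the 512-token output. A sketch of an input guard under that budget (the 4-characters-per-token ratio is a rough heuristic, not an exact tokenizer count; `truncate_for_context` is illustrative, not a function in `app.py`):

```python
N_CTX = 4096
MAX_TOKENS = 512
CHARS_PER_TOKEN = 4  # heuristic; real token counts vary by language

def truncate_for_context(text: str) -> str:
    # Reserve room for the output, then cap the prompt length.
    budget_chars = (N_CTX - MAX_TOKENS) * CHARS_PER_TOKEN
    return text[:budget_chars]
```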

Environment Variables

Optional settings in Space Settings:

MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
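Reading these settings in `app.py` is a one-liner per variable with `os.getenv`, falling back to the documented defaults (a sketch; the actual variable handling in `app.py` may differ):

```python
import os

MODEL_REPO = os.getenv("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = os.getenv("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "512"))       # cast: env vars are strings
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.6"))
```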

Features

  1. File Upload: Drag & drop .txt files
  2. Live Streaming: Real-time token output
  3. Traditional Chinese: Auto-conversion to zh-TW
  4. Progressive Loading: Model downloads on first use (~30-60s)
  5. Responsive UI: Works on mobile and desktop
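The Traditional Chinese feature uses OpenCC's `s2twp` profile (Simplified to Traditional with Taiwan phrasing). A hedged sketch of the conversion wrapper, with a pass-through fallback so the code also runs where opencc is not installed (`to_traditional` is an illustrative name, not necessarily the one in `app.py`):

```python
try:
    from opencc import OpenCC
    _cc = OpenCC("s2twp")  # Simplified -> Traditional (zh-TW phrasing)

    def to_traditional(text: str) -> str:
        return _cc.convert(text)
except ImportError:
    def to_traditional(text: str) -> str:
        return text  # no-op fallback when opencc is unavailable
```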

Troubleshooting

Build Fails

  • Check Docker Hub status
  • Verify requirements.txt syntax
  • Ensure no large files in repo

Out of Memory

  • Reduce n_ctx (context window)
  • Use smaller model (Q2_K quantization)
  • Limit input file size

Slow Inference

  • Normal for CPU-only Free Tier
  • First request downloads model (~400MB)
  • Subsequent requests are faster

Architecture

User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                    ↓
                              OpenCC (s2twp)
                                    ↓
                         Streaming Output β†’ User
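The streaming leg of this pipeline is a plain Python generator: each step yields the transcript so far, and Gradio renders each yielded value incrementally. A minimal sketch, with `fake_tokens` standing in for the llama-cpp token stream:

```python
def stream_response(tokens):
    # Yield the accumulated text after each token, not just the token,
    # so the UI can replace its display with the latest full transcript.
    text = ""
    for tok in tokens:
        text += tok
        yield text

fake_tokens = ["Hel", "lo", "!"]
# list(stream_response(fake_tokens)) -> ["Hel", "Hello", "Hello!"]
```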

Deployment Workflow

Recommended: Use the Deployment Script

The deploy.sh script ensures meaningful commit messages:

# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"

The script will:

  1. Check for uncommitted changes
  2. Prompt for commit message if not provided
  3. Warn about generic/short messages
  4. Show commits to be pushed
  5. Confirm before pushing
  6. Verify commit message was preserved on remote

Manual Deployment

If deploying manually:

# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe

Avoiding Generic Commit Messages

❌ DON'T:

  • Edit files directly on huggingface.co
  • Use the "Upload files" button in HF web UI
  • Use single-word commit messages ("fix", "update")

✅ DO:

  • Always use git push from command line
  • Write descriptive commit messages
  • Test locally before pushing

Git Hook

A pre-push hook is installed in .git/hooks/pre-push that:

  • Validates commit messages before pushing
  • Warns about very short messages
  • Ensures you're not accidentally pushing generic commits
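The hook's message check can be sketched in a few lines: reject one-word messages and the generic "Upload ... with huggingface_hub" messages the web UI produces (this is an illustrative reimplementation, not the actual hook script):

```python
def is_generic_message(msg: str) -> bool:
    msg = msg.strip()
    # One-word messages like "fix" or "update" are too vague.
    if len(msg.split()) <= 1:
        return True
    # The HF web UI's auto-generated upload messages.
    if msg.lower().startswith("upload ") and "huggingface_hub" in msg:
        return True
    return False
```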

Local Testing

Before deploying to HF Spaces:

pip install -r requirements.txt
python app.py

Then open: http://localhost:7860

License

MIT - See LICENSE file for details.