HuggingFace Spaces Deployment Guide
Quick Setup for Gemma
Step 1: Create Your HuggingFace Space
- Go to HuggingFace Spaces
- Click "Create new Space"
- Choose:
  - Space name: your-username/sema-chat-gemma
  - License: MIT
  - Space SDK: Docker
  - Space hardware: CPU basic (free) or T4 small (paid)
Step 2: Clone and Upload Files
# Clone your new space
git clone https://huggingface.co/spaces/your-username/sema-chat-gemma
cd sema-chat-gemma
# Copy all files from backend/sema-chat/
cp -r /path/to/sema/backend/sema-chat/* .
# Add and commit
git add .
git commit -m "Initial Sema Chat API with Gemma support"
git push
Step 3: Configure Environment Variables
In your Space settings, add these environment variables:
Option A: Local Gemma (Free Tier)
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=cpu
TEMPERATURE=0.7
MAX_NEW_TOKENS=256
DEBUG=false
ENVIRONMENT=production
Option B: Gemma via Google AI Studio (Recommended)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key_here
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
DEBUG=false
ENVIRONMENT=production
Option C: Gemma via HuggingFace API
MODEL_TYPE=hf_api
MODEL_NAME=google/gemma-2b-it
HF_API_TOKEN=your_hf_token_here
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
DEBUG=false
ENVIRONMENT=production
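Whichever option you pick, a quick sanity check that the required variables are present can save a failed build. Below is a minimal sketch; the variable names come from the option tables above, but the per-backend key requirements are an assumption inferred from Options A-C:

```python
import os

# Variables every backend needs, plus the credential each backend type needs.
# (Mapping inferred from Options A-C above -- adjust if your app differs.)
COMMON = ["MODEL_TYPE", "MODEL_NAME"]
PER_TYPE = {
    "local": [],                    # local models need no API key
    "google": ["GOOGLE_API_KEY"],   # Google AI Studio
    "hf_api": ["HF_API_TOKEN"],     # HuggingFace Inference API
}

def missing_required(env: dict) -> list:
    """Return the names of required variables that are absent or empty."""
    model_type = env.get("MODEL_TYPE", "")
    required = COMMON + PER_TYPE.get(model_type, [])
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    problems = missing_required(dict(os.environ))
    if problems:
        print("Missing environment variables:", ", ".join(problems))
    else:
        print("Environment looks complete.")
```

Run it locally with the same variables you plan to set in the Space settings before you push.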
Getting API Keys
Google AI Studio API Key (Recommended)
- Go to Google AI Studio
- Sign in with your Google account
- Click "Get API Key"
- Create a new API key
- Copy the key and add it to your Space settings
HuggingFace API Token (Alternative)
- Go to HuggingFace Settings
- Click "New token"
- Choose "Read" access
- Copy the token and add it to your Space settings
Required Files Structure
Make sure your Space has these files:
your-space/
├── app/                # Main application code
├── requirements.txt    # Python dependencies
├── Dockerfile          # Container configuration
├── README.md           # Space documentation
└── .gitignore          # Git ignore file
Dockerfile Configuration
Your Dockerfile should be:
FROM python:3.11-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PYTHONPATH="/app"
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create non-root user
RUN useradd -m -u 1000 user
USER user
# Expose port 7860 (HuggingFace Spaces standard)
EXPOSE 7860
# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
CMD curl -f http://localhost:7860/health || exit 1
# Start the application
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
Recommended Configuration for First Version
For your first deployment, I recommend Google AI Studio with Gemma:
Environment Variables:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_api_key_here
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
DEBUG=false
ENVIRONMENT=production
ENABLE_STREAMING=true
RATE_LIMIT=30
SESSION_TIMEOUT=30
Why This Setup?
- Fast deployment - No model download needed
- Reliable - Google's infrastructure
- Cost-effective - Free tier available
- Good performance - Gemma 2 9B is capable
- Streaming support - Real-time responses
Testing Your Deployment
1. Check Health
curl https://your-username-sema-chat-gemma.hf.space/health
2. Test Chat
curl -X POST "https://your-username-sema-chat-gemma.hf.space/api/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Hello! Can you introduce yourself?",
"session_id": "test-session"
}'
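The same request can be made from Python, which is handy for scripting smoke tests. This is a hedged sketch: the endpoint path and JSON fields mirror the curl example above, the base URL is a placeholder, and the response shape is whatever your app returns.

```python
import json
import urllib.request

BASE_URL = "https://your-username-sema-chat-gemma.hf.space"  # your Space URL

def build_chat_payload(message: str, session_id: str) -> dict:
    """Build the JSON body used by POST /api/v1/chat (fields as in the curl example)."""
    return {"message": message, "session_id": session_id}

def send_chat(message: str, session_id: str = "test-session") -> dict:
    """POST a chat message and return the decoded JSON response."""
    body = json.dumps(build_chat_payload(message, session_id)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(send_chat("Hello! Can you introduce yourself?"))
```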
3. Test Streaming
curl -N -H "Accept: text/event-stream" \
"https://your-username-sema-chat-gemma.hf.space/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
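To consume the stream programmatically, you can read the Server-Sent Events line by line. A minimal stdlib sketch, assuming the endpoint and query parameters shown in the curl command above and a standard `data:`-prefixed SSE payload:

```python
import urllib.parse
import urllib.request

STREAM_URL = "https://your-username-sema-chat-gemma.hf.space/api/v1/chat/stream"

def parse_sse_data(line: str):
    """Return the payload of an SSE 'data:' line, or None for any other line."""
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None

def stream_chat(message: str, session_id: str = "test"):
    """Yield data payloads from the streaming endpoint as they arrive."""
    query = urllib.parse.urlencode({"message": message, "session_id": session_id})
    req = urllib.request.Request(
        f"{STREAM_URL}?{query}", headers={"Accept": "text/event-stream"}
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            data = parse_sse_data(raw.decode("utf-8"))
            if data is not None:
                yield data

if __name__ == "__main__":
    for chunk in stream_chat("Tell me about AI"):
        print(chunk)
```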
4. Access Swagger UI
Visit: https://your-username-sema-chat-gemma.hf.space/
Troubleshooting
Common Issues:
1. Space Won't Start
- Check logs in Space settings
- Verify all required files are present
- Check Dockerfile syntax
2. Model Loading Fails
- Verify API key is correct
- Check model name spelling
- Try a smaller model first
3. API Errors
- Check environment variables
- Verify network connectivity
- Review application logs
4. Slow Responses
- Use a smaller model (e.g. gemma-2-2b-it)
- Reduce MAX_NEW_TOKENS
- Enable streaming for better UX
Debug Commands:
# Check environment variables
curl https://your-space.hf.space/api/v1/model/info
# Check detailed health
curl https://your-space.hf.space/api/v1/health
# View logs in Space settings
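Since a fresh Space can take several minutes to build, a small polling script is useful alongside the curl commands above. This is a sketch under assumptions: the `/health` path matches the earlier examples, the URL is a placeholder, and the idea that 502/503 means "still starting" is a heuristic, not documented behavior.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "https://your-space.hf.space/health"  # substitute your Space URL

def classify_health(status_code: int) -> str:
    """Map an HTTP status code to a coarse health verdict."""
    if 200 <= status_code < 300:
        return "healthy"
    if status_code in (502, 503):
        return "starting"   # heuristic: Space likely still building or booting
    return "unhealthy"

def wait_until_healthy(attempts: int = 10, delay: float = 30.0) -> bool:
    """Poll /health until it responds 2xx or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=10) as resp:
                if classify_health(resp.status) == "healthy":
                    return True
        except urllib.error.URLError:
            pass  # not reachable yet; keep waiting
        time.sleep(delay)
    return False
```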
Step-by-Step Deployment
1. Prepare Your Space
# Create and clone your space
git clone https://huggingface.co/spaces/your-username/sema-chat-gemma
cd sema-chat-gemma
# Copy files
cp -r ../sema/backend/sema-chat/* .
2. Set Environment Variables
Go to your Space settings and add:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_key_here
3. Deploy
git add .
git commit -m "Deploy Sema Chat with Gemma"
git push
4. Wait for Build
- Space will automatically build (5-10 minutes)
- Check build logs for any errors
- Once running, test the endpoints
5. Share Your Space
Your API will be available at:
https://your-username-sema-chat-gemma.hf.space/
Pro Tips
- Start with Google AI Studio - Easiest setup
- Use environment variables - Never hardcode API keys
- Enable streaming - Better user experience
- Monitor usage - Check API quotas
- Test thoroughly - Use the provided test scripts
- Document your API - Swagger UI is auto-generated
You're Ready!
With this setup, you'll have a production-ready chatbot API with:
- Gemma 2 9B model via Google AI Studio
- Streaming responses
- Session management
- Rate limiting
- Health monitoring
- Interactive Swagger UI
Your Space URL will be:
https://your-username-sema-chat-gemma.hf.space/
Happy deploying!