
🚀 ZeroGPU Setup Guide: Free H200 Training

🎯 What is ZeroGPU?

ZeroGPU is Hugging Face's FREE compute service that provides:

  • Nvidia H200 GPU (70GB memory)
  • No time limits (unlike the previous approach's 4-minute daily cap)
  • No credit card required
  • Perfect for training nanoGPT models
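
On ZeroGPU hardware, the H200 is attached per function call through the `spaces` package that comes preinstalled on ZeroGPU Spaces: code that needs the GPU runs inside a @spaces.GPU-decorated function. A minimal sketch, assuming a nanoGPT-style model whose forward pass returns logits and loss:

    import spaces

    @spaces.GPU  # the H200 is allocated only while this function runs
    def train_step(model, x, y):
        model = model.to("cuda")
        x, y = x.to("cuda"), y.to("cuda")
        logits, loss = model(x, y)  # nanoGPT-style forward pass
        loss.backward()
        return loss.item()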

📊 ZeroGPU vs. the Previous Approach

| Feature  | Previous (HF Spaces) | ZeroGPU          |
|----------|----------------------|------------------|
| GPU      | H200 (4 min/day)     | H200 (unlimited) |
| Memory   | Limited              | 70GB             |
| Time     | 4 minutes daily      | No limits        |
| Cost     | Free                 | Free             |
| Use Case | Demos/Testing        | Real Training    |

🚀 How to Use ZeroGPU

Option 1: Hugging Face Training Cluster (Recommended)

  1. Create HF Model Repository:

    huggingface-cli repo create nano-coder-zerogpu --type model
    
  2. Upload Training Files (sketched after these steps):

    python upload_to_zerogpu.py
    
  3. Launch ZeroGPU Training:

    python launch_zerogpu.py
    

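Step 2's upload boils down to a single huggingface_hub call. A minimal sketch of what upload_to_zerogpu.py might contain (the repo_id and file patterns here are illustrative; the actual script may differ):

    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token from HF_TOKEN or the local login cache
    api.upload_folder(
        folder_path=".",                             # directory with the training files
        repo_id="your-username/nano-coder-zerogpu",  # replace with your repo
        repo_type="model",
        allow_patterns=["*.py", "*.md"],             # upload only scripts and docs
    )
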
Option 2: Direct ZeroGPU API

  1. Install HF Hub:

    pip install huggingface_hub
    
  2. Set HF Token (how the script reads it is sketched below):

    export HF_TOKEN="your_token_here"
    
  3. Run ZeroGPU Training:

    python zerogpu_training.py
    
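Inside the training script, the exported token is typically read from the environment to authenticate with the Hub; a minimal sketch (zerogpu_training.py may handle this differently):

    import os
    from huggingface_hub import login

    # Authenticate with the token exported above; raises KeyError if HF_TOKEN is unset.
    login(token=os.environ["HF_TOKEN"])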

πŸ“ Files for ZeroGPU

  • zerogpu_training.py - Main training script
  • upload_to_zerogpu.py - Upload files to HF
  • launch_zerogpu.py - Launch training job
  • ZEROGPU_SETUP.md - This guide

βš™οΈ ZeroGPU Configuration

Model Settings (Full Power!)

  • Layers: 12
  • Heads: 12
  • Embedding dimension: 768
  • Context: 1024 tokens
  • Parameters: ~124M (the full GPT-2 small architecture)
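
These are nanoGPT's stock GPT-2 small settings; expressed with GPTConfig from nanoGPT's model.py, they look like this:

    from model import GPTConfig, GPT  # nanoGPT's model.py

    config = GPTConfig(
        n_layer=12,       # layers
        n_head=12,        # attention heads
        n_embd=768,       # embedding dimension
        block_size=1024,  # context length in tokens
    )
    model = GPT(config)   # ~124M parameters, the full GPT-2 small size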

Training Settings

  • Batch Size: 48 (optimized for H200)
  • Learning Rate: 6e-4 (standard GPT-2)
  • Iterations: 10,000 (no time limits!)
  • Checkpoints: Every 1000 iterations
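
In nanoGPT's train.py, these map onto the following configuration variables (names as in nanoGPT; the repo's script may organize them differently):

    batch_size = 48        # micro-batch size tuned for the H200
    learning_rate = 6e-4   # standard GPT-2 learning rate
    max_iters = 10000      # total training iterations
    eval_interval = 1000   # evaluate/checkpoint every 1000 iterations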

🎯 Expected Results

With ZeroGPU H200 (no time limits):

  • Training Time: 2-4 hours
  • Final Loss: ~1.8-2.2
  • Model Quality: Production-ready
  • Code Generation: High-quality Python code

🔧 Setup Steps

Step 1: Create HF Repository

huggingface-cli repo create nano-coder-zerogpu --type model
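
The same step works from Python via huggingface_hub's create_repo, which is handy inside scripts:

    from huggingface_hub import create_repo

    # exist_ok=True makes the call safe to re-run if the repo already exists
    create_repo("nano-coder-zerogpu", repo_type="model", exist_ok=True)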

Step 2: Prepare Dataset

python prepare_code_dataset.py
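
The script is expected to emit nanoGPT-style train.bin/val.bin token files. A minimal sketch of that pattern with the GPT-2 tokenizer (the input file name is hypothetical; the actual script presumably pulls a real code dataset):

    import numpy as np
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    with open("code_corpus.txt", encoding="utf-8") as f:  # hypothetical input file
        tokens = enc.encode_ordinary(f.read())

    split = int(0.9 * len(tokens))  # 90/10 train/val split
    np.array(tokens[:split], dtype=np.uint16).tofile("train.bin")
    np.array(tokens[split:], dtype=np.uint16).tofile("val.bin")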

Step 3: Launch Training

python zerogpu_training.py

📊 Monitoring

Wandb Dashboard

  • Real-time training metrics
  • Loss curves
  • Model performance
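
These metrics come from ordinary wandb calls in the training loop; a minimal self-contained sketch (the project name and loss values are illustrative):

    import wandb

    run = wandb.init(
        project="nano-coder-zerogpu",  # illustrative project name
        config={"batch_size": 48, "learning_rate": 6e-4},
    )
    for it in range(10):               # stands in for the real training loop
        loss = 2.5 - 0.05 * it         # dummy loss for illustration
        wandb.log({"iter": it, "train/loss": loss})
    run.finish()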

HF Hub

  • Automatic checkpoint uploads
  • Model versioning
  • Training logs
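
Checkpoint uploads use the same huggingface_hub API as the initial upload; a sketch of the every-1000-iterations save (paths and repo_id are illustrative):

    from huggingface_hub import HfApi

    api = HfApi()
    # called after torch.save(...) has written the latest checkpoint
    api.upload_file(
        path_or_fileobj="out-nano-coder/ckpt.pt",    # local checkpoint path (illustrative)
        path_in_repo="ckpt.pt",
        repo_id="your-username/nano-coder-zerogpu",  # replace with your repo
        repo_type="model",
    )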

💰 Cost: $0 (Completely Free!)

  • No credit card required
  • No time limits
  • H200 GPU access
  • 70GB memory

🎉 Benefits of ZeroGPU

  1. No Time Limits - Train for hours, not minutes
  2. Full Model - Use complete GPT-2 architecture
  3. Better Results - Production-quality models
  4. Real Training - Not just demos
  5. Automatic Saving - Models saved to HF Hub

🚨 Troubleshooting

If Training Won't Start

  1. Check HF token is set
  2. Verify repository exists
  3. Check dataset is prepared
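
The first two checks can be done from Python with calls that exist in huggingface_hub (replace the repo_id with yours):

    from huggingface_hub import HfApi

    api = HfApi()
    print(api.whoami())  # raises an error if the token is missing or invalid
    print(api.repo_exists("your-username/nano-coder-zerogpu", repo_type="model"))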

If Out of Memory

  1. Reduce batch_size to 32
  2. Increase gradient_accumulation_steps to compensate (peak memory scales with the micro-batch, not the accumulated batch)
  3. Use a smaller model (but why?)
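
In config terms, the first two fixes look like this (values are illustrative):

    batch_size = 32                  # down from 48; peak memory scales with this
    gradient_accumulation_steps = 2  # raise to preserve the effective batch size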

If Upload Fails

  1. Check internet connection
  2. Verify HF token permissions
  3. Check repository access

🎯 Use Cases

Perfect For:

  • ✅ Production Training - Real model training
  • ✅ Research - Experiment with different configs
  • ✅ Learning - Understand the full training process
  • ✅ Model Sharing - Upload to HF Hub

Not Suitable For:

  • ❌ Quick Demos - Use HF Spaces for that
  • ❌ Testing - Use local GPU for that

🔄 Workflow

  1. Setup: Create HF repo and prepare data
  2. Train: Launch ZeroGPU training
  3. Monitor: Watch progress on Wandb
  4. Save: Models automatically uploaded
  5. Share: Use trained models

📈 Performance

Expected training performance on ZeroGPU H200:

  • Iterations/second: ~2-3
  • Memory usage: ~40-50GB
  • Training time: 2-4 hours for 10k iterations (pure compute at 2-3 it/s is closer to 1-1.5 hours; the estimate leaves headroom for evaluation, checkpointing, and uploads)
  • Final model: Production quality

🎉 Success!

ZeroGPU is the proper way to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!

Next Steps:

  1. Create HF repository
  2. Upload files
  3. Launch training
  4. Monitor progress
  5. Use your trained model!

Happy ZeroGPU training! 🚀