# ZeroGPU Setup Guide: Free H200 Training

## What is ZeroGPU?
ZeroGPU is Hugging Face's FREE compute service that provides:
- Nvidia H200 GPU (70GB memory)
- No time limits (unlike the 4-minute daily limit on standard HF Spaces)
- No credit card required
- Perfect for training nanoGPT models
## ZeroGPU vs Previous Approach

| Feature | Previous (HF Spaces) | ZeroGPU |
|---|---|---|
| GPU | H200 (4 min/day) | H200 (unlimited) |
| Memory | Limited | 70GB |
| Time | 4 minutes daily | No limits |
| Cost | Free | Free |
| Use Case | Demos/Testing | Real Training |
## How to Use ZeroGPU

### Option 1: Hugging Face Training Cluster (Recommended)

1. Create an HF model repository:
   ```bash
   huggingface-cli repo create nano-coder-zerogpu --type model
   ```
2. Upload the training files:
   ```bash
   python upload_to_zerogpu.py
   ```
3. Launch ZeroGPU training:
   ```bash
   python launch_zerogpu.py
   ```

### Option 2: Direct ZeroGPU API

1. Install the HF Hub client:
   ```bash
   pip install huggingface_hub
   ```
2. Set your HF token:
   ```bash
   export HF_TOKEN="your_token_here"
   ```
3. Run ZeroGPU training:
   ```bash
   python zerogpu_training.py
   ```
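Training fails in a confusing way mid-run if `HF_TOKEN` is missing, so it is worth checking for it up front. A minimal fail-fast sketch (the helper name `require_hf_token` is illustrative, not part of this repo's scripts):

```python
import os

def require_hf_token() -> str:
    """Return the Hugging Face token from the environment, failing fast if absent."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError('HF_TOKEN is not set; run: export HF_TOKEN="your_token_here"')
    return token
```

Calling this at the top of a training script turns a late authentication failure into an immediate, actionable error.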
## Files for ZeroGPU

- `zerogpu_training.py` - Main training script
- `upload_to_zerogpu.py` - Upload files to HF
- `launch_zerogpu.py` - Launch training job
- `ZEROGPU_SETUP.md` - This guide
## ZeroGPU Configuration

### Model Settings (Full Power!)
- Layers: 12 (full model)
- Heads: 12 (full model)
- Embedding: 768 (full model)
- Context: 1024 tokens
- Parameters: ~124M (full GPT-2 size)
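The ~124M figure follows directly from these settings. A quick sanity check, assuming GPT-2's 50,257-token vocabulary and weight tying between the token embedding and the output head (as in nanoGPT):

```python
# Parameter count for a GPT-2-style transformer from the settings above.
n_layer, n_embd, block_size, vocab_size = 12, 768, 1024, 50257

wte = vocab_size * n_embd                 # token embeddings (tied with output head)
wpe = block_size * n_embd                 # position embeddings
per_block = (
    2 * 2 * n_embd                        # two LayerNorms (weight + bias each)
    + n_embd * 3 * n_embd + 3 * n_embd    # attention qkv projection
    + n_embd * n_embd + n_embd            # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection
    + 4 * n_embd * n_embd + n_embd        # MLP down-projection
)
final_ln = 2 * n_embd
total = wte + wpe + n_layer * per_block + final_ln
print(f"{total:,} parameters (~{total/1e6:.0f}M)")  # 124,439,808 (~124M)
```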
### Training Settings
- Batch Size: 48 (optimized for H200)
- Learning Rate: 6e-4 (standard GPT-2)
- Iterations: 10,000 (no time limits!)
- Checkpoints: Every 1000 iterations
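These settings map naturally onto nanoGPT-style config variables. A sketch of the values (variable names follow nanoGPT's `train.py` conventions; the values are the ones this guide uses, not verified contents of `zerogpu_training.py`):

```python
# Model: full GPT-2 size (~124M parameters)
n_layer = 12
n_head = 12
n_embd = 768
block_size = 1024        # context length in tokens

# Training: tuned for the H200's 70GB of memory
batch_size = 48
learning_rate = 6e-4     # standard GPT-2 learning rate
max_iters = 10_000
eval_interval = 1000     # checkpoint every 1000 iterations
always_save_checkpoint = True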
## Expected Results
With ZeroGPU H200 (no time limits):
- Training Time: 2-4 hours
- Final Loss: ~1.8-2.2
- Model Quality: Production-ready
- Code Generation: High quality Python code
## Setup Steps

### Step 1: Create HF Repository

```bash
huggingface-cli repo create nano-coder-zerogpu --type model
```

### Step 2: Prepare Dataset

```bash
python prepare_code_dataset.py
```

### Step 3: Launch Training

```bash
python zerogpu_training.py
```
## Monitoring

### Wandb Dashboard
- Real-time training metrics
- Loss curves
- Model performance
### HF Hub
- Automatic checkpoint uploads
- Model versioning
- Training logs
## Cost: $0 (Completely Free!)
- No credit card required
- No time limits
- H200 GPU access
- 70GB memory
## Benefits of ZeroGPU

- **No Time Limits** - Train for hours, not minutes
- **Full Model** - Use the complete GPT-2 architecture
- **Better Results** - Production-quality models
- **Real Training** - Not just demos
- **Automatic Saving** - Models saved to the HF Hub
## Troubleshooting

### If Training Won't Start
- Check HF token is set
- Verify repository exists
- Check dataset is prepared
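The checks above can be automated before launch. A hypothetical preflight helper (the `preflight_check` name and `data/python-code` path are illustrative; point it at wherever `prepare_code_dataset.py` writes its output):

```python
import os
from pathlib import Path

def preflight_check(data_dir: str = "data/python-code") -> list:
    """Collect problems that would stop ZeroGPU training before it starts."""
    problems = []
    if not os.environ.get("HF_TOKEN"):
        problems.append('HF_TOKEN is not set (export HF_TOKEN="...")')
    for name in ("train.bin", "val.bin"):
        if not Path(data_dir, name).exists():
            problems.append(f"missing {name} in {data_dir} - run prepare_code_dataset.py")
    return problems

for problem in preflight_check():
    print("PROBLEM:", problem)
```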
### If Out of Memory
- Reduce batch_size to 32
- Reduce gradient_accumulation_steps
- Use a smaller model (but why?)
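Note that lowering `batch_size` is what cuts peak memory; gradient accumulation can then compensate so the effective batch (and learning-rate schedule) stays the same. A quick arithmetic check (the baseline `grad_accum=2` is an assumed value, not read from the training script):

```python
def tokens_per_step(batch_size: int, block_size: int, grad_accum: int) -> int:
    """Tokens consumed per optimizer step: micro-batch x context x accumulation."""
    return batch_size * block_size * grad_accum

# Dropping the micro-batch from 48 to 32 while raising accumulation
# from 2 to 3 keeps the effective batch unchanged:
baseline = tokens_per_step(48, 1024, 2)
reduced = tokens_per_step(32, 1024, 3)
print(baseline, reduced)  # 98304 98304
```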
### If Upload Fails
- Check internet connection
- Verify HF token permissions
- Check repository access
## Use Cases

### Perfect For:

- ✅ **Production Training** - Real model training
- ✅ **Research** - Experiment with different configs
- ✅ **Learning** - Understand the full training process
- ✅ **Model Sharing** - Upload to the HF Hub

### Not Suitable For:

- ❌ **Quick Demos** - Use HF Spaces for that
- ❌ **Testing** - Use a local GPU for that
## Workflow

1. **Setup**: Create the HF repo and prepare data
2. **Train**: Launch ZeroGPU training
3. **Monitor**: Watch progress on Wandb
4. **Save**: Models are uploaded automatically
5. **Share**: Use your trained models
## Performance
Expected training performance on ZeroGPU H200:
- Iterations/second: ~2-3
- Memory usage: ~40-50GB
- Training time: 2-4 hours for 10k iterations
- Final model: Production quality
## Success!

ZeroGPU is the proper way to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!

**Next Steps:**

1. Create the HF repository
2. Upload the files
3. Launch training
4. Monitor progress
5. Use your trained model!

Happy ZeroGPU training!