# πŸš€ ZeroGPU Setup Guide: Free H200 Training
## 🎯 What is ZeroGPU?
**ZeroGPU** is Hugging Face's **FREE** compute service that provides:
- **Nvidia H200 GPU** (70GB memory)
- **No time limits** (unlike the 4-minute daily cap of the Spaces demo approach)
- **No credit card required**
- **Perfect for training** nanoGPT models
## πŸ“Š ZeroGPU vs Previous Approach
| Feature | Previous (HF Spaces) | ZeroGPU |
|---------|---------------------|---------|
| **GPU** | H200 (4 min/day) | H200 (unlimited) |
| **Memory** | Limited | 70GB |
| **Time** | 4 minutes daily | No limits |
| **Cost** | Free | Free |
| **Use Case** | Demos/Testing | Real Training |
## πŸš€ How to Use ZeroGPU
### Option 1: Hugging Face Training Cluster (Recommended)
1. **Create HF Model Repository:**
```bash
huggingface-cli repo create nano-coder-zerogpu --type model
```
2. **Upload Training Files:**
```bash
python upload_to_zerogpu.py
```
3. **Launch ZeroGPU Training:**
```bash
python launch_zerogpu.py
```
### Option 2: Direct ZeroGPU API
1. **Install HF Hub:**
```bash
pip install huggingface_hub
```
2. **Set HF Token:**
```bash
export HF_TOKEN="your_token_here"
```
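A training script can pick up the exported token from the environment before touching the Hub. A minimal stdlib-only sketch — the `get_hf_token` helper name is hypothetical, not necessarily what `zerogpu_training.py` uses:

```python
import os

def get_hf_token() -> str:
    # Read the token exported via `export HF_TOKEN=...` and fail early
    # with a clear message, instead of deep inside an upload call.
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before training.")
    return token
```

Failing fast here gives a much clearer error than a 401 from the Hub halfway through training.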
3. **Run ZeroGPU Training:**
```bash
python zerogpu_training.py
```
## πŸ“ Files for ZeroGPU
- `zerogpu_training.py` - Main training script
- `upload_to_zerogpu.py` - Upload files to HF
- `launch_zerogpu.py` - Launch training job
- `ZEROGPU_SETUP.md` - This guide
## βš™οΈ ZeroGPU Configuration
### Model Settings (Full Power!)
- **Layers**: 12 (full model)
- **Heads**: 12 (full model)
- **Embedding**: 768 (full model)
- **Context**: 1024 tokens
- **Parameters**: ~124M (full GPT-2 size)
### Training Settings
- **Batch Size**: 48 (optimized for H200)
- **Learning Rate**: 6e-4 (standard GPT-2)
- **Iterations**: 10,000 (no time limits!)
- **Checkpoints**: Every 1000 iterations
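The settings above can be collected into a single config dict, as nanoGPT-style scripts typically do (the names here are illustrative, not necessarily those in `zerogpu_training.py`). The quick arithmetic below also shows where the ~124M parameter figure comes from:

```python
config = dict(
    n_layer=12, n_head=12, n_embd=768,  # full GPT-2 small architecture
    block_size=1024,                    # context length in tokens
    batch_size=48,                      # sized for the H200's 70GB
    learning_rate=6e-4,                 # standard GPT-2 peak LR
    max_iters=10_000,
    checkpoint_interval=1_000,
)

def approx_params(n_layer, n_embd, vocab_size=50257, block_size=1024, **_):
    # Each transformer block carries roughly 12 * n_embd^2 weights
    # (attention + MLP); add token and position embeddings for the total.
    blocks = 12 * n_layer * n_embd**2
    embeddings = vocab_size * n_embd + block_size * n_embd
    return blocks + embeddings

print(f"~{approx_params(**config) / 1e6:.0f}M parameters")  # prints ~124M parameters
```

With the GPT-2 vocabulary (50,257 tokens) this lands at about 124.3M parameters, matching the "full GPT-2 size" claim.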
## 🎯 Expected Results
With ZeroGPU H200 (no time limits):
- **Training Time**: 2-4 hours
- **Final Loss**: ~1.8-2.2
- **Model Quality**: Production-ready
- **Code Generation**: High-quality Python code
## πŸ”§ Setup Steps
### Step 1: Create HF Repository
```bash
huggingface-cli repo create nano-coder-zerogpu --type model
```
### Step 2: Prepare Dataset
```bash
python prepare_code_dataset.py
```
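`prepare_code_dataset.py` presumably follows the usual nanoGPT pattern: encode the corpus to token ids and dump them as a flat `uint16` binary that the trainer memory-maps. A stdlib-only sketch of that output format — the character-level `encode` here is just a stand-in for the real BPE tokenizer:

```python
from array import array
from pathlib import Path

def write_token_bin(ids, path):
    # nanoGPT-style format: raw little-endian uint16 token ids, no header.
    arr = array("H", ids)  # "H" = unsigned 16-bit integers
    Path(path).write_bytes(arr.tobytes())

def encode(text):
    # Stand-in tokenizer: one id per character. The real script would
    # use a proper BPE tokenizer (e.g. GPT-2's) instead.
    return [ord(c) for c in text]

write_token_bin(encode("def main():\n    pass\n"), "train.bin")
```

Keeping the file headerless is what lets the trainer open it directly as a memory-mapped array of ids.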
### Step 3: Launch Training
```bash
python zerogpu_training.py
```
## πŸ“Š Monitoring
### Wandb Dashboard
- Real-time training metrics
- Loss curves
- Model performance
### HF Hub
- Automatic checkpoint uploads
- Model versioning
- Training logs
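The automatic checkpoint uploads run on the interval from the training settings. A sketch of the scheduling logic — the function and filename scheme are illustrative, and the actual push would go through `huggingface_hub`:

```python
def should_checkpoint(iter_num, interval=1_000, max_iters=10_000):
    # Save every `interval` iterations and always at the final step,
    # but never at iteration 0 before any training has happened.
    if iter_num == 0:
        return False
    return iter_num % interval == 0 or iter_num == max_iters

def checkpoint_name(iter_num):
    # Zero-padded so checkpoint files sort chronologically on the Hub.
    return f"ckpt_{iter_num:06d}.pt"
```

With 10,000 iterations and a 1,000-iteration interval, that is ten uploads per run.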
## πŸ’° Cost: **$0** (Completely Free!)
- **No credit card required**
- **No time limits**
- **H200 GPU access**
- **70GB memory**
## πŸŽ‰ Benefits of ZeroGPU
1. **No Time Limits** - Train for hours, not minutes
2. **Full Model** - Use complete GPT-2 architecture
3. **Better Results** - Production-quality models
4. **Real Training** - Not just demos
5. **Automatic Saving** - Models saved to HF Hub
## 🚨 Troubleshooting
### If Training Won't Start
1. Check HF token is set
2. Verify repository exists
3. Check dataset is prepared
### If Out of Memory
1. Reduce batch_size to 32
2. Reduce gradient_accumulation_steps
3. Use smaller model (but why?)
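When cutting `batch_size` to fit memory, you can keep the effective (optimizer-visible) batch constant by scaling `gradient_accumulation_steps` the other way, so the learning-rate schedule still behaves as tuned. A sketch of that trade-off (the `rebalance` helper name is hypothetical):

```python
def rebalance(batch_size, grad_accum_steps, new_batch_size):
    # Effective batch = micro-batch size * accumulation steps; hold it
    # fixed so only peak memory changes, not the optimization dynamics.
    effective = batch_size * grad_accum_steps
    if effective % new_batch_size != 0:
        raise ValueError("new batch size must divide the effective batch")
    return new_batch_size, effective // new_batch_size

# e.g. dropping from 48 to 32 while keeping an effective batch of 96:
print(rebalance(48, 2, 32))  # prints (32, 3)
```

Each accumulation step redoes a forward/backward pass, so this trades a little speed for the memory headroom.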
### If Upload Fails
1. Check internet connection
2. Verify HF token permissions
3. Check repository access
## 🎯 Use Cases
### Perfect For:
- βœ… **Production Training** - Real model training
- βœ… **Research** - Experiment with different configs
- βœ… **Learning** - Understand full training process
- βœ… **Model Sharing** - Upload to HF Hub
### Not Suitable For:
- ❌ **Quick Demos** - Use HF Spaces for that
- ❌ **Testing** - Use local GPU for that
## πŸ”„ Workflow
1. **Setup**: Create HF repo and prepare data
2. **Train**: Launch ZeroGPU training
3. **Monitor**: Watch progress on Wandb
4. **Save**: Models automatically uploaded
5. **Share**: Use trained models
## πŸ“ˆ Performance
Expected training performance on ZeroGPU H200:
- **Iterations/second**: ~2-3
- **Memory usage**: ~40-50GB
- **Training time**: 2-4 hours for 10k iterations
- **Final model**: Production quality
## πŸŽ‰ Success!
ZeroGPU is the **proper way** to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!
**Next Steps:**
1. Create HF repository
2. Upload files
3. Launch training
4. Monitor progress
5. Use your trained model!
Happy ZeroGPU training! πŸš€