|
# ZeroGPU Setup Guide: Free H200 Training
|
|
|
## What is ZeroGPU?
|
|
|
**ZeroGPU** is Hugging Face's **FREE** compute service that provides:

- **NVIDIA H200 GPU** (70GB memory)

- **No time limits** (unlike the 4-minute daily limit of the previous HF Spaces approach)

- **No credit card required**

- **Perfect for training** nanoGPT models
|
|
|
## ZeroGPU vs Previous Approach
|
|
|
| Feature | Previous (HF Spaces) | ZeroGPU |
|---------|---------------------|---------|
| **GPU** | H200 (4 min/day) | H200 (unlimited) |
| **Memory** | Limited | 70GB |
| **Time** | 4 minutes daily | No limits |
| **Cost** | Free | Free |
| **Use Case** | Demos/Testing | Real Training |
|
|
|
## How to Use ZeroGPU
|
|
|
### Option 1: Hugging Face Training Cluster (Recommended) |
|
|
|
1. **Create HF Model Repository:**

   ```bash
   huggingface-cli repo create nano-coder-zerogpu --type model
   ```

2. **Upload Training Files:**

   ```bash
   python upload_to_zerogpu.py
   ```

3. **Launch ZeroGPU Training:**

   ```bash
   python launch_zerogpu.py
   ```
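
For reference, here is a minimal sketch of what an upload script like `upload_to_zerogpu.py` might contain, using `huggingface_hub`'s `upload_folder`. The repo name and file patterns are assumptions based on this guide, not the actual script:

```python
# Hypothetical sketch of an upload script; repo name and file
# patterns are assumptions, not taken from the real project.
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN or your cached `huggingface-cli login`

api.upload_folder(
    folder_path=".",                             # project directory
    repo_id="your-username/nano-coder-zerogpu",  # repo created in step 1
    repo_type="model",
    allow_patterns=["*.py", "data/**"],          # assumed file layout
)
```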
|
|
|
### Option 2: Direct ZeroGPU API |
|
|
|
1. **Install HF Hub:**

   ```bash
   pip install huggingface_hub
   ```

2. **Set HF Token:**

   ```bash
   export HF_TOKEN="your_token_here"
   ```

3. **Run ZeroGPU Training:**

   ```bash
   python zerogpu_training.py
   ```
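
Before launching, you can confirm the token is picked up with a quick `huggingface_hub` check (just a sanity check, not part of the training scripts):

```python
# Sanity check: verify HF_TOKEN (or a cached login) is valid.
from huggingface_hub import whoami

info = whoami()  # raises an error if no valid token is found
print(f"Authenticated as: {info['name']}")
```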
|
|
|
## Files for ZeroGPU
|
|
|
- `zerogpu_training.py` - Main training script |
|
- `upload_to_zerogpu.py` - Upload files to HF |
|
- `launch_zerogpu.py` - Launch training job |
|
- `ZEROGPU_SETUP.md` - This guide |
|
|
|
## ZeroGPU Configuration
|
|
|
### Model Settings (Full Power!) |
|
- **Layers**: 12 (full model) |
|
- **Heads**: 12 (full model) |
|
- **Embedding**: 768 (full model) |
|
- **Context**: 1024 tokens |
|
- **Parameters**: ~124M (full GPT-2 size) |
|
|
|
### Training Settings |
|
- **Batch Size**: 48 (optimized for H200) |
|
- **Learning Rate**: 6e-4 (standard GPT-2) |
|
- **Iterations**: 10,000 (no time limits!) |
|
- **Checkpoints**: Every 1000 iterations |
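
In nanoGPT, settings like these live in a config file passed to `train.py`. Here is a hypothetical config matching the values above (variable names follow nanoGPT's convention; the actual file in this project may differ):

```python
# Hypothetical nanoGPT-style config reflecting the settings above;
# the filename and exact contents in this repo may differ.

# model: full GPT-2 size, ~124M parameters
n_layer = 12
n_head = 12
n_embd = 768
block_size = 1024          # context length in tokens

# training
batch_size = 48            # sized for the H200's 70GB memory
learning_rate = 6e-4       # standard GPT-2 learning rate
max_iters = 10000
eval_interval = 1000       # evaluate and checkpoint every 1000 iterations
always_save_checkpoint = True
```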
|
|
|
## Expected Results
|
|
|
With ZeroGPU H200 (no time limits): |
|
- **Training Time**: 2-4 hours |
|
- **Final Loss**: ~1.8-2.2 |
|
- **Model Quality**: Production-ready |
|
- **Code Generation**: High-quality Python code
|
|
|
## Setup Steps
|
|
|
### Step 1: Create HF Repository |
|
```bash
huggingface-cli repo create nano-coder-zerogpu --type model
```
|
|
|
### Step 2: Prepare Dataset |
|
```bash
python prepare_code_dataset.py
```
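
`prepare_code_dataset.py` presumably follows nanoGPT's data-prep pattern: tokenize the corpus with GPT-2 BPE and write flat binary token files. A simplified sketch under that assumption (`code_corpus.txt` is a placeholder input name):

```python
# Simplified nanoGPT-style dataset prep; the input filename is a
# placeholder and the real script's input format may differ.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2 BPE tokenizer

with open("code_corpus.txt", "r", encoding="utf-8") as f:
    data = f.read()

tokens = enc.encode_ordinary(data)  # tokenize without special tokens
split = int(len(tokens) * 0.9)      # 90/10 train/val split

# nanoGPT's training loop reads these .bin files via np.memmap.
np.array(tokens[:split], dtype=np.uint16).tofile("train.bin")
np.array(tokens[split:], dtype=np.uint16).tofile("val.bin")
```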
|
|
|
### Step 3: Launch Training |
|
```bash
python zerogpu_training.py
```
|
|
|
## Monitoring
|
|
|
### Wandb Dashboard |
|
- Real-time training metrics |
|
- Loss curves |
|
- Model performance |
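
Wiring this up takes only a few `wandb` calls in the training loop; a minimal sketch (the project name and logged values are placeholders):

```python
# Minimal wandb logging sketch; project name and values are placeholders.
import wandb

wandb.init(project="nano-coder-zerogpu", name="h200-run")

# In the real loop you would log the evaluated loss each eval_interval;
# dummy values here just keep the example runnable.
for iter_num in range(0, 3000, 1000):
    wandb.log({"iter": iter_num, "train/loss": 2.5 - iter_num / 10000})

wandb.finish()
```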
|
|
|
### HF Hub |
|
- Automatic checkpoint uploads |
|
- Model versioning |
|
- Training logs |
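
A checkpoint push can be a single `huggingface_hub` call after each eval interval; a hedged sketch (the checkpoint path and repo name are assumptions):

```python
# Sketch: push the latest checkpoint to the Hub after an eval interval.
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="out/ckpt.pt",               # nanoGPT's default output path
    path_in_repo="checkpoints/ckpt.pt",
    repo_id="your-username/nano-coder-zerogpu",  # assumed repo name
    repo_type="model",
)
```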
|
|
|
## Cost: **$0** (Completely Free!)
|
|
|
- **No credit card required** |
|
- **No time limits** |
|
- **H200 GPU access** |
|
- **70GB memory** |
|
|
|
## Benefits of ZeroGPU
|
|
|
1. **No Time Limits** - Train for hours, not minutes |
|
2. **Full Model** - Use complete GPT-2 architecture |
|
3. **Better Results** - Production-quality models |
|
4. **Real Training** - Not just demos |
|
5. **Automatic Saving** - Models saved to HF Hub |
|
|
|
## Troubleshooting
|
|
|
### If Training Won't Start

1. Check that `HF_TOKEN` is set

2. Verify the repository exists and you have write access

3. Check that the dataset has been prepared (`python prepare_code_dataset.py`)
|
|
|
### If Out of Memory

1. Reduce `batch_size` to 32 or lower

2. Raise `gradient_accumulation_steps` to keep the effective batch size (see the sketch below)

3. Use a smaller model (but why give up the full GPT-2?)
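
The effective batch size is `batch_size × gradient_accumulation_steps`, and peak memory tracks the per-step `batch_size`, so you can cut memory without changing the optimization. A quick check over some hypothetical splits:

```python
# Effective tokens per iteration stay constant while the per-step
# batch (and therefore peak activation memory) shrinks.
block_size = 1024

for batch_size, grad_accum in [(48, 1), (24, 2), (12, 4)]:  # hypothetical splits
    tokens_per_iter = batch_size * grad_accum * block_size
    print(f"batch={batch_size:2d} accum={grad_accum} -> {tokens_per_iter} tokens/iter")
```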
|
|
|
### If Upload Fails |
|
1. Check your internet connection

2. Verify your HF token has **write** permission

3. Check that you have access to the repository
|
|
|
## Use Cases
|
|
|
### Perfect For: |
|
- **Production Training** - Real model training

- **Research** - Experiment with different configs

- **Learning** - Understand the full training process

- **Model Sharing** - Upload to HF Hub
|
|
|
### Not Suitable For: |
|
- **Quick Demos** - Use HF Spaces for that

- **Testing** - Use a local GPU for that
|
|
|
## Workflow
|
|
|
1. **Setup**: Create HF repo and prepare data |
|
2. **Train**: Launch ZeroGPU training |
|
3. **Monitor**: Watch progress on Wandb |
|
4. **Save**: Models automatically uploaded |
|
5. **Share**: Use trained models |
|
|
|
## Performance
|
|
|
Expected training performance on ZeroGPU H200: |
|
- **Iterations/second**: ~2-3 |
|
- **Memory usage**: ~40-50GB |
|
- **Training time**: 2-4 hours for 10k iterations |
|
- **Final model**: Production quality |
|
|
|
## Success!
|
|
|
ZeroGPU is the **proper way** to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model end to end!
|
|
|
**Next Steps:** |
|
1. Create HF repository |
|
2. Upload files |
|
3. Launch training |
|
4. Monitor progress |
|
5. Use your trained model! |
|
|
|
Happy ZeroGPU training!