# ZeroGPU Setup Guide: Free H200 Training

## What is ZeroGPU?
ZeroGPU is Hugging Face's FREE compute service that provides:
- Nvidia H200 GPU (70GB memory)
- No time limits (unlike the 4-minute daily limit on standard HF Spaces)
- No credit card required
- Perfect for training nanoGPT models
## ZeroGPU vs Previous Approach

| Feature | Previous (HF Spaces) | ZeroGPU |
|---|---|---|
| GPU | H200 (4 min/day) | H200 (unlimited) |
| Memory | Limited | 70GB |
| Time | 4 minutes daily | No limits |
| Cost | Free | Free |
| Use Case | Demos/Testing | Real Training |
## How to Use ZeroGPU

### Option 1: Hugging Face Training Cluster (Recommended)

1. Create an HF model repository:
   ```bash
   huggingface-cli repo create nano-coder-zerogpu --type model
   ```
2. Upload the training files:
   ```bash
   python upload_to_zerogpu.py
   ```
3. Launch ZeroGPU training:
   ```bash
   python launch_zerogpu.py
   ```

### Option 2: Direct ZeroGPU API

1. Install the HF Hub client:
   ```bash
   pip install huggingface_hub
   ```
2. Set your HF token:
   ```bash
   export HF_TOKEN="your_token_here"
   ```
3. Run ZeroGPU training:
   ```bash
   python zerogpu_training.py
   ```
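Training fails in a confusing way mid-run if `HF_TOKEN` is missing, so it is worth checking for it up front. A minimal fail-fast sketch (the helper name `require_hf_token` is illustrative, not part of this repo's scripts):

```python
import os

def require_hf_token() -> str:
    """Return the Hugging Face token from the environment, failing fast if absent."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError('HF_TOKEN is not set; run: export HF_TOKEN="your_token_here"')
    return token
```

Calling this at the top of a training script turns a late authentication failure into an immediate, actionable error.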
## Files for ZeroGPU

- `zerogpu_training.py` - Main training script
- `upload_to_zerogpu.py` - Upload files to HF
- `launch_zerogpu.py` - Launch training job
- `ZEROGPU_SETUP.md` - This guide
## ZeroGPU Configuration

### Model Settings (Full Power!)
- Layers: 12 (full model)
- Heads: 12 (full model)
- Embedding: 768 (full model)
- Context: 1024 tokens
- Parameters: ~124M (full GPT-2 size)
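The ~124M figure follows directly from these settings. A quick sanity check, assuming GPT-2's 50,257-token vocabulary and weight tying between the token embedding and the output head (as in nanoGPT):

```python
# Parameter count for a GPT-2-style transformer from the settings above.
n_layer, n_embd, block_size, vocab_size = 12, 768, 1024, 50257

wte = vocab_size * n_embd                 # token embeddings (tied with output head)
wpe = block_size * n_embd                 # position embeddings
per_block = (
    2 * 2 * n_embd                        # two LayerNorms (weight + bias each)
    + n_embd * 3 * n_embd + 3 * n_embd    # attention qkv projection
    + n_embd * n_embd + n_embd            # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection
    + 4 * n_embd * n_embd + n_embd        # MLP down-projection
)
final_ln = 2 * n_embd
total = wte + wpe + n_layer * per_block + final_ln
print(f"{total:,} parameters (~{total/1e6:.0f}M)")  # 124,439,808 (~124M)
```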
### Training Settings
- Batch Size: 48 (optimized for H200)
- Learning Rate: 6e-4 (standard GPT-2)
- Iterations: 10,000 (no time limits!)
- Checkpoints: Every 1000 iterations
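These settings map naturally onto nanoGPT-style config variables. A sketch of the values (variable names follow nanoGPT's `train.py` conventions; the values are the ones this guide uses, not verified contents of `zerogpu_training.py`):

```python
# Model: full GPT-2 size (~124M parameters)
n_layer = 12
n_head = 12
n_embd = 768
block_size = 1024        # context length in tokens

# Training: tuned for the H200's 70GB of memory
batch_size = 48
learning_rate = 6e-4     # standard GPT-2 learning rate
max_iters = 10_000
eval_interval = 1000     # checkpoint every 1000 iterations
always_save_checkpoint = True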
## Expected Results
With ZeroGPU H200 (no time limits):
- Training Time: 2-4 hours
- Final Loss: ~1.8-2.2
- Model Quality: Production-ready
- Code Generation: High quality Python code
## Setup Steps

### Step 1: Create HF Repository

```bash
huggingface-cli repo create nano-coder-zerogpu --type model
```

### Step 2: Prepare Dataset

```bash
python prepare_code_dataset.py
```

### Step 3: Launch Training

```bash
python zerogpu_training.py
```
## Monitoring

### Wandb Dashboard
- Real-time training metrics
- Loss curves
- Model performance
### HF Hub
- Automatic checkpoint uploads
- Model versioning
- Training logs
## Cost: $0 (Completely Free!)
- No credit card required
- No time limits
- H200 GPU access
- 70GB memory
## Benefits of ZeroGPU

- **No Time Limits** - Train for hours, not minutes
- **Full Model** - Use the complete GPT-2 architecture
- **Better Results** - Production-quality models
- **Real Training** - Not just demos
- **Automatic Saving** - Models saved to the HF Hub
## Troubleshooting

### If Training Won't Start
- Check HF token is set
- Verify repository exists
- Check dataset is prepared
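The checks above can be automated before launch. A hypothetical preflight helper (the `preflight_check` name and `data/python-code` path are illustrative; point it at wherever `prepare_code_dataset.py` writes its output):

```python
import os
from pathlib import Path

def preflight_check(data_dir: str = "data/python-code") -> list:
    """Collect problems that would stop ZeroGPU training before it starts."""
    problems = []
    if not os.environ.get("HF_TOKEN"):
        problems.append('HF_TOKEN is not set (export HF_TOKEN="...")')
    for name in ("train.bin", "val.bin"):
        if not Path(data_dir, name).exists():
            problems.append(f"missing {name} in {data_dir} - run prepare_code_dataset.py")
    return problems

for problem in preflight_check():
    print("PROBLEM:", problem)
```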
### If Out of Memory
- Reduce batch_size to 32
- Reduce gradient_accumulation_steps
- Use a smaller model (but why?)
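Note that lowering `batch_size` is what cuts peak memory; gradient accumulation can then compensate so the effective batch (and learning-rate schedule) stays the same. A quick arithmetic check (the baseline `grad_accum=2` is an assumed value, not read from the training script):

```python
def tokens_per_step(batch_size: int, block_size: int, grad_accum: int) -> int:
    """Tokens consumed per optimizer step: micro-batch x context x accumulation."""
    return batch_size * block_size * grad_accum

# Dropping the micro-batch from 48 to 32 while raising accumulation
# from 2 to 3 keeps the effective batch unchanged:
baseline = tokens_per_step(48, 1024, 2)
reduced = tokens_per_step(32, 1024, 3)
print(baseline, reduced)  # 98304 98304
```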
### If Upload Fails
- Check internet connection
- Verify HF token permissions
- Check repository access
## Use Cases

### Perfect For:

- ✅ **Production Training** - Real model training
- ✅ **Research** - Experiment with different configs
- ✅ **Learning** - Understand the full training process
- ✅ **Model Sharing** - Upload to the HF Hub

### Not Suitable For:

- ❌ **Quick Demos** - Use HF Spaces for that
- ❌ **Testing** - Use a local GPU for that
## Workflow

1. **Setup**: Create the HF repo and prepare data
2. **Train**: Launch ZeroGPU training
3. **Monitor**: Watch progress on Wandb
4. **Save**: Models are uploaded automatically
5. **Share**: Use your trained models
## Performance
Expected training performance on ZeroGPU H200:
- Iterations/second: ~2-3
- Memory usage: ~40-50GB
- Training time: 2-4 hours for 10k iterations
- Final model: Production quality
## Success!

ZeroGPU is the proper way to use Hugging Face's free compute for real training. No more 4-minute limits - train your nano-coder model properly!

**Next Steps:**

1. Create the HF repository
2. Upload the files
3. Launch training
4. Monitor progress
5. Use your trained model!

Happy ZeroGPU training!