
Fine-tuning Guide: XCoder-80K Dataset

This guide explains how to fine-tune the Hugging Face counterparts of your local Ollama models on the XCoder-80K code dataset, then load the results back into Ollama.

Overview

The finetune_models.py script fine-tunes open-source code models on the XCoder-80K dataset from Hugging Face:

| Ollama Model | HuggingFace Model | Size | Recommended |
|---|---|---|---|
| llama3.2:latest | meta-llama/Llama-2-7b-hf | 7B | ✓ Best for code |
| gemma3:4b | google/gemma-7b | 7B | ✓ Good alternative |
| gemma3:1b | google/gemma-2b | 2B | Lightweight option |
| llava:latest | Not suitable | Multimodal | ✗ Skip (vision-language model) |

Dataset: banksy235/XCoder-80K

  • 80,000 code examples
  • Covers multiple programming languages
  • Suitable for code generation and repair
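Before committing to a long run, it is worth loading the dataset once and inspecting its schema (a minimal sketch, assuming the default train split):

from datasets import load_dataset

# Downloads XCoder-80K from the Hugging Face Hub on first use (cached afterwards)
ds = load_dataset("banksy235/XCoder-80K", split="train")

print(len(ds))           # expect roughly 80,000 examples
print(ds.column_names)   # inspect the schema the training script will consume
print(ds[0])             # look at one raw example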

Installation

Quick Install (Recommended)

Windows:

install_finetune.bat

Linux/macOS:

bash install_finetune.sh

Manual Installation

  1. Install PyTorch with CUDA 12.1 support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  2. Install fine-tuning dependencies:
pip install -r requirements-finetune.txt
  3. Verify installation:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'GPU: {torch.cuda.is_available()}')"
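If you want more detail than the one-liner above, a short check using only standard torch calls reports the device name and total VRAM (useful when sizing --batch-size later):

import torch

print(f"PyTorch: {torch.__version__}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device found; training would fall back to the much slower CPU path")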

Install Hugging Face CLI (Optional)

For easier dataset management:

# macOS/Linux
curl -LsSf https://hf.co/cli/install.sh | bash -s

# Or via pip
pip install huggingface_hub

# Login (for private datasets)
huggingface-cli login

Usage

Option 1: Fine-tune Single Model

Fine-tune Llama-2-7b (the Hugging Face counterpart of llama3.2) on XCoder-80K; this is the recommended starting point:

python finetune_models.py --model llama3.2 \
  --num-epochs 3 \
  --batch-size 4 \
  --learning-rate 2e-4

Option 2: Fine-tune All Models Sequentially

python finetune_models.py --all-models \
  --num-epochs 3 \
  --batch-size 4 \
  --max-samples 5000

Option 3: Custom Configuration

python finetune_models.py \
  --model llama3.2 \
  --output-dir ./my_finetuned_models \
  --num-epochs 5 \
  --batch-size 8 \
  --learning-rate 1e-4 \
  --max-samples 10000 \
  --no-lora  # Disable LoRA (full fine-tuning)

Training Arguments Explained

| Argument | Default | Description |
|---|---|---|
| --model | llama3.2 | Model to fine-tune |
| --all-models | False | Fine-tune all available models sequentially |
| --output-dir | ./finetuned_models | Where to save fine-tuned models |
| --num-epochs | 3 | Training epochs (more = longer training) |
| --batch-size | 4 | Batch size (larger = more VRAM needed) |
| --learning-rate | 2e-4 | Learning rate (lower = smaller, more conservative updates) |
| --max-samples | None | Limit training samples (None = use all 80K) |
| --no-lora | False | Disable LoRA and run full fine-tuning |
| --no-gradient-checkpointing | False | Disable gradient checkpointing |
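For context on what --no-lora toggles: by default the script wraps the base model in a LoRA adapter so only a small set of injected weights is trained. The sketch below shows a typical setup with the peft library; the rank, alpha, and target_modules values are illustrative assumptions, not necessarily the exact settings in finetune_models.py.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-7b-hf"  # HF counterpart of llama3.2 (see table above)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

lora = LoraConfig(
    r=16,                                 # adapter rank (assumed value)
    lora_alpha=32,                        # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-family attention
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights train, hence the VRAM savings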

Output

After training, models are saved to:

finetuned_models/
├── llama3_2/
│   ├── final/
│   │   ├── pytorch_model.bin
│   │   ├── config.json
│   │   └── tokenizer.json
│   └── metadata.json
├── gemma3_4b/
│   └── ...
└── gemma3_1b/
    └── ...
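Each model directory also contains a metadata.json describing the run; its exact fields are defined by the script, but it can be inspected with plain json:

import json

# Print the training metadata saved alongside the checkpoints
with open("finetuned_models/llama3_2/metadata.json") as f:
    print(json.dumps(json.load(f), indent=2))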

Using Fine-tuned Models with Ollama

After fine-tuning, you can wrap the result as a custom Ollama model. Because LoRA is the default, the saved weights are an adapter that sits on top of the base model. Create a Modelfile:

FROM llama3.2:latest

# Attach the fine-tuned LoRA adapter to the base model
# (Ollama Modelfiles use ADAPTER for this; there is no COPY instruction)
ADAPTER ./finetuned_models/llama3_2/final

# Optional: set sampling parameters
PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1

Then create and run:

ollama create my-finetuned-llama -f Modelfile
ollama run my-finetuned-llama "your prompt here"

Or use directly in Python:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "./finetuned_models/llama3_2/final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a completion with the fine-tuned model
inputs = tokenizer("def fibonacci", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
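Note that with LoRA enabled (the default), final/ may contain only adapter weights rather than a full model. In that case, peft's AutoPeftModelForCausalLM resolves the base model and applies the adapter in one call (a sketch, assuming the adapter was saved with save_pretrained):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "./finetuned_models/llama3_2/final"
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)  # loads base model, then applies the adapter
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)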

Hardware Requirements

| GPU / Configuration | VRAM | Estimated Training Time | Recommendation |
|---|---|---|---|
| RTX 4090 | 24GB | ~2 hours | ✓ Excellent |
| RTX 4080 | 16GB | ~3-4 hours | ✓ Good |
| RTX 4070 | 12GB | ~5-6 hours | Acceptable |
| Tesla T4 | 16GB | ~4-5 hours | Cloud-friendly |
| CPU only | N/A | ~1-2 days | Not recommended |

Optimization Tips:

  • Use --batch-size 2 for GPUs with <12GB VRAM
  • Use --max-samples 1000 to train on a small subset first
  • LoRA (the default) uses roughly 70% less VRAM than full fine-tuning
  • Gradient checkpointing (also on by default) reduces VRAM usage by roughly 30%
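Both savings come from standard library features rather than anything custom: LoRA via peft (see the sketch above) and gradient checkpointing via a single transformers call on the loaded model:

# Recompute activations during the backward pass instead of storing them,
# trading extra compute for a substantially smaller memory footprint
model.gradient_checkpointing_enable()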

Integration with CodeArena RL

To use fine-tuned models with the CodeArena RL environment:

  1. Export to Ollama (see above)
  2. Update Dashboard.jsx to use the new model:
    const [ollamaModel, setOllamaModel] = useState('my-finetuned-llama');
    
  3. Or pass it to ollama_rl_rollout.py on the command line:
    python ollama_rl_rollout.py --ollama-model my-finetuned-llama
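For a quick smoke test outside the dashboard, the exported model can also be queried through Ollama's local REST API (a minimal sketch; assumes Ollama is running on its default port 11434):

import requests

# Send one non-streaming generation request to the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-finetuned-llama",
        "prompt": "def fibonacci(n):",
        "stream": False,
    },
)
print(resp.json()["response"])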
    

Monitoring Training

Training logs are saved to TensorBoard format:

tensorboard --logdir ./finetuned_models/llama3_2

Open http://localhost:6006 to monitor:

  • Training loss
  • Learning rate schedules
  • GPU usage

Troubleshooting

Out of Memory (OOM)

# Reduce batch size
python finetune_models.py --batch-size 2

# Or limit samples
python finetune_models.py --max-samples 1000

Slow Training

  • Ensure the GPU is actually in use: check nvidia-smi while training
  • Use a smaller model: --model gemma3:1b
  • Reduce the tokenization max_length (requires editing the script)

Dataset Not Found

# Download manually first
python -c "from datasets import load_dataset; load_dataset('banksy235/XCoder-80K')"

# Or use the Hugging Face CLI (datasets need --repo-type dataset)
hf download banksy235/XCoder-80K --repo-type dataset

Dataset Structure

The XCoder-80K dataset contains code examples with metadata. The script automatically handles:

  • Multi-language code (Python, JavaScript, Java, C++, etc.)
  • Code with comments and docstrings
  • Various programming tasks (algorithms, utilities, etc.)

Next Steps

  1. Run fine-tuning: python finetune_models.py --model llama3.2
  2. Monitor training: tensorboard --logdir ./finetuned_models/llama3_2
  3. Export to Ollama: Create custom Modelfile and ollama create
  4. Test in CodeArena: Update dashboard to use fine-tuned model
  5. Measure improvements: Run python plot_rewards.py to see RL performance gains
