Detailed AlphaCode 2 Summary (Built on Gemini)

#14
by shermansiu - opened

🖥️ AlphaCode 2 highlights:
Ensemble of models: a family of generation models plus 1 scoring model (all fine-tuned from Gemini Pro)

2-stage fine-tuning using GOLD (an offline, off-policy RL objective), with the reward being either the sum over the sequence of LM log-probs plus 60 (GOLD-p) or the sum of token probabilities (GOLD-s)
1st stage: vary hyperparameters to create a family of Gemini Pro models fine-tuned on CodeContests v2
2nd stage: fine-tune on a different, higher-quality dataset
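The two reward variants above can be sketched as a toy loss function. This is an illustrative reading of the summary, not the AlphaCode 2 training code: the function name, the sequence-level reward, and the interpretation of the "+60" as an additive offset on the log-prob reward are all assumptions.

```python
import numpy as np

def gold_loss(token_logprobs, variant="p"):
    """Toy GOLD-style off-policy loss (illustrative sketch, not DeepMind's code).

    token_logprobs: the model's log-probs for the tokens of one reference
    solution. The reward is derived from the model's own likelihood and is
    treated as a constant (no gradient flows through it).
    """
    logp = np.asarray(token_logprobs, dtype=float)
    if variant == "p":
        # GOLD-p: sum of LM log-probs over the sequence, plus the +60 offset
        # mentioned in the summary (presumably to keep the reward positive).
        reward = logp.sum() + 60.0
    else:
        # GOLD-s: sum of token probabilities over the sequence.
        reward = np.exp(logp).sum()
    # Off-policy policy gradient: minimize the reward-weighted NLL.
    return -reward * logp.sum()
```

In a real setup the reward would multiply per-token negative log-likelihoods inside the training step, with gradients flowing only through the log-prob term.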

Generate 1M samples, randomizing the temperature parameter for each sample to increase diversity
Only C++ samples are used (higher quality than the Python generations)
Filter out samples that fail the provided test cases or do not compile (<5% fail to compile); this removes ~95% of samples
Cluster the remaining ~50k samples and keep the 10 largest clusters
Choose the best candidate in each cluster to submit, using the scoring model (predicts correctness as a score between 0 and 1)
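The sample/filter/cluster/score steps above can be sketched as a single pipeline. All five callables are placeholders for real components (the Gemini-based sampler, a C++ compile check, the problem's test cases, a behavior-based clustering key, and the learned scoring model), so this is a structural sketch only.

```python
import random
from collections import defaultdict

def select_submissions(generate, compiles, passes_tests, behavior_signature,
                       score, n_samples=1_000_000, n_clusters=10):
    """Structural sketch of the AlphaCode 2 selection pipeline (placeholders,
    not the real components)."""
    # 1. Sample with a randomized temperature per sample for diversity.
    samples = [generate(temperature=random.uniform(0.1, 1.0))
               for _ in range(n_samples)]

    # 2. Filter: drop non-compiling samples (<5% of the total) and samples
    #    that fail the provided tests (~95% removed overall).
    survivors = [s for s in samples if compiles(s) and passes_tests(s)]

    # 3. Cluster the survivors (AlphaCode clusters by runtime behavior on
    #    generated inputs) and keep the largest clusters.
    clusters = defaultdict(list)
    for s in survivors:
        clusters[behavior_signature(s)].append(s)
    largest = sorted(clusters.values(), key=len, reverse=True)[:n_clusters]

    # 4. From each kept cluster, submit the candidate the scoring model
    #    rates most likely correct (score in [0, 1]).
    return [max(cluster, key=score) for cluster in largest]
```

With 10 clusters this yields at most 10 submissions per problem, matching the competitive-programming submission budget the system targets.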

AlphaCode: est. ~50th percentile
AlphaCode 2: est. ~85th percentile
AlphaCode 2 + human: est. ~90th percentile

"Wonko the Sane" on the Eleuther Discord speculated that C++ may be preferred over Python because bad C++ won't compile, but bad Python may still work.
