Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

A mid-training "practice phase" that teaches small open-source LLMs how to evolve solutions.


Finch-8B is the 8B member of the Finch family: open-source LLMs that are evolution fine-tuned (EFT) to act as stronger mutation operators inside evolutionary search. Unlike the other EFT models in the family, which use the Qwen3.5 series, it is built on Qwen3-8B and trained on the Finch Collection. It learns how to evolve a solution and is the strongest variant to pair with test-time RL.

TL;DR

State-of-the-art discovery systems put an LLM inside an evolutionary search scaffold — but the discovery know-how lives in the scaffold, and every new task starts from zero. Evolution Fine-Tuning (EFT) moves that behavior into the model by turning evolutionary search trajectories into supervision. Finch-8B is a strong drop-in mutation operator that also synergizes with test-time reinforcement learning.

(Figure: EFT as mid-training)
  • (Left) EFT acts as mid-training, boosting Finch's discovery on the Erdős minimum-overlap problem under both test-time search and test-time learning.
  • (Right) On NP-hard competitive programming, Finch composes strategies learned across diverse domains, while the base model relies on a single repetitive strategy.

Finch family

| Model | Base | Params | Training |
|---|---|---|---|
| Finch-2B | Qwen3.5-2B | 2B | EFT |
| Finch-4B | Qwen3.5-4B | 4B | EFT |
| Finch-8B (this model) | Qwen3-8B | 8B | EFT |
| Finch-9B | Qwen3.5-9B | 9B | EFT |
| Finch-4B-KTO | Qwen3.5-4B | 4B | EFT + KTO |
| Finch-8B-KTO | Qwen3-8B | 8B | EFT + KTO |

How to Use Finch

  1. Run the OpenEvolve scaffold with Finch

Finch is a mutation operator for evolutionary search and is most effective when driven by a scaffold such as OpenEvolve (T = 100, temperature 0.7, top-p 0.95, up to 30K tokens). Other scaffolds in the SkyDiscover framework can also be used, but performance is not guaranteed because the model was trained on OpenEvolve's trajectories; this is one of the limitations of this work.
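
To make the role of a mutation operator concrete, here is a minimal sketch of a greedy evolutionary loop. It is not OpenEvolve's algorithm, which maintains a richer program database; the names `evolve`, `mutate`, and `evaluate` are hypothetical, and reading T = 100 as the iteration budget is an assumption. In practice `mutate` would wrap a call to Finch and `evaluate` would be the task's evaluator.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def evolve(
    seed_program: str,
    mutate: Callable[[str, History], str],   # hypothetical: would call Finch
    evaluate: Callable[[str], float],        # hypothetical: task evaluator
    iterations: int = 100,                   # assumed meaning of T = 100
) -> Tuple[str, float]:
    """Greedy loop: mutate the best program so far and keep strict improvements."""
    best, best_fitness = seed_program, evaluate(seed_program)
    history: History = [(best, best_fitness)]
    for _ in range(iterations):
        child = mutate(best, history)        # propose a candidate from the parent
        fitness = evaluate(child)            # score it with the evaluator
        history.append((child, fitness))     # feedback available to later mutations
        if fitness > best_fitness:
            best, best_fitness = child, fitness
    return best, best_fitness


if __name__ == "__main__":
    # Toy stand-ins: a "program" is a string, fitness is its length capped at 5.
    toy_mutate = lambda parent, history: parent + "a"
    toy_evaluate = lambda program: float(min(len(program), 5))
    print(evolve("x", toy_mutate, toy_evaluate, iterations=10))
```

The loop only ever mutates the current best program, which keeps the sketch short; a real scaffold would also decide which parent to sample and how much history to show the model.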

  2. Call Finch directly

Without a scaffold, pass Finch the same system and user prompts that OpenEvolve would construct:

System prompt (task-level instruction from the OpenEvolve scaffold):

You are an expert mathematician specializing in circle packing problems and computational geometry.
Your task is to improve a constructor function that directly produces a specific arrangement of
26 circles in a unit square, maximizing the sum of their radii.
The AlphaEvolve paper achieved a sum of 2.635 for n=26.

Key geometric insights:
- Circle packings often follow hexagonal patterns in the densest regions
- Maximum density for infinite circle packing is pi/(2*sqrt(3)) ≈ 0.9069
- Edge effects make square container packing harder than infinite packing
- Similar radius circles often form regular patterns, while varied radii allow better space utilization

User prompt (evolutionary state — current program + evaluator feedback + evolutionary history):

# Current Program Information
- Fitness: 0.3642 (sum_radii: 0.9598)
- Focus areas: Fitness unchanged at 0.3642. Consider simplifying — code length exceeds 500 characters.

# Program Evolution History
## Previous Attempts

### Attempt 1
- Changes: Replace concentric ring placement with hexagonal lattice (5-6-5-6-5 row pattern)
- Metrics: sum_radii: 0.9598, validity: 1.0 — Improvement in all metrics

# Current Program

# EVOLVE-BLOCK-START
import numpy as np

def construct_packing():
    n = 26
    centers = np.zeros((n, 2))
    centers[0] = [0.5, 0.5]                              # center circle
    for i in range(8):                                    # inner ring
        angle = 2 * np.pi * i / 8
        centers[i+1] = [0.5 + 0.3*np.cos(angle), 0.5 + 0.3*np.sin(angle)]
    for i in range(16):                                   # outer ring
        angle = 2 * np.pi * i / 16
        centers[i+9] = [0.5 + 0.7*np.cos(angle), 0.5 + 0.7*np.sin(angle)]
    centers = np.clip(centers, 0.01, 0.99)
    radii = compute_max_radii(centers)
    return centers, radii, np.sum(radii)
# EVOLVE-BLOCK-END

Loading the model and generating with Hugging Face Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minnesotanlp/Finch-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# SYSTEM_PROMPT and USER_PROMPT must be defined as strings in the format shown above.
# Given an evolutionary state (task instruction, parent program, evolutionary history,
# and evaluator feedback), Finch proposes an improved candidate program.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},  # provided by your evolutionary scaffold
    {"role": "user", "content": USER_PROMPT},      # parent program + feedback + history
]
# return_dict=True yields input_ids and attention_mask regardless of library defaults.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=30000, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
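
When no scaffold is available, the user prompt can be assembled by hand. The sketch below mirrors only the sections visible in the example user prompt above; the function name `build_user_prompt` and its argument layout are hypothetical, and the real OpenEvolve template may contain sections not shown here.

```python
from typing import Dict, List


def build_user_prompt(
    program: str,
    fitness: float,
    metrics: Dict[str, float],
    focus: str,
    attempts: List[Dict[str, str]],
) -> str:
    """Assemble a user prompt in the layout of the example shown above (assumed layout)."""
    metric_str = ", ".join(f"{name}: {value:.4f}" for name, value in metrics.items())
    lines = [
        "# Current Program Information",
        f"- Fitness: {fitness:.4f} ({metric_str})",
        f"- Focus areas: {focus}",
        "",
        "# Program Evolution History",
        "## Previous Attempts",
        "",
    ]
    # One block per earlier attempt, numbered from 1.
    for index, attempt in enumerate(attempts, start=1):
        lines += [
            f"### Attempt {index}",
            f"- Changes: {attempt['changes']}",
            f"- Metrics: {attempt['metrics']}",
            "",
        ]
    lines += ["# Current Program", "", program]
    return "\n".join(lines)


if __name__ == "__main__":
    parent = "# EVOLVE-BLOCK-START\npass\n# EVOLVE-BLOCK-END"
    print(build_user_prompt(
        program=parent,
        fitness=0.3642,
        metrics={"sum_radii": 0.9598},
        focus="Fitness unchanged at 0.3642.",
        attempts=[{"changes": "Replace concentric ring placement with hexagonal lattice",
                   "metrics": "sum_radii: 0.9598, validity: 1.0"}],
    ))
```

The returned string can be used as `USER_PROMPT` in the generation snippet.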

Training

  • Data. Improved transitions from the Finch Collection across 355 training tasks (16 of 371 held out). One evolutionary run is kept per task → 30,445 supervised examples; 900 uniformly sampled examples are used for validation.
  • Teacher. Trajectories generated by Qwen3.5-397B-A17B inside the OpenEvolve scaffold.
  • Recipe. Full SFT with LLaMA-Factory: 1 epoch, global batch size 128, learning rate 1e-5, on 8× NVIDIA H200 140GB GPUs.
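
The data step above can be pictured as a filter over search trajectories. The following is a rough sketch under stated assumptions, not the authors' pipeline: the `Transition` record, its field names, and the reading of "improved" as a strict fitness increase from parent to child are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Transition:
    """One mutation step in a search trajectory (hypothetical record layout)."""
    task_id: str
    parent_fitness: float
    child_fitness: float
    prompt: str      # evolutionary state shown to the teacher
    response: str    # teacher output that produced the child program


def to_sft_examples(
    transitions: Iterable[Transition],
    held_out_tasks: Set[str],
) -> List[Dict[str, str]]:
    """Keep improved transitions from training tasks as prompt/response pairs."""
    examples = []
    for t in transitions:
        if t.task_id in held_out_tasks:
            continue                              # held-out tasks never enter training
        if t.child_fitness > t.parent_fitness:    # assumed definition of "improved"
            examples.append({"prompt": t.prompt, "response": t.response})
    return examples


if __name__ == "__main__":
    trajectory = [
        Transition("packing", 0.10, 0.20, "state-1", "edit-1"),   # improved: kept
        Transition("packing", 0.20, 0.20, "state-2", "edit-2"),   # unchanged: dropped
        Transition("erdos", 0.10, 0.50, "state-3", "edit-3"),     # held out: dropped
    ]
    print(to_sft_examples(trajectory, held_out_tasks={"erdos"}))
```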

Want a sharper self-judging variant? See Finch-8B-KTO, which adds a KTO preference-learning stage on top of this model.

Results

  • Finch outperforms its same-size base by up to +10.24% across 22 held-out tasks spanning 5 domains, with per-task gains reaching +290%. Gains scale with model size — Finch-4B already matches a model roughly twice its size on the Erdős task.
    (Figure: main results)
  • On NP-hard competitive programming (FrontierCS), Finch-9B averages 46.01 vs base Qwen3.5-9B's 32.46; on CALICO's P263 — UC Berkeley's official open-ended contest — it scores 86.10 vs 55.09.
    (Figure: FrontierCS results)
  • With preference learning (KTO), Finch-8B surpasses the best human score on both AC1 and AC2, lifting its competitive-programming average from 24.56 → 37.30. Paired with the nanodiscover learning scaffold, it also matches SOTA on two circle-packing tasks and improves Erdős by +3.2%.
    (Figure: FrontierCS results)
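
For readers who want to compare the absolute scores above on a common scale, the snippet below converts each reported pair into a relative gain over the baseline. This is plain arithmetic on the numbers quoted in this section; the helper `relative_gain` is hypothetical, and the percentages reported by the authors (such as +10.24%) may be aggregated differently.

```python
def relative_gain(base: float, new: float) -> float:
    """Percentage change of `new` over `base`."""
    return (new - base) / base * 100.0


# (baseline score, improved score) pairs quoted in the Results section.
reported = {
    "FrontierCS average, Qwen3.5-9B vs Finch-9B": (32.46, 46.01),
    "CALICO P263, Qwen3.5-9B vs Finch-9B": (55.09, 86.10),
    "Competitive-programming average, Finch-8B before vs after KTO": (24.56, 37.30),
}

if __name__ == "__main__":
    for name, (base, new) in reported.items():
        print(f"{name}: {relative_gain(base, new):+.1f}%")
```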

Limitations

  • Trajectories are collected and evaluated only with OpenEvolve; behavior under different scaffolds is not guaranteed.
  • The synergy with test-time RL is demonstrated primarily on mathematical tasks.
  • Finch-8B inherits the capabilities and biases of Qwen3-8B.

License

The Finch Collection is released under the CC-BY 4.0 License and is recommended for non-commercial academic research. The accompanying code and Finch model weights are released under the Apache 2.0 License.

Acknowledgements

This research was supported by the "Advanced GPU Utilization Support Program" funded by the Government of the Republic of Korea (Ministry of Science and ICT). We are grateful to the SkyDiscover team for their valuable feedback on the dataset construction process, the use of the SkyDiscover framework, and the overall direction of this research — in particular, Shu Liu, Shubham Agarwal, and Mert Cemri for their insightful comments and discussions. We also thank the OpenEvolve team, especially Ritik Vijayvergiya and Asankhaya Sharma, for their guidance on using the OpenEvolve framework and for their thoughtful comments on this work. We further thank the authors of ALE-Bench, especially Yuki Imajuku, and the AtCoder team for authorizing the public release of the evolutionary search trajectories derived from their CC BY-ND 4.0-licensed dataset. Finally, we thank Byung-Kwan Lee for valuable feedback during the early stages of this project.

Citation

@misc{lee2026evolutionfinetuninglearningdiscover,
      title={Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks}, 
      author={Young-Jun Lee and Seungone Kim and Minki Kang and Alistair Cheong Liang Chuen and Zerui Chen and Seungho Han and Taehee Jung and Dongyeop Kang},
      year={2026},
      eprint={2606.29082},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.29082}, 
}