Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks
A mid-training "practice phase" that teaches small open-source LLMs how to evolve solutions.
Finch-8B-KTO extends Finch-8B with a second, preference-learning stage (KTO). On top of evolution fine-tuning — which teaches the model how to evolve a solution as a mutation operator — KTO adds the ability to self-judge which candidate solutions are promising and which fall short. It is the family's strongest offline-RL variant, surpassing the best human score on multiple mathematical-discovery tasks. Lineage: Qwen3-8B → Finch-8B (EFT + SFT) → Finch-8B-KTO.
TL;DR
Evolution Fine-Tuning (EFT) turns evolutionary search trajectories into supervision, moving discovery behavior from the scaffold into the model. KTO then trains the model on improved vs. regressed transitions jointly, so it internalizes a sense of solution quality — pushing Finch-8B past the best human score on both autocorrelation-inequality tasks.
- (Left) EFT acts as mid-training, boosting Finch's discovery on the Erdős minimum-overlap problem under both test-time search and test-time learning.
- (Right) On NP-hard competitive programming, Finch composes strategies learned across diverse domains, while the base model relies on a single repetitive strategy.
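To make "turning trajectories into supervision" concrete, here is a minimal sketch of how parent-to-child steps in a search trajectory could be labelled as improved or regressed. The `Step` schema, the field names, and the choice to drop ties are assumptions for illustration, not the format of the Finch Collection; it also assumes that higher fitness is better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One node of a search trajectory (hypothetical schema)."""
    program: str
    fitness: float
    parent: Optional[int]  # index of the parent step, None for the seed program


def label_transitions(trajectory: List[Step]) -> List[dict]:
    """Turn parent -> child edges into labelled transitions."""
    records = []
    for child in trajectory:
        if child.parent is None:
            continue  # the seed program has no parent to compare against
        parent = trajectory[child.parent]
        if child.fitness == parent.fitness:
            continue  # assumption: ties carry no signal and are dropped
        improved = child.fitness > parent.fitness
        records.append({
            "parent_program": parent.program,
            "child_program": child.program,
            "label": "improved" if improved else "regressed",
        })
    return records


trajectory = [
    Step("v0", 0.30, None),
    Step("v1", 0.36, 0),  # better than v0
    Step("v2", 0.33, 1),  # worse than v1
    Step("v3", 0.36, 1),  # tie with v1
]
records = label_transitions(trajectory)

# Stage 1 (SFT) would use only the improved transitions;
# Stage 2 (KTO) would use improved and regressed transitions together.
sft_examples = [r for r in records if r["label"] == "improved"]
```

In this toy trajectory, `records` holds one improved and one regressed transition, and the tie is skipped.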
Finch family
How to Use Finch
- Run the OpenEvolve scaffold with Finch
Finch is a mutation operator for evolutionary search and is most effective when driven by a scaffold such as OpenEvolve (T = 100, temperature 0.7, top-p 0.95, up to 30K tokens).
You can also use other scaffolds in the SkyDiscover framework, but we do not guarantee performance, as our model is trained on OpenEvolve's trajectories — one of this work's limitations.
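As a rough illustration of what "mutation operator" means here, the sketch below shows a greedy evolutionary loop in which a `propose` callable plays the role Finch would play. This is not OpenEvolve's algorithm or API: `evolve`, `propose`, and `evaluate` are hypothetical names, the loop is deliberately simplified, and it assumes that T denotes the number of iterations.

```python
import random
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def evolve(
    seed: str,
    propose: Callable[[str, History], str],
    evaluate: Callable[[str], float],
    iterations: int = 100,  # assumption: corresponds to T = 100
) -> Tuple[str, float]:
    """Greedy loop: mutate the best program so far, keep the child if it scores higher."""
    history: History = [(seed, evaluate(seed))]
    best_program, best_score = history[0]
    for _ in range(iterations):
        candidate = propose(best_program, history)  # an LLM call in a real scaffold
        score = evaluate(candidate)
        history.append((candidate, score))
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program, best_score


# Toy stand-ins so the sketch runs without a model: a "program" is a number
# stored as a string, and the fitness peaks at 3.0.
rng = random.Random(0)


def toy_propose(parent: str, history: History) -> str:
    return str(float(parent) + rng.uniform(-1.0, 1.0))


def toy_evaluate(program: str) -> float:
    return -(float(program) - 3.0) ** 2


best, score = evolve("0.0", toy_propose, toy_evaluate, iterations=100)
```

In a real run, `propose` would build the system and user prompts from the parent program and history and sample from Finch, and `evaluate` would be the task's evaluator.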
- Calling Finch directly
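Without a scaffold, pass the task-level system prompt and the evolutionary-state user prompt as chat messages. An example of each prompt, followed by the generation code: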
System prompt (task-level instruction from the OpenEvolve scaffold):
You are an expert mathematician specializing in circle packing problems and computational geometry.
Your task is to improve a constructor function that directly produces a specific arrangement of
26 circles in a unit square, maximizing the sum of their radii.
The AlphaEvolve paper achieved a sum of 2.635 for n=26.
Key geometric insights:
- Circle packings often follow hexagonal patterns in the densest regions
- Maximum density for infinite circle packing is pi/(2*sqrt(3)) ≈ 0.9069
- Edge effects make square container packing harder than infinite packing
- Similar radius circles often form regular patterns, while varied radii allow better space utilization
User prompt (evolutionary state — current program + evaluator feedback + evolutionary history):
# Current Program Information
- Fitness: 0.3642 (sum_radii: 0.9598)
- Focus areas: Fitness unchanged at 0.3642. Consider simplifying — code length exceeds 500 characters.
# Program Evolution History
## Previous Attempts
### Attempt 1
- Changes: Replace concentric ring placement with hexagonal lattice (5-6-5-6-5 row pattern)
- Metrics: sum_radii: 0.9598, validity: 1.0 — Improvement in all metrics
# Current Program
# EVOLVE-BLOCK-START
import numpy as np
def construct_packing():
    n = 26
    centers = np.zeros((n, 2))
    centers[0] = [0.5, 0.5] # center circle
    for i in range(8): # inner ring
        angle = 2 * np.pi * i / 8
        centers[i+1] = [0.5 + 0.3*np.cos(angle), 0.5 + 0.3*np.sin(angle)]
    for i in range(16): # outer ring
        angle = 2 * np.pi * i / 16
        centers[i+9] = [0.5 + 0.7*np.cos(angle), 0.5 + 0.7*np.sin(angle)]
    centers = np.clip(centers, 0.01, 0.99)
    radii = compute_max_radii(centers)
    return centers, radii, np.sum(radii)
# EVOLVE-BLOCK-END
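The user prompt above follows a regular layout: current metrics, prior attempts, then the current program. A minimal sketch of assembling such a prompt from an evolutionary state is shown below. The function name and arguments are hypothetical, and the real OpenEvolve template may contain sections not shown in this example.

```python
from typing import Dict, List, Tuple


def build_user_prompt(
    fitness: float,
    metrics: Dict[str, float],
    focus: str,
    attempts: List[Tuple[str, str]],  # (description of changes, resulting metrics)
    program: str,
) -> str:
    """Render the evolutionary state in the section layout of the example prompt."""
    metric_text = ", ".join(f"{name}: {value:.4f}" for name, value in metrics.items())
    lines = [
        "# Current Program Information",
        f"- Fitness: {fitness:.4f} ({metric_text})",
        f"- Focus areas: {focus}",
        "# Program Evolution History",
        "## Previous Attempts",
    ]
    for number, (changes, result) in enumerate(attempts, start=1):
        lines += [
            f"### Attempt {number}",
            f"- Changes: {changes}",
            f"- Metrics: {result}",
        ]
    lines += ["# Current Program", program]
    return "\n".join(lines)


prompt = build_user_prompt(
    fitness=0.3642,
    metrics={"sum_radii": 0.9598},
    focus="Fitness unchanged at 0.3642.",
    attempts=[(
        "Replace concentric ring placement with hexagonal lattice",
        "sum_radii: 0.9598, validity: 1.0",
    )],
    program="# EVOLVE-BLOCK-START\ndef construct_packing():\n    pass\n# EVOLVE-BLOCK-END",
)
```

The resulting string can be used as the content of the user message.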
Python example (Transformers):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minnesotanlp/Finch-8B-KTO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# SYSTEM_PROMPT and USER_PROMPT are strings holding the two prompts shown above.
# Given an evolutionary state (task instruction + parent program + evolutionary history
# + evaluator feedback), Finch proposes an improved candidate program.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},  # provided by your evolutionary scaffold
    {"role": "user", "content": USER_PROMPT},  # parent program + feedback + history
]
# return_dict=True returns input_ids and attention_mask together, so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=30000, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Training
Two stages:
- Stage 1: EFT (SFT). Finch-8B is a full SFT of Qwen3-8B on improved transitions from the Finch Collection (355 training tasks; one run per task, yielding 30,445 examples; 900 for validation) with LLaMA-Factory: 1 epoch, global batch size 128, LR 1e-5, on 8× NVIDIA H200 140GB.
- Stage 2: KTO. Preference learning (KTO) on improved (desirable) and regressed (undesirable) transitions jointly, maximizing the contrastive signal that guides the model toward self-judging which solutions are promising and which fall short.
- Teacher (data). Trajectories generated by Qwen3.5-397B-A17B inside the OpenEvolve scaffold.
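For readers unfamiliar with the Stage 2 objective, the sketch below shows the per-example KTO loss in its commonly published form (Kahneman-Tversky Optimization), where desirable and undesirable examples are scored separately against a reference point. The hyperparameters `beta`, `lambda_d`, and `lambda_u` and the fixed `kl_ref` value are placeholders; this card does not state the values or the implementation used to train Finch-8B-KTO.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    policy_logp: float,   # log-probability of the completion under the policy
    ref_logp: float,      # log-probability of the same completion under the reference model
    desirable: bool,      # True for an improved transition, False for a regressed one
    kl_ref: float = 0.0,  # reference point; estimated from the batch in practice
    beta: float = 0.1,    # placeholder value
    lambda_d: float = 1.0,  # weight on desirable examples (placeholder)
    lambda_u: float = 1.0,  # weight on undesirable examples (placeholder)
) -> float:
    """Per-example KTO loss: low when desirable outputs gain probability
    relative to the reference and undesirable outputs lose it."""
    reward = policy_logp - ref_logp  # implied reward: log of the policy/reference ratio
    if desirable:
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))


# A desirable completion the policy has up-weighted is penalized less
# than one it has down-weighted; the reverse holds for undesirable ones.
good_up = kto_loss(-10.0, -12.0, desirable=True)
good_down = kto_loss(-12.0, -10.0, desirable=True)
bad_up = kto_loss(-10.0, -12.0, desirable=False)
bad_down = kto_loss(-12.0, -10.0, desirable=False)
```

Because each example carries only a binary label, improved and regressed transitions do not need to be paired, which fits trajectories where a parent may have any number of better or worse children.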
Results
- Finch outperforms its same-size base model by +10.2% on 22 held-out tasks across 5 domains, with improvements of up to +290% on individual tasks.
- Larger models benefit more, and Finch-4B matches a model roughly 2× larger on the Erdős task.
- On competitive programming (FrontierCS), Finch-9B averages 46.01 vs. base Qwen3.5-9B's 32.46; on CALICO's P263 (UC Berkeley's official open-ended contest), it scores 86.10 vs. 55.09.
- With preference learning (KTO), Finch-8B surpasses the best human score on both autocorrelation-inequality tasks (AC1 and AC2), while its competitive programming score improves from 24.56 to 37.30.
- Finch-8B matches SOTA on two circle-packing tasks and improves the Erdős task by +3.2%.
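To put the score pairs above on a common footing, the following sketch computes the relative gain for each pair. The percentages it produces are derived here from the reported scores and are not figures stated by the authors; the aggregate +10.2% may be computed with a different normalization.

```python
def relative_gain(base: float, new: float) -> float:
    """Percentage change from base to new."""
    return 100.0 * (new - base) / base


# (baseline score, Finch score) pairs as reported in the list above.
reported = {
    "FrontierCS, Finch-9B vs. Qwen3.5-9B": (32.46, 46.01),
    "CALICO P263, Finch-9B vs. Qwen3.5-9B": (55.09, 86.10),
    "Competitive programming, Finch-8B before vs. after KTO": (24.56, 37.30),
}

gains = {name: round(relative_gain(base, new), 1) for name, (base, new) in reported.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```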
Limitations
Trajectories are collected and evaluated only with OpenEvolve; behavior under different scaffolds is not guaranteed.
License
The Finch Collection is released under the CC-BY 4.0 License and is recommended for non-commercial academic research. The accompanying code and Finch model weights are released under the Apache 2.0 License.
Acknowledgements
This research was supported by the "Advanced GPU Utilization Support Program" funded by the Government of the Republic of Korea (Ministry of Science and ICT). We are grateful to the SkyDiscover team for their valuable feedback on the dataset construction process, the use of the SkyDiscover framework, and the overall direction of this research — in particular, Shu Liu, Shubham Agarwal, and Mert Cemri for their insightful comments and discussions. We also thank the OpenEvolve team, especially Ritik Vijayvergiya and Asankhaya Sharma, for their guidance on using the OpenEvolve framework and for their thoughtful comments on this work. We further thank the authors of ALE-Bench, especially Yuki Imajuku, and the AtCoder team for authorizing the public release of the evolutionary search trajectories derived from their CC BY-ND 4.0-licensed dataset. Finally, we thank Byung-Kwan Lee for valuable feedback during the early stages of this project.
Citation
@misc{lee2026evolutionfinetuninglearningdiscover,
title={Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks},
author={Young-Jun Lee and Seungone Kim and Minki Kang and Alistair Cheong Liang Chuen and Zerui Chen and Seungho Han and Taehee Jung and Dongyeop Kang},
year={2026},
eprint={2606.29082},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2606.29082},
}