PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B was further trained with KTO (using the `apo_zero_unpaired` loss type) on a mix of instruct, RP, and story-generation datasets. I created the rejected samples by regenerating every model turn with the SFT model under deliberately bad sampling settings (including a logit bias).
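A minimal sketch of what that rejected-turn generation can look like is below: sampling the SFT model with deliberately poor settings plus a logit bias applied at every decoding step. This is an illustration, not the exact script used; the biased tokens, bias strength, prompt, and sampling values are all assumptions.

```python
# Sketch: generate a "rejected" model turn from the SFT model using bad sampling
# settings plus a logit bias. Token ids, bias value, and sampling hyperparameters
# are illustrative assumptions, not the settings actually used.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class LogitBias(LogitsProcessor):
    """Adds a fixed bias to a set of token ids at every decoding step."""
    def __init__(self, token_ids, bias):
        self.token_ids = token_ids
        self.bias = bias

    def __call__(self, input_ids, scores):
        scores[:, self.token_ids] += self.bias
        return scores

sft_name = "PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B"
tok = AutoTokenizer.from_pretrained(sft_name)
model = AutoModelForCausalLM.from_pretrained(
    sft_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt_ids = tok.apply_chat_template(
    [{"role": "user", "content": "Write a short scene set in a rainy city."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Hypothetical tokens to bias toward; the real biased tokens are not documented.
biased_ids = tok.convert_tokens_to_ids(["!", "?"])

rejected = model.generate(
    prompt_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=2.0,          # deliberately too hot
    top_p=1.0,                # no nucleus truncation
    repetition_penalty=0.9,   # < 1.0 encourages repetition
    logits_processor=LogitsProcessorList([LogitBias(biased_ids, 4.0)]),
)
print(tok.decode(rejected[0, prompt_ids.shape[-1]:], skip_special_tokens=True))
```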

The model was only trained at max_length=6144, and training crashed well before completing a full epoch. So think of this like a test of a test.
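For anyone trying to reproduce the setup, a rough sketch of the KTO run with trl's KTOTrainer is below. Only max_length and loss_type come from this card; the dataset file, batch size, and remaining hyperparameters are illustrative assumptions (and `processing_class` is `tokenizer=` in older trl versions).

```python
# Sketch of a KTO training setup with trl's KTOTrainer. Only max_length and
# loss_type are taken from the card; everything else is an assumed placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

sft_name = "PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B"
tokenizer = AutoTokenizer.from_pretrained(sft_name)
model = AutoModelForCausalLM.from_pretrained(sft_name)

# Unpaired KTO data: each row holds "prompt", "completion", and a bool "label"
# (True = kept SFT turn, False = rejected turn generated with bad settings).
dataset = load_dataset("json", data_files="kto_mix.jsonl", split="train")  # hypothetical file

args = KTOConfig(
    output_dir="kto-out",
    max_length=6144,                 # training length stated above
    loss_type="apo_zero_unpaired",   # loss type stated above
    per_device_train_batch_size=1,   # illustrative
    gradient_accumulation_steps=8,   # illustrative
    num_train_epochs=1,
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```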

W&B Training Logs

Logged charts: train/rewards/chosen, train/rewards/rejected, train/rewards/margins, train/logits/chosen, train/logits/rejected, train/logps/chosen, train/logps/rejected, train/loss, train/grad_norm.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 21.69 |
| IFEval (0-Shot)      | 65.04 |
| BBH (3-Shot)         | 22.29 |
| MATH Lvl 5 (4-Shot)  | 11.78 |
| GPQA (0-shot)        | 2.91  |
| MuSR (0-shot)        | 4.69  |
| MMLU-PRO (5-shot)    | 23.42 |