PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B was further trained with KTO (using the `apo_zero_unpaired` loss type) on a mix of instruct, RP, and story-generation datasets. I created the rejected samples by regenerating every model turn with the SFT model under deliberately bad sampling settings (including a logit bias).
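A minimal sketch of what that rejected-turn generation can look like is below: sampling the SFT model with deliberately poor settings plus a logit bias applied at every decoding step. This is an illustration, not the exact script used; the biased tokens, bias strength, prompt, and sampling values are all assumptions.

```python
# Sketch: generate a "rejected" model turn from the SFT model using bad sampling
# settings plus a logit bias. Token ids, bias value, and sampling hyperparameters
# are illustrative assumptions, not the settings actually used.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class LogitBias(LogitsProcessor):
    """Adds a fixed bias to a set of token ids at every decoding step."""
    def __init__(self, token_ids, bias):
        self.token_ids = token_ids
        self.bias = bias

    def __call__(self, input_ids, scores):
        scores[:, self.token_ids] += self.bias
        return scores

sft_name = "PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B"
tok = AutoTokenizer.from_pretrained(sft_name)
model = AutoModelForCausalLM.from_pretrained(
    sft_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt_ids = tok.apply_chat_template(
    [{"role": "user", "content": "Write a short scene set in a rainy city."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Hypothetical tokens to bias toward; the real biased tokens are not documented.
biased_ids = tok.convert_tokens_to_ids(["!", "?"])

rejected = model.generate(
    prompt_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=2.0,          # deliberately too hot
    top_p=1.0,                # no nucleus truncation
    repetition_penalty=0.9,   # < 1.0 encourages repetition
    logits_processor=LogitsProcessorList([LogitBias(biased_ids, 4.0)]),
)
print(tok.decode(rejected[0, prompt_ids.shape[-1]:], skip_special_tokens=True))
```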

The model was only trained at max_length=6144, and training crashed well before completing a full epoch. So think of this like a test of a test.
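For anyone trying to reproduce the setup, a rough sketch of the KTO run with trl's KTOTrainer is below. Only max_length and loss_type come from this card; the dataset file, batch size, and remaining hyperparameters are illustrative assumptions (and `processing_class` is `tokenizer=` in older trl versions).

```python
# Sketch of a KTO training setup with trl's KTOTrainer. Only max_length and
# loss_type are taken from the card; everything else is an assumed placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

sft_name = "PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B"
tokenizer = AutoTokenizer.from_pretrained(sft_name)
model = AutoModelForCausalLM.from_pretrained(sft_name)

# Unpaired KTO data: each row holds "prompt", "completion", and a bool "label"
# (True = kept SFT turn, False = rejected turn generated with bad settings).
dataset = load_dataset("json", data_files="kto_mix.jsonl", split="train")  # hypothetical file

args = KTOConfig(
    output_dir="kto-out",
    max_length=6144,                 # training length stated above
    loss_type="apo_zero_unpaired",   # loss type stated above
    per_device_train_batch_size=1,   # illustrative
    gradient_accumulation_steps=8,   # illustrative
    num_train_epochs=1,
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```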

W&B Training Logs

Logged charts: train/rewards/chosen, train/rewards/rejected, train/rewards/margins, train/logits/chosen, train/logits/rejected, train/logps/chosen, train/logps/rejected, train/loss, train/grad_norm.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 21.69 |
| IFEval (0-Shot)      | 65.04 |
| BBH (3-Shot)         | 22.29 |
| MATH Lvl 5 (4-Shot)  | 11.78 |
| GPQA (0-shot)        | 2.91  |
| MuSR (0-shot)        | 4.69  |
| MMLU-PRO (5-shot)    | 23.42 |