KoHRM-Text-1.4B FullSFT LFM25 Terminal ToolBench Epoch3

This repository holds the checkpoint from the third full-parameter SFT epoch of KoHRM-Text-1.4B on the LFM25/ToolBench terminal dataset. It is fine-tuned from LLM-OS-Models/KoHRM-Text-1.4B and continues from the Epoch2 checkpoint.

Training

Base model: LLM-OS-Models/KoHRM-Text-1.4B
Parent checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch2
Dataset: kohrm_sft_lfm25_terminal_toolbench_full_v1
Training type: full-parameter SFT, not LoRA
Total SFT epochs on this dataset: 3
Epoch3 hardware: 8 x NVIDIA H200
Global batch size: 180224 tokens
Learning rate: 2e-5
Epoch3 training wall time: about 3h 16m 30s
Final train loss: 0.5271

TB2-lite Result

Evaluation file:

tb2_lite/results/20260606T_kohrm_lfm25_epoch3_eval_sdpa8_b16/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch3-sdpa8-b16-nocompile-merged.json

Checkpoint	Steps	Score	Cmd F1	Precision	Recall	First Cmd	Valid JSON	Avg Pred Cmds	Sec/Step
Epoch1	303/303	38.56	0.3856	0.4262	0.4341	37.0%	55.1%	27.33	8.314
Epoch2	303/303	45.90	0.4590	0.5031	0.5098	44.9%	68.3%	25.16	10.842
Epoch3	303/303	43.57	0.4357	0.4703	0.5003	45.5%	61.7%	25.82	11.156

Score = 100 * avg_command_f1.
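
The score formula can be illustrated with a minimal sketch. The card does not state how predicted commands are matched against reference commands, so the exact-string multiset matching below is an assumption, and `command_f1`, `score`, and the sample steps are hypothetical names and data used only for illustration.

```python
from collections import Counter


def command_f1(predicted, reference):
    """Return (precision, recall, f1) for two command lists.

    Assumption: commands match only on exact string equality, and
    duplicates are counted as a multiset. The real TB2-lite matcher
    may normalize or compare commands differently.
    """
    if not predicted or not reference:
        return 0.0, 0.0, 0.0
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def score(per_step_f1):
    """Score = 100 * avg_command_f1, averaged over evaluation steps."""
    return 100.0 * sum(per_step_f1) / len(per_step_f1)


# Hypothetical (predicted, reference) pairs for three steps.
steps = [
    (["ls -la", "cat a.txt"], ["ls -la", "cat a.txt"]),    # full match
    (["pwd", "ls"], ["ls", "cat b.txt", "wc -l b.txt"]),   # partial match
    (["echo hi"], ["rm tmp"]),                             # no match
]
f1s = [command_f1(pred, ref)[2] for pred, ref in steps]
print(round(score(f1s), 2))  # 46.67
```

Averaging F1 per step, rather than taking the harmonic mean of the averaged precision and recall, would also be consistent with the table, where Cmd F1 is lower than both the Precision and Recall columns.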

Interpretation

Epoch3 is not the best KoHRM terminal checkpoint so far. It scored 43.57 on TB2-lite, 2.33 points below Epoch2 (45.90), so Epoch2 remains the representative KoHRM terminal checkpoint.

What improved:

First Cmd increased slightly from 44.9% to 45.5%.
model_training reached F1 0.4910.

What regressed:

Cmd F1 fell from 0.4590 to 0.4357.
Precision fell from 0.5031 to 0.4703.
Recall fell slightly from 0.5098 to 0.5003.
Valid JSON fell from 68.3% to 61.7%.
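
The Epoch2 to Epoch3 changes can be recomputed directly from the TB2-lite table. The sketch below copies the two rows by hand; the `rows` dictionary and the `deltas` helper are illustrative names, not part of any evaluation tooling.

```python
# Values copied from the TB2-lite table above (percent columns as plain numbers).
rows = {
    "Epoch2": {"Score": 45.90, "Cmd F1": 0.4590, "Precision": 0.5031,
               "Recall": 0.5098, "First Cmd %": 44.9, "Valid JSON %": 68.3},
    "Epoch3": {"Score": 43.57, "Cmd F1": 0.4357, "Precision": 0.4703,
               "Recall": 0.5003, "First Cmd %": 45.5, "Valid JSON %": 61.7},
}


def deltas(prev, curr):
    """Return curr - prev for every metric, rounded to 4 decimals."""
    return {name: round(curr[name] - prev[name], 4) for name in prev}


d = deltas(rows["Epoch2"], rows["Epoch3"])
for name, value in d.items():
    print(f"{name:>13}: {value:+.4f}")
```

Only First Cmd comes out positive; every other metric in the two rows moves down.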

Strong source groups (Epoch3 Cmd F1):

data_querying: 0.6150 (15 steps, First Cmd 53.3%)
data_science: 0.5940 (22 steps, First Cmd 63.6%)
model_training: 0.4910 (17 steps, First Cmd 17.6%)
debugging: 0.4727 (38 steps, First Cmd 52.6%)
software_engineering: 0.4591 (36 steps, First Cmd 55.6%)
scientific_computing: 0.4491 (20 steps, First Cmd 55.0%)

Weak source groups (Epoch3 Cmd F1):

math: 0.3150 (16 steps, First Cmd 25.0%)
dependency_management: 0.3580 (15 steps, First Cmd 33.3%)
security: 0.3580 (23 steps, First Cmd 34.8%)
data_processing: 0.3621 (23 steps, First Cmd 43.5%)
swe: 0.3815 (23 steps, First Cmd 39.1%)
file_operations: 0.3875 (20 steps, First Cmd 45.0%)
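
As a rough cross-check, the group figures above can be combined into a step-weighted mean. This sketch assumes each group value is the mean Cmd F1 over that group's steps, which the card does not state explicitly. The listed step counts sum to 268 of the 303 evaluated steps, so the result covers only the groups shown here.

```python
# (Cmd F1, steps) per source group, copied from the two lists above.
groups = {
    "data_querying": (0.6150, 15),
    "data_science": (0.5940, 22),
    "model_training": (0.4910, 17),
    "debugging": (0.4727, 38),
    "software_engineering": (0.4591, 36),
    "scientific_computing": (0.4491, 20),
    "math": (0.3150, 16),
    "dependency_management": (0.3580, 15),
    "security": (0.3580, 23),
    "data_processing": (0.3621, 23),
    "swe": (0.3815, 23),
    "file_operations": (0.3875, 20),
}

total_steps = sum(steps for _, steps in groups.values())
# Weight each group's F1 by its number of steps.
weighted_f1 = sum(f1 * steps for f1, steps in groups.values()) / total_steps

print(f"steps covered: {total_steps}")
print(f"step-weighted Cmd F1 over listed groups: {weighted_f1:.4f}")
```

Under that assumption, the weighted mean lands close to the overall Epoch3 Cmd F1 of 0.4357; the gap would come from the groups that are not listed.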

Usage Note

KoHRM-Text uses the HRM-Text PrefixLM runtime, not a standard vLLM chat-model path. For this evaluation, the local HF export was run with KOHRM_FORCE_SDPA_KVCACHE=1 and KOHRM_DISABLE_INFERENCE_COMPILE=1, because the local flash-attention build used for this run does not support append-KV cache.
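
A minimal sketch of setting these two flags from Python follows. Only the variable names and values come from this card; the `apply_kohrm_eval_env` helper is hypothetical, and the model-loading code is deliberately left out because the runtime's API is not described here. Setting the flags before importing the runtime is a conservative ordering, since the card does not say when they are read.

```python
import os

# Flags used for the TB2-lite evaluation described above.
KOHRM_EVAL_ENV = {
    "KOHRM_FORCE_SDPA_KVCACHE": "1",         # use the SDPA KV-cache path
    "KOHRM_DISABLE_INFERENCE_COMPILE": "1",  # skip inference compilation
}


def apply_kohrm_eval_env(env=os.environ):
    """Set the evaluation flags and return their previous values.

    The returned dict maps each flag to its old value (None if it was
    unset), so a caller can restore the environment afterwards.
    """
    previous = {name: env.get(name) for name in KOHRM_EVAL_ENV}
    env.update(KOHRM_EVAL_ENV)
    return previous


previous = apply_kohrm_eval_env()
# Import and load the HRM-Text PrefixLM runtime only after this point
# (loading code is not shown because its API is not documented here).
```

The same effect can be had by exporting both variables in the shell before launching the evaluation process.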
