KoHRM-Text-1.4B FullSFT LFM25 Terminal ToolBench Epoch3

This repository is the third full-SFT epoch of KoHRM-Text-1.4B on the LFM25/ToolBench terminal dataset. It is a fine-tuned version of LLM-OS-Models/KoHRM-Text-1.4B and continues from the Epoch2 checkpoint.

Training

  • Base model: LLM-OS-Models/KoHRM-Text-1.4B
  • Parent checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch2
  • Dataset: kohrm_sft_lfm25_terminal_toolbench_full_v1
  • Training type: full-parameter SFT, not LoRA
  • Total SFT epochs on this dataset: 3
  • Epoch3 hardware: 8 x NVIDIA H200
  • Global batch size: 180224 tokens
  • Learning rate: 2e-5
  • Epoch3 training wall time: about 3h 16m 30s
  • Final train loss: 0.5271

TB2-lite Result

Evaluation file:

tb2_lite/results/20260606T_kohrm_lfm25_epoch3_eval_sdpa8_b16/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch3-sdpa8-b16-nocompile-merged.json

| Checkpoint | Steps | Score | Cmd F1 | Precision | Recall | First Cmd | Valid JSON | Avg Pred Cmds | Sec/Step |
|---|---|---|---|---|---|---|---|---|---|
| Epoch1 | 303/303 | 38.56 | 0.3856 | 0.4262 | 0.4341 | 37.0% | 55.1% | 27.33 | 8.314 |
| Epoch2 | 303/303 | 45.90 | 0.4590 | 0.5031 | 0.5098 | 44.9% | 68.3% | 25.16 | 10.842 |
| Epoch3 | 303/303 | 43.57 | 0.4357 | 0.4703 | 0.5003 | 45.5% | 61.7% | 25.82 | 11.156 |

Score = 100 * avg_command_f1.
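The score definition can be illustrated with a short sketch. This card does not say how TB2-lite matches predicted commands against reference commands, so the sketch assumes exact-match multiset overlap per step. The function names `command_f1` and `tb2_lite_score` and the sample commands are hypothetical.

```python
from collections import Counter


def command_f1(predicted, reference):
    """F1 between predicted and reference command lists for one step.

    Assumption: commands are compared by exact string match with multiset
    overlap. The real TB2-lite matcher may normalize or match differently.
    """
    if not predicted or not reference:
        return 0.0
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def tb2_lite_score(per_step_f1):
    """Score = 100 * avg_command_f1, averaged over all evaluated steps."""
    return 100 * sum(per_step_f1) / len(per_step_f1)


# Hypothetical two-step evaluation: (predicted commands, reference commands).
steps = [
    (["ls -la", "cat README.md"], ["ls -la", "cat README.md"]),   # exact match
    (["pip install numpy", "python run.py"], ["python run.py"]),  # one extra command
]
f1s = [command_f1(pred, ref) for pred, ref in steps]
score = tb2_lite_score(f1s)
print(f"per-step F1: {f1s}, score: {score:.2f}")
```

If F1 is averaged per step in this way, the Cmd F1 column need not equal the harmonic mean of the averaged Precision and Recall columns, which would be consistent with the table values.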

Interpretation

Epoch3 is not the current best KoHRM terminal checkpoint. It scored 43.57, which is 2.33 points below Epoch2 (45.90). Epoch2 remains the representative KoHRM terminal checkpoint.

What improved:

  • First Cmd increased slightly from 44.9% to 45.5%.
  • model_training reached F1 0.4910.

What regressed:

  • Cmd F1 fell from 0.4590 to 0.4357.
  • Precision fell from 0.5031 to 0.4703.
  • Recall fell from 0.5098 to 0.5003.
  • Valid JSON fell from 68.3% to 61.7%.
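The Epoch2 to Epoch3 changes can be recomputed directly from the TB2-lite table. The sketch below copies the two rows from this card; the dictionary keys are illustrative names, not fields from the evaluation file.

```python
# Epoch2 and Epoch3 rows copied from the TB2-lite table in this card.
# Key names are illustrative and may not match the evaluation JSON.
epoch2 = {"score": 45.90, "cmd_f1": 0.4590, "precision": 0.5031,
          "recall": 0.5098, "first_cmd_pct": 44.9, "valid_json_pct": 68.3}
epoch3 = {"score": 43.57, "cmd_f1": 0.4357, "precision": 0.4703,
          "recall": 0.5003, "first_cmd_pct": 45.5, "valid_json_pct": 61.7}

# Positive delta means Epoch3 is higher than Epoch2 on that metric.
deltas = {name: round(epoch3[name] - epoch2[name], 4) for name in epoch2}

for name, delta in deltas.items():
    direction = "improved" if delta > 0 else "regressed"
    print(f"{name:15s} {delta:+.4f}  {direction}")
```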

Strong source groups:

  • data_querying: 0.6150 (15 steps, First Cmd 53.3%)
  • data_science: 0.5940 (22 steps, First Cmd 63.6%)
  • model_training: 0.4910 (17 steps, First Cmd 17.6%)
  • debugging: 0.4727 (38 steps, First Cmd 52.6%)
  • software_engineering: 0.4591 (36 steps, First Cmd 55.6%)
  • scientific_computing: 0.4491 (20 steps, First Cmd 55.0%)

Weak source groups:

  • math: 0.3150 (16 steps, First Cmd 25.0%)
  • dependency_management: 0.3580 (15 steps, First Cmd 33.3%)
  • security: 0.3580 (23 steps, First Cmd 34.8%)
  • data_processing: 0.3621 (23 steps, First Cmd 43.5%)
  • swe: 0.3815 (23 steps, First Cmd 39.1%)
  • file_operations: 0.3875 (20 steps, First Cmd 45.0%)
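As a rough cross-check, the twelve groups listed above can be combined into a step-weighted mean F1. The values are copied from the two lists; the listed groups cover 268 of the 303 evaluated steps, so this figure is only an approximation of the overall Cmd F1 and says nothing about the groups that are not listed.

```python
# (group, F1, steps) copied from the strong and weak lists in this card.
groups = [
    ("data_querying", 0.6150, 15),
    ("data_science", 0.5940, 22),
    ("model_training", 0.4910, 17),
    ("debugging", 0.4727, 38),
    ("software_engineering", 0.4591, 36),
    ("scientific_computing", 0.4491, 20),
    ("math", 0.3150, 16),
    ("dependency_management", 0.3580, 15),
    ("security", 0.3580, 23),
    ("data_processing", 0.3621, 23),
    ("swe", 0.3815, 23),
    ("file_operations", 0.3875, 20),
]

# Weight each group's F1 by its number of evaluated steps.
listed_steps = sum(steps for _, _, steps in groups)
weighted_f1 = sum(f1 * steps for _, f1, steps in groups) / listed_steps

print(f"listed steps: {listed_steps} of 303")
print(f"step-weighted F1 over listed groups: {weighted_f1:.4f}")
```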

Usage Note

KoHRM-Text uses the HRM-Text PrefixLM runtime, not a standard vLLM chat-model path. For this evaluation, the local HF export was run with KOHRM_FORCE_SDPA_KVCACHE=1 and KOHRM_DISABLE_INFERENCE_COMPILE=1, because the local flash-attention build did not support the append-KV cache needed for this run.
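A minimal sketch of applying the two flags from Python follows. The variable names come from this card; the assumption that the runtime reads them at import or model-load time is not confirmed here, and the loader call itself is left out because this card does not document it.

```python
import os

# Flags named in this card. Assumption: the KoHRM runtime reads them at import
# or model-load time, so they are set before anything else is imported.
os.environ["KOHRM_FORCE_SDPA_KVCACHE"] = "1"         # use the SDPA KV-cache path
os.environ["KOHRM_DISABLE_INFERENCE_COMPILE"] = "1"  # skip inference compilation

# Load the model through the HRM-Text PrefixLM runtime after this point.
# The loading code is intentionally omitted; it is not described in this card.
```

Exporting the same two variables in the shell before launching the evaluation would be an equivalent way to set them.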
