Instructions to use LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL") model = AutoModelForCausalLM.from_pretrained("LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL
- SGLang
How to use LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL with Docker Model Runner:
docker model run hf.co/LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL
LFM2.5-8B-A1B-KO-CPT-FULL
Full-parameter Korean continued-pretraining project for LiquidAI/LFM2.5-8B-A1B.
This model is intended to make LFM2.5 stronger at Korean legal, finance, wiki-style knowledge, and terminal/tool-use behavior while preserving the base model's general English and instruction-following ability.
- GitHub: https://github.com/gyunggyung/LFM25-KO-CPT
- SFT follow-up GitHub: https://github.com/gyunggyung/LFM25-KO-SFT
- SFT follow-up model: https://huggingface.co/LLM-OS-Models/LFM2.5-8B-A1B-KO-SFT
- Agentic follow-up model: https://huggingface.co/LLM-OS-Models/LFM2.5-8B-A1B-KO-Agentic-SFT
- Public CPT data:
- LFM-style full raw: https://huggingface.co/datasets/LLM-OS-Models/LFM2.5-KO-CPT-Full-LFMStyle-Raw-20260627
- LFM-style source shards: https://huggingface.co/datasets/LLM-OS-Models/LFM2.5-KO-CPT-Full-LFMStyle-Shards-20260627
- Raw mix before LFM wrapping: https://huggingface.co/datasets/LLM-OS-Models/LFM2.5-KO-CPT-Full-Raw-Mix-20260627
- Public SFT and Agentic data are indexed on the SFT model card: https://huggingface.co/LLM-OS-Models/LFM2.5-8B-A1B-KO-SFT.
Public CPT dataset releases:
| release | size | format | source / purpose |
|---|---|---|---|
| CPT LFM-style full raw | 20.54GB | single LFM-style JSONL | full Korean CPT source after LFM-style wrapping |
| CPT LFM-style source shards | 26.20GB | source-separated JSONL shards | auditable Korean Wiki, finance, legal, legal RAG/bar-answer, terminal/tool shards |
| CPT raw mix before LFM wrapping | 4.10GB | raw JSONL | pre-conversion CPT mix for debugging/rebuilding |
Status: full CPT completed on 2026-06-28. Weights are prepared from the verified
checkpoint-10196final-step checkpoint and uploaded to Hugging Face. vLLM evaluation shows strong gains on instruction-following, GSM8K, BoolQ, ARC, and several Korean knowledge subjects, but also regressions on Korean hard MCQA, MMLU-ProX-lite-ko, and some STEM/legal/accounting slices.
Performance Snapshot
All numbers below are vLLM/lm-eval base-vs-CPT comparisons against LiquidAI/LFM2.5-8B-A1B. Higher is better.
Confirmed Gains
| Benchmark | Metric | Base | CPT | Delta | Relative |
|---|---|---|---|---|---|
leaderboard_instruction_following / leaderboard_ifeval |
prompt loose | 0.2902 | 0.3457 | +0.0555 | +19.11% |
| IFEval full | prompt loose | 0.2921 | 0.3216 | +0.0295 | +10.10% |
| GSM8K full 5-shot | exact_match flexible | 0.4845 | 0.5701 | +0.0856 | +17.67% |
| GSM8K full 5-shot | exact_match strict | 0.2472 | 0.4617 | +0.2145 | +86.77% |
| BoolQ full | acc | 0.6544 | 0.7902 | +0.1358 | +20.75% |
| ARC-Challenge full | acc_norm | 0.3771 | 0.4241 | +0.0469 | +12.44% |
| PIQA full | acc_norm | 0.7203 | 0.7476 | +0.0272 | +3.78% |
Global MMLU KO medical_genetics |
acc | 0.2900 | 0.3800 | +0.0900 | +31.03% |
Global MMLU KO nutrition |
acc | 0.2549 | 0.3203 | +0.0654 | +25.64% |
Global MMLU KO philosophy |
acc | 0.2669 | 0.3215 | +0.0547 | +20.48% |
Global MMLU KO miscellaneous |
acc | 0.3372 | 0.3921 | +0.0549 | +16.29% |
| MMLU-Pro economics | exact_match | 0.4277 | 0.4704 | +0.0427 | +9.97% |
Regressions To Fix
| Benchmark | Metric | Base | CPT | Delta | Relative |
|---|---|---|---|---|---|
| MMLU-ProX Lite KO | exact_match | 0.2585 | 0.1667 | -0.0918 | -35.53% |
| KMMLU hard | acc | 0.2015 | 0.1720 | -0.0295 | -14.63% |
| KMMLU hard STEM | acc | 0.1973 | 0.1564 | -0.0409 | -20.74% |
Global MMLU KO professional_medicine |
acc | 0.3235 | 0.2316 | -0.0919 | -28.41% |
Global MMLU KO high_school_statistics |
acc | 0.2870 | 0.1574 | -0.1296 | -45.16% |
Global MMLU KO astronomy |
acc | 0.3421 | 0.2829 | -0.0592 | -17.31% |
Global MMLU KO high_school_computer_science |
acc | 0.3100 | 0.2800 | -0.0300 | -9.68% |
MMLU-Pro law LIMIT=500 |
exact_match | 0.1840 | 0.1240 | -0.0600 | -32.61% |
| Leaderboard Math hard | exact_match | 0.4977 | 0.4275 | -0.0702 | -14.11% |
Interpretation: the CPT run successfully injects Korean-domain knowledge and preserves or improves several general benchmarks, but it is not a finished Korean instruction model. The next post-training stage should target Korean MCQA reliability, option-label extraction, STEM hard questions, legal/accounting reasoning, and preservation of the current IFEval/GSM8K/BoolQ gains.
Likely failure mode: many regressions are not simple "Korean got worse" failures. They cluster around multiple-choice answering, exact answer extraction, option-label discipline, and hard STEM/legal/accounting formats. Open-ended Korean knowledge slices and instruction-following often improve, while Korean MCQA and parser-sensitive exact-match tasks need targeted remediation.
Contents
- English
- Performance Snapshot
- Quick Start
- Colab Example
- Training Configuration
- Data Mix
- Legal Data Attribution
- Korean
- 한국어 사용법
- 한국어 학습 설정
- Evaluation Plan
English
LFM2.5-8B-A1B-KO-CPT-FULL is a full fine-tuned Korean CPT checkpoint, not a LoRA adapter. The training objective is text completion over a Korean-heavy corpus, with LFM chat-template formatting applied to instruction, RAG, and tool-use examples.
Training code, source manifests, dataset cards, and runbooks are published at https://github.com/gyunggyung/LFM25-KO-CPT. The supervised fine-tuning follow-up is tracked at https://github.com/gyunggyung/LFM25-KO-SFT.
Target strengths:
- Korean legal document understanding and legal RAG-style answering
- Korean finance explanations and finance-domain terminology
- Korean wiki/general knowledge prose
- Korean instruction-following
- Terminal/tool-use style structured assistant behavior
Quick Start
The examples below use the full model repository.
Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
messages = [
{"role": "system", "content": "You are a precise Korean assistant."},
{"role": "user", "content": "대한민국 민법상 계약 해제와 해지의 차이를 간단히 설명해줘."},
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
output = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.6,
top_p=0.9,
do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
vLLM
vllm serve LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL \
--trust-remote-code \
--dtype bfloat16 \
--max-model-len 8192 \
--tensor-parallel-size 8
OpenAI-compatible request:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
model="LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL",
messages=[
{"role": "system", "content": "You are a precise Korean assistant."},
{"role": "user", "content": "한국 기준금리 인상이 은행 순이자마진에 미치는 영향을 설명해줘."},
],
temperature=0.5,
max_tokens=512,
)
print(response.choices[0].message.content)
Colab Example
Use this after model weights are uploaded. For typical Colab GPUs, start with 4-bit loading to avoid OOM.
!pip install -U "transformers>=4.44" accelerate bitsandbytes sentencepiece huggingface_hub
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# Optional for gated/private models:
# login("hf_xxx")
model_id = "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True,
)
messages = [
{"role": "system", "content": "너는 한국어로 정확하고 간결하게 답하는 어시스턴트다."},
{"role": "user", "content": "한국어로 주택임대차보호법의 대항력 요건을 설명해줘."},
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.5,
top_p=0.9,
do_sample=True,
eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
If you have an A100/H100/H200 runtime, bf16 loading can be used instead of 4-bit:
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
Prompt Format
The model follows the LFM2 chat-template style. Use the tokenizer chat template when possible. The CPT corpus preserves these special tokens for chat and tool-use records:
<|startoftext|><|im_start|><|im_end|>- roles:
system,user,assistant,tool
References:
- LFM text generation: https://docs.liquid.ai/lfm/key-concepts/text-generation-and-prompting
- LFM chat template: https://docs.liquid.ai/lfm/key-concepts/chat-template
- LFM tool use: https://docs.liquid.ai/lfm/key-concepts/tool-use
Training Configuration
- Base model:
LiquidAI/LFM2.5-8B-A1B - Method: full-parameter continued pretraining, not LoRA
- Framework: Unsloth + TRL
SFTTrainer - Hardware target: 8x NVIDIA H200
- Context length: 8192
- Precision: bf16 when supported
- Optimizer:
adamw_8bit - GPUs: 8
- Per-device batch size: 2
- Gradient accumulation steps: 4
- Effective batch: 64 sequences/update
- Maximum tokens/update: 524,288
- Learning rate: 2e-5
- Schedule: 1 epoch over the prepared full corpus
max_steps: -1- Checkpoint interval: 1,000 steps
- Checkpoint retention: 4 latest checkpoints plus final model
Completed run:
- Estimated tokens: 6,492,697,020
- Raw estimated steps before packing: 12,384
- Actual packed trainer steps: 10,196
- Train runtime: about 9h 38m
- Train samples/sec: 18.81
- Train steps/sec: 0.294
- Final logged train loss: 0.712
- Final checkpoint source:
checkpoint-10196 - Final model integrity check:
model.safetensorsopens successfully with 2,302 tensors
Note: the distributed torchrun process reached step 10196/10196 and wrote checkpoint-10196. A SIGSEGV occurred during the extra post-train trainer.save_model(final_full) write, leaving the initial final_full/model.safetensors incomplete. The published final_full was rebuilt from the verified checkpoint-10196 inference files.
Data Mix
Prepared full mix:
/home/work/.data/lfm2_ko_cpt/datasets/ko_cpt_mix_full_lfmstyle_20260627.jsonl
Statistics:
- Rows after global deduplication: 4,622,971
- Characters: 11,581,567,658
- Estimated tokens: 6,492,697,020
- Raw estimated training steps: 12,384 at effective batch 64 and sequence length 8192
- Actual packed trainer steps: 10,196
Per-source rows:
| Source | Rows |
|---|---|
kowiki_raw_full_20260524 |
611,403 |
bcai_finance_kor_hrm_20260524 |
1,861,531 |
korean_legal_raw_full_20260523 |
227,687 |
korean_legal_tasks_full_20260524 |
1,383,340 |
korean_admrule_precedent_raw_full_20260524 |
203,477 |
ko_legal_source_agent_sft_20260621 |
5,999 |
ko_legal_rag_agent_sft_round15_v2 |
749 |
current_law_bar_json_answer_sft_20260621 |
2,000 |
lfm25_terminal_toolbench_hrm_turns_v1 |
326,785 |
Raw Korean wiki/legal/finance documents are kept as plain completion text for CPT. Instruction, legal RAG, and terminal/tool-use examples are converted to LFM ChatML-style text.
Legal Data Attribution
Legal-domain data is attributed to the public Legalize-KR ecosystem and related Korean legal source corpora used in the local CPT mix.
Legalize-KR links:
- Organization: https://github.com/legalize-kr
- Korean statutes repository: https://github.com/legalize-kr/legalize-kr
- Korean court precedent repository: https://github.com/legalize-kr/precedent-kr
- Korean administrative rules repository: https://github.com/legalize-kr/admrule-kr
- Korean local ordinance repository: https://github.com/legalize-kr/ordinance-kr
- Data collection/conversion pipeline: https://github.com/legalize-kr/legalize-pipeline
- Legalize-KR website: https://legalize.kr
- Original public legal source: https://www.law.go.kr
The Legalize-KR organization describes its project as converting Korean statutes, precedents, administrative rules, and local ordinances into Markdown and Git history. Its README states that source data is obtained from the National Law Information Center OpenAPI and transformed into Git repositories. Long-term reproducibility should pin a snapshot or release where possible because Legalize-KR notes that Git history can be reconstructed when parsing and normalization rules improve.
Recommended attribution format:
- Statutes: cite
legalize-kr/legalize-kr, the Markdown path such askr/{statute-name}/{statute-type}.md, and stable metadata fields such as법령ID,법령MST, promulgation date, effective date, and the출처URL fromlaw.go.kr. - Precedents: cite
legalize-kr/precedent-kr, the Markdown path such as{case-type}/{court-level}/{court}_{decision-date}_{case-number}.md, and stable identifiers such as판례일련번호, court name, decision date, and case number. - Administrative rules: cite
legalize-kr/admrule-kr, the Markdown path such as{agency-path}/{rule-type}/{rule-name}/본문.md, plus rule serial number or issuing number when available. - Local ordinances: cite
legalize-kr/ordinance-kr, the Markdown path such as{province}/{city-or-office}/{ordinance-type}/{ordinance-name}/본문.md, plus자치법규ID,자치법규일련번호, promulgation date, promulgation number, and the출처URL. - Avoid using only Git commit hashes as long-term identifiers because Legalize-KR warns that repository history may be reconstructed after parser or normalization improvements.
- License note from the Legalize-KR READMEs: original legal text is Korean government public work; repository structure and metadata are MIT where specified by the repository.
Local legal sources included in this CPT run:
korean_legal_raw_full_20260523korean_legal_tasks_full_20260524korean_admrule_precedent_raw_full_20260524ko_legal_source_agent_sft_20260621ko_legal_rag_agent_sft_round15_v2current_law_bar_json_answer_sft_20260621
Korean
LFM2.5-8B-A1B-KO-CPT-FULL은 LoRA 어댑터가 아니라 full-parameter CPT 모델입니다. 목표는 LFM2.5-8B-A1B에 한국어 법률, 금융, 위키 지식과 터미널/도구 사용 스타일을 계속 사전학습으로 이식하는 것입니다.
목표 성능:
- 한국어 법률 문서 이해와 법률 RAG 답변
- 한국어 금융 설명과 금융 용어 처리
- 한국어 위키/일반 지식 문체
- 한국어 instruction following
- 터미널/도구 호출형 assistant 동작 보존
한국어 사용법
가중치 업로드 후 아래처럼 사용할 수 있습니다.
Transformers 사용
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
messages = [
{"role": "system", "content": "너는 한국어로 정확하고 간결하게 답하는 어시스턴트다."},
{"role": "user", "content": "상법상 이사의 충실의무를 실무 관점에서 설명해줘."},
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
output = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.5,
top_p=0.9,
do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
vLLM 사용
vllm serve LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL \
--trust-remote-code \
--dtype bfloat16 \
--max-model-len 8192 \
--tensor-parallel-size 8
요청 예시:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
model="LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL",
messages=[
{"role": "system", "content": "너는 한국어로 정확하고 간결하게 답하는 어시스턴트다."},
{"role": "user", "content": "부동산 임대차 계약에서 보증금 반환 분쟁의 핵심 쟁점을 정리해줘."},
],
temperature=0.5,
max_tokens=512,
)
print(response.choices[0].message.content)
Colab 사용 예시
일반 Colab GPU에서는 VRAM 부족을 피하려고 4-bit 로딩부터 쓰는 것이 좋습니다.
!pip install -U "transformers>=4.44" accelerate bitsandbytes sentencepiece huggingface_hub
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_id = "LLM-OS-Models/LFM2.5-8B-A1B-KO-CPT-FULL"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True,
)
messages = [
{"role": "system", "content": "너는 한국어로 정확하고 간결하게 답하는 어시스턴트다."},
{"role": "user", "content": "한국 금융시장에서 기준금리와 채권 가격의 관계를 설명해줘."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.5,
top_p=0.9,
do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
권장 생성 설정
- 법률/금융 설명:
temperature=0.3-0.6,top_p=0.8-0.95 - 일반 한국어 답변:
temperature=0.5-0.8,top_p=0.9 - 긴 문서 요약:
max_new_tokens=1024이상 - 도구 사용/구조화 출력: 낮은 temperature 권장
한국어 학습 설정
- 베이스 모델:
LiquidAI/LFM2.5-8B-A1B - 방식: full-parameter CPT, LoRA 아님
- 하드웨어: NVIDIA H200 8장
- 컨텍스트 길이: 8192
- GPU당 batch size: 2
- gradient accumulation: 4
- effective batch: 64 sequences/update
- update당 최대 token: 524,288
- learning rate: 2e-5
- epoch: 1
max_steps: -1- 저장 간격: 1,000 steps
- checkpoint 보존: 최신 4개와 final model
학습 규모:
- 전체 row: 4,622,971
- 추정 token: 6.49B
- raw 예상 step: 12,384
- 실제 packed trainer step: 10,196
- 실제 train runtime: 약 9시간 38분
- 최종 train loss: 0.712
- 최종 weight 출처: 무결성 검사를 통과한
checkpoint-10196
한국어 법률 데이터 출처
법률 도메인 데이터 출처는 Legalize-KR 생태계와 로컬 한국 법률 corpus를 명시한다.
- Legalize-KR 조직: https://github.com/legalize-kr
- 대한민국 법령: https://github.com/legalize-kr/legalize-kr
- 대한민국 판례: https://github.com/legalize-kr/precedent-kr
- 대한민국 행정규칙: https://github.com/legalize-kr/admrule-kr
- 대한민국 자치법규: https://github.com/legalize-kr/ordinance-kr
- 수집/변환 파이프라인: https://github.com/legalize-kr/legalize-pipeline
- Legalize-KR 웹사이트: https://legalize.kr
- 원천 공공 법령 출처: https://www.law.go.kr
Legalize-KR은 법령/판례/행정규칙/자치법규를 Markdown과 Git 이력으로 관리하는 공개 프로젝트다. 조직 README 기준 원천 데이터는 국가법령정보센터 OpenAPI에서 가져오며, 파싱과 정규화 규칙이 개선되면 Git 이력이 재구성될 수 있으므로 장기 재현에는 snapshot 또는 release 고정이 필요하다.
출처 표기 방식:
- 법령:
legalize-kr/legalize-kr저장소,kr/{법령명}/{법령구분}.md경로,법령ID,법령MST,공포일자,시행일자,출처URL을 함께 적는다. - 판례:
legalize-kr/precedent-kr저장소,{사건종류}/{법원등급}/{법원명}_{선고일자}_{사건번호}.md경로,판례일련번호, 법원명, 선고일자, 사건번호를 함께 적는다. - 행정규칙:
legalize-kr/admrule-kr저장소,{기관경로}/{행정규칙종류}/{행정규칙명}/본문.md경로, 행정규칙일련번호 또는 발령번호를 함께 적는다. - 자치법규:
legalize-kr/ordinance-kr저장소,{광역}/{기초 또는 _본청 또는 _교육청}/{자치법규종류}/{자치법규명}/본문.md경로,자치법규ID,자치법규일련번호,공포일자,공포번호,출처URL을 함께 적는다. - commit hash만 장기 출처로 쓰지 않는다. Legalize-KR README는 파서/정규화 개선 시 저장소 history가 재구성될 수 있다고 안내한다.
- Legalize-KR README 기준 원문은 대한민국 정부 공공저작물이고, 저장소 구조와 메타데이터는 저장소별 MIT 표기를 따른다.
Evaluation Plan
Current vLLM Smoke Check
This is not a benchmark score. It verifies that both the base model and the CPT model load and generate with vLLM tensor parallelism.
- Date: 2026-06-28
- vLLM environment: local
.vllm-lfm-cu12, vLLM0.19.1, Torch2.10.0+cu128 - Tensor parallel size: 8
- Max model length: 8192
- Base model smoke: passed model load and generation
- CPT model smoke: passed model load and generation
- Smoke result path:
/home/work/.data/lfm2_ko_cpt/evals/20260628_1052_smoke_clean_vllm_smoke - CPT checks passed: Korean legal, Korean finance, tool-call format, English instruction smoke
- CPT wiki smoke note: the answer was relevant, but the simple keyword check expected the literal word
요약, so that specific automatic check is false.
Current vLLM Benchmark Results
Evaluation uses EleutherAI lm-evaluation-harness with vLLM tensor parallelism. The IFEval run below is the full 541-prompt public task, not a limited smoke sample.
- Date: 2026-06-28
- Task:
ifeval - Runner:
lm_eval==0.4.11,vllm==0.19.1, Torch2.10.0+cu128 - Tensor parallel size: 8
- Max model length: 8192
- Result path:
/home/work/.data/lfm2_ko_cpt/evals/20260628_022743_ifeval_full_vllm_vllm_matrix
| Metric | LiquidAI/LFM2.5-8B-A1B | LFM2.5-8B-A1B-KO-CPT-FULL | Delta | Relative |
|---|---|---|---|---|
| prompt_level_strict_acc | 0.2810 | 0.2976 | +0.0166 | +5.91% |
| prompt_level_loose_acc | 0.2921 | 0.3216 | +0.0295 | +10.10% |
| inst_level_strict_acc | 0.4221 | 0.4365 | +0.0144 | +3.41% |
| inst_level_loose_acc | 0.4341 | 0.4628 | +0.0287 | +6.61% |
GSM8K 5-shot LIMIT=200 limited regression check:
| Metric | LiquidAI/LFM2.5-8B-A1B | LFM2.5-8B-A1B-KO-CPT-FULL | Delta | Relative |
|---|---|---|---|---|
| exact_match strict-match | 0.2600 | 0.4250 | +0.1650 | +63.46% |
| exact_match flexible-extract | 0.4250 | 0.4950 | +0.0700 | +16.47% |
Global MMLU Korean LIMIT=500 limited check:
| Metric | LiquidAI/LFM2.5-8B-A1B | LFM2.5-8B-A1B-KO-CPT-FULL | Delta | Relative |
|---|---|---|---|---|
| global_mmlu_full_ko acc | 0.2803 | 0.3086 | +0.0283 | +10.10% |
| humanities acc | 0.2784 | 0.3022 | +0.0238 | +8.55% |
| other acc | 0.2914 | 0.3385 | +0.0471 | +16.16% |
| social_sciences acc | 0.2911 | 0.3404 | +0.0493 | +16.93% |
| stem acc | 0.2623 | 0.2591 | -0.0032 | -1.22% |
Note: GSM8K and Global MMLU Korean above are limited runs and should be treated as early regression checks, not final public benchmark scores. Additional vLLM evaluations are running with one task per GPU.
Additional vLLM checks:
| Task | Metric | LiquidAI/LFM2.5-8B-A1B | LFM2.5-8B-A1B-KO-CPT-FULL | Delta | Relative | Note |
|---|---|---|---|---|---|---|
arc_challenge LIMIT=500 |
acc | 0.3600 | 0.4020 | +0.0420 | +11.67% | limited |
arc_challenge LIMIT=500 |
acc_norm | 0.3760 | 0.4140 | +0.0380 | +10.11% | limited |
gsm8k full 5-shot |
exact_match strict | 0.2472 | 0.4617 | +0.2145 | +86.77% | full task |
gsm8k full 5-shot |
exact_match flexible | 0.4845 | 0.5701 | +0.0856 | +17.67% | full task |
mmlu_pro_economics LIMIT=500 |
exact_match | 0.4420 | 0.4900 | +0.0480 | +10.86% | limited |
mmlu_pro_law LIMIT=500 |
exact_match | 0.1840 | 0.1240 | -0.0600 | -32.61% | limited |
mmlu_prox_lite_ko LIMIT=500 |
exact_match | 0.2585 | 0.1667 | -0.0918 | -35.51% | limited |
global_mmlu_full_ko_professional_law full |
acc | 0.2581 | 0.2595 | +0.0014 | +0.54% | full subject |
global_mmlu_full_ko_professional_accounting full |
acc | 0.2730 | 0.2340 | -0.0390 | -14.29% | full subject |
global_mmlu_full_ko_high_school_macroeconomics full |
acc | 0.2436 | 0.2846 | +0.0410 | +16.83% | full subject |
global_mmlu_full_ko_virology full |
acc | 0.2831 | 0.3795 | +0.0964 | +34.05% | full subject |
global_mmlu_full_ko_world_religions full |
acc | 0.3450 | 0.4854 | +0.1404 | +40.70% | full subject |
hellaswag LIMIT=1000 |
acc | 0.4320 | 0.4430 | +0.0110 | +2.55% | limited |
hellaswag LIMIT=1000 |
acc_norm | 0.4330 | 0.5110 | +0.0780 | +18.01% | limited |
winogrande full |
acc | 0.5643 | 0.5699 | +0.0055 | +0.98% | full task |
piqa full |
acc | 0.7350 | 0.7541 | +0.0190 | +2.59% | full task |
piqa full |
acc_norm | 0.7209 | 0.7465 | +0.0256 | +3.55% | full task |
boolq full |
acc | 0.6544 | 0.7902 | +0.1358 | +20.75% | full task |
global_mmlu_full_ko_high_school_geography full |
acc | 0.3384 | 0.3434 | +0.0051 | +1.49% | full subject |
global_mmlu_full_ko_public_relations full |
acc | 0.2273 | 0.3000 | +0.0727 | +32.00% | full subject |
global_mmlu_full_ko_management full |
acc | 0.3107 | 0.4369 | +0.1262 | +40.63% | full subject |
global_mmlu_full_ko_human_sexuality full |
acc | 0.2672 | 0.3740 | +0.1069 | +40.00% | full subject |
global_mmlu_full_ko_international_law full |
acc | 0.3223 | 0.4215 | +0.0992 | +30.77% | full subject |
leaderboard_instruction_following / leaderboard_ifeval |
prompt_level_loose_acc | 0.2976 | 0.3346 | +0.0370 | +12.42% | lm-eval leaderboard task |
global_mmlu_full_ko_business_ethics full |
acc | 0.2100 | 0.4500 | +0.2400 | +114.29% | full subject |
global_mmlu_full_ko_sociology full |
acc | 0.2886 | 0.4776 | +0.1891 | +65.52% | full subject |
global_mmlu_full_ko_computer_security full |
acc | 0.2900 | 0.4500 | +0.1600 | +55.17% | full subject |
global_mmlu_full_ko_marketing full |
acc | 0.3590 | 0.5000 | +0.1410 | +39.29% | full subject |
global_mmlu_full_ko_professional_psychology full |
acc | 0.2729 | 0.3284 | +0.0556 | +20.36% | full subject |
global_mmlu_full_ko_college_biology full |
acc | 0.2569 | 0.3333 | +0.0764 | +29.73% | full subject |
kmmlu_hard_humss LIMIT=1000 |
acc | 0.2533 | 0.2675 | +0.0143 | +5.63% | limited |
kmmlu_hard LIMIT=1000 |
acc | 0.2015 | 0.1720 | -0.0295 | -14.63% | limited |
kmmlu_hard_stem LIMIT=1000 |
acc | 0.1973 | 0.1564 | -0.0409 | -20.74% | limited |
Latest Global MMLU Korean subject sweep:
| Task | Metric | LiquidAI/LFM2.5-8B-A1B | LFM2.5-8B-A1B-KO-CPT-FULL | Delta | Relative |
|---|---|---|---|---|---|
global_mmlu_full_ko_astronomy |
acc | 0.3421 | 0.2829 | -0.0592 | -17.31% |
global_mmlu_full_ko_conceptual_physics |
acc | 0.3149 | 0.2936 | -0.0213 | -6.76% |
global_mmlu_full_ko_econometrics |
acc | 0.2632 | 0.2807 | +0.0175 | +6.67% |
global_mmlu_full_ko_electrical_engineering |
acc | 0.2759 | 0.3103 | +0.0345 | +12.50% |
global_mmlu_full_ko_formal_logic |
acc | 0.3254 | 0.2778 | -0.0476 | -14.63% |
global_mmlu_full_ko_high_school_biology |
acc | 0.2710 | 0.2871 | +0.0161 | +5.95% |
global_mmlu_full_ko_high_school_chemistry |
acc | 0.2315 | 0.1921 | -0.0394 | -17.02% |
global_mmlu_full_ko_high_school_statistics |
acc | 0.2870 | 0.1574 | -0.1296 | -45.16% |
global_mmlu_full_ko_high_school_european_history |
acc | 0.2788 | 0.3152 | +0.0364 | +13.04% |
global_mmlu_full_ko_high_school_world_history |
acc | 0.2911 | 0.3376 | +0.0464 | +15.94% |
global_mmlu_full_ko_jurisprudence |
acc | 0.2870 | 0.2685 | -0.0185 | -6.45% |
global_mmlu_full_ko_logical_fallacies |
acc | 0.3067 | 0.2945 | -0.0123 | -4.00% |
The limited checks are useful for regression tracking, but they should not be read as final leaderboard-quality numbers. The model improves strongly on several reasoning and instruction-following checks, while law-focused MMLU-Pro and MMLU-ProX-lite-ko need targeted remediation.
KMMLU direct exact-match runs currently show near-zero base scores and small non-zero CPT scores. Treat those as prompt/extraction diagnostics rather than quality benchmarks until the direct-answer parser is fixed.
Recommended Next Post-Training
The next stage should be targeted post-training, not another broad CPT-only pass. Liquid's official LFM2.5-8B-A1B reporting emphasizes instruction following, math, tool use, and agentic workflows such as IFEval, IFBench, Multi-IF, MATH500, AIME, BFCL, and Tau2. The Japanese LFM2.5-1.2B-JP model card reports language-specialized axes such as JMMLU-ProX, JMMLU, J-MIFEval, J-GSM8K, J-MATH500, JHumanEval+, and J-BFCLv3. The Korean follow-up should mirror those axes with Korean data and Korean eval gates.
Priority plan:
- Korean MCQA remediation SFT: KMMLU, KMMLU-hard, MMLU-ProX-lite-ko style, legal/accounting/finance questions, exact option-label outputs, and short rationales.
- STEM/legal/accounting remediation SFT: target current regressions in high-school statistics, astronomy, chemistry, formal logic, jurisprudence, professional accounting, and MMLU-Pro law.
- Korean instruction-following SFT: Ko-IFEval-style constraints, Korean formatting, uncertainty, refusal, and multi-condition prompts.
- Tool and agent SFT: Korean BFCL-style tool schemas, terminal/tool-call traces, JSON validity, and multi-turn task completion.
- Preference tuning: DPO/ORPO/KTO on current failure pairs. Reward correct option extraction, concise Korean, valid tools, and uncertainty over hallucination.
- Small preservation mix: keep GSM8K, ARC, BoolQ, IFEval, and high-gain Global MMLU KO examples in the mix so post-training does not erase current gains.
Concrete gates for the next model:
kmmlu_hard: improve from 0.1720 to at least 0.2300.kmmlu_hard_stem: improve from 0.1564 to at least 0.2100.mmlu_prox_lite_ko: recover above the base score 0.2585 and target 0.3000.mmlu_pro_law: recover above the base score 0.1840 and target 0.2300.- Preserve
gsm8kflexible exact match at or above 0.5701. - Preserve
boolqaccuracy at or above 0.7902. - Preserve
leaderboard_instruction_followingprompt loose at or above 0.3346.
Public Benchmark Plan
Primary public Korean benchmarks:
- KMMLU: https://huggingface.co/datasets/HAERAE-HUB/KMMLU
- KMMLU-Pro: https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro
- KMMLU-Redux: https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Redux
- Ko-IFEval: https://huggingface.co/datasets/davidkim205/ko-ifeval
Secondary checks:
- Korean legal RAG holdout
- Korean finance explanation holdout
- Korean wiki QA/summarization holdout
- Terminal/tool-use smoke tests
Benchmark results will be added after vLLM base-vs-CPT evaluation.
- Downloads last month
- 69