T3Q-LLM-sft1.0-dpo1.0

This model is a version of T3Q-LLM/T3Q-LLM-solar10.8-sft-v1.0 that has been fine-tuned with DPO.

Model Developers: Chihoon Lee (chihoonlee10), T3Q

Prompt Template

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: {prompt}
Assistant:
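
The template has a single `{prompt}` slot, so filling it is one `str.format` call. The sketch below wraps that step in a small helper; the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the trailing newline after `Assistant:` is an assumption taken from the usage code on this card.

```python
# Template text copied from this card; the helper name is hypothetical.
PROMPT_TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    "Human: {prompt}\n"
    "Assistant:\n"
)


def build_prompt(user_message: str) -> str:
    """Fill the single {prompt} slot with the user's message."""
    return PROMPT_TEMPLATE.format(prompt=user_message)


print(build_prompt("What is the capital of Korea?"))
```

Braces inside the user's message are left untouched, because `str.format` only interprets placeholders in the template string itself.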

How to Use It

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0")
tokenizer = AutoTokenizer.from_pretrained("T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0")

prompt_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"
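# Korean prompt: "What is the capital of Korea? Please choose from the options below."
# Options: (A) Gyeongseong, (B) Busan, (C) Pyongyang, (D) Seoul, (E) Jeonju
text = '한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주'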
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors='pt')

outputs = model.generate(**model_inputs, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
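
The checkpoint is listed on this card as 10.8B parameters in BF16. As a rough sketch, the snippet below estimates the raw weight size at two precisions and shows one way to request BF16 explicitly when loading. The helper names `estimate_weight_gb` and `load_bf16` are hypothetical, and the estimate ignores activations and the KV cache. `torch_dtype` is an existing `from_pretrained` argument, though newer transformers releases may prefer the spelling `dtype`.

```python
def estimate_weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


print(estimate_weight_gb(10.8e9, 2))  # BF16 stores 2 bytes per parameter
print(estimate_weight_gb(10.8e9, 4))  # FP32 stores 4 bytes per parameter


def load_bf16(model_id: str = "T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0"):
    """Load the tokenizer and model with BF16 weights (downloads the checkpoint)."""
    # Imported lazily so the estimate above runs without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling `load_bf16()` returns a `(tokenizer, model)` pair that can be used in place of the two `from_pretrained` calls above.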

Example Output

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
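Human: 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.

(A) 경성
(B) 부산
(C) 평양
(D) 서울
(E) 전주
Assistant:
(D) 서울이 한국의 수도입니다. 서울은 나라의 북동부에 위치해 있으며, 정치, 경제, 문화의 중심지입니다. 약 1,000만 명이 넘는 인구를 가진 세계에서 가장 큰 도시 중 하나입니다. 서울은 높은 빌딩, 현대적인 인프라, 활기 문화 장면으로 유명합니다. 또한, 많은 역사적 명소와 박물관이 있어 방문객들에게 풍부한 문화 체험을 제공합니다.

English translation of the exchange:

H: What is the capital of Korea? Please choose from the options below.

(A) Gyeongseong
(B) Busan
(C) Pyongyang
(D) Seoul
(E) Jeonju
Assistant:
(D) Seoul is the capital of Korea. Seoul is located in the northeastern part of the country and is the center of politics, economy, and culture. With a population of more than about 10 million, it is one of the largest cities in the world. Seoul is famous for its tall buildings, modern infrastructure, and vibrant cultural scene. It also has many historical sites and museums, offering visitors a rich cultural experience.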
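
Because the decoded output repeats the prompt, downstream code usually wants only the text after the final `Assistant:` marker. The following is a minimal sketch of that post-processing, with an extra step that pulls out the chosen option letter. The function names are hypothetical, and it assumes the model's reply does not itself contain the string `Assistant:`.

```python
import re


def extract_answer(output_text: str, marker: str = "Assistant:") -> str:
    """Return the text after the last occurrence of the marker."""
    _, _, answer = output_text.rpartition(marker)
    return answer.strip()


def extract_choice(answer: str):
    """Return the first option letter written as '(X)', or None if absent."""
    match = re.search(r"\(([A-E])\)", answer)
    return match.group(1) if match else None


# Shortened stand-in for the decoded output shown above.
sample = "A chat between ...\nHuman: ...\n(A) ...\n(D) ...\nAssistant:\n(D) 서울이 한국의 수도입니다."
answer = extract_answer(sample)
print(answer)
print(extract_choice(answer))
```

If the marker is missing, `extract_answer` falls back to returning the whole stripped string.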
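
| Task             | Version | Metric   | Value  | Stderr   |
|------------------|---------|----------|--------|----------|
| kobest_boolq     | 0       | acc      | 0.9387 | ± 0.0064 |
|                  |         | macro_f1 | 0.9387 | ± 0.0064 |
| kobest_copa      | 0       | acc      | 0.7590 | ± 0.0135 |
|                  |         | macro_f1 | 0.7585 | ± 0.0135 |
| kobest_hellaswag | 0       | acc      | 0.5080 | ± 0.0224 |
|                  |         | acc_norm | 0.5580 | ± 0.0222 |
|                  |         | macro_f1 | 0.5049 | ± 0.0224 |
| kobest_sentineg  | 0       | acc      | 0.8489 | ± 0.0180 |
|                  |         | macro_f1 | 0.8483 | ± 0.0180 |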
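
To read the Stderr column as an uncertainty range, one common approach is a normal-approximation 95% interval, value ± 1.96 × stderr. The sketch below applies it to a few rows of the table. The choice of z = 1.96 and the helper name `confidence_interval` are assumptions for illustration, not something this card specifies.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Normal-approximation interval: value +/- z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


# (task, metric, value, stderr) copied from the results table.
results = [
    ("kobest_boolq", "acc", 0.9387, 0.0064),
    ("kobest_copa", "acc", 0.7590, 0.0135),
    ("kobest_hellaswag", "acc_norm", 0.5580, 0.0222),
    ("kobest_sentineg", "acc", 0.8489, 0.0180),
]

for task, metric, value, stderr in results:
    low, high = confidence_interval(value, stderr)
    print(f"{task} {metric}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```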
Model size: 10.8B params (Safetensors)
Tensor type: BF16

Dataset used to train T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0