
Hyperparameters:

  • Batch size: 128
  • Learning rate: 1e-5 → 1e-6 (linear decay)
  • Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
  • Epochs: 2 (the main revision is trained for 1 epoch)
  • Training report
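The learning-rate schedule above can be sketched as a plain function. This is a minimal illustration of a linear decay from 1e-5 to 1e-6; the total step count is an assumption, since the card does not state it:

```python
def linear_decay_lr(step, total_steps, lr_start=1e-5, lr_end=1e-6):
    """Linearly interpolate the learning rate from lr_start down to lr_end."""
    frac = min(step, total_steps) / total_steps
    return lr_start + (lr_end - lr_start) * frac

print(linear_decay_lr(0, 1000))     # lr_start at the first step
print(linear_decay_lr(1000, 1000))  # lr_end at the last step
```

In practice the same shape can be obtained with a `LambdaLR` scheduler wrapped around the AdamW optimizer.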

Performance

| Dataset | Accuracy (epoch=1) |
|---|---|
| hh-rlhf-ko | 59.02 |
| hh-rlhf-ko (helpful) | 64.72 |
| hh-rlhf-ko (harmless) | 44.29 |
| ko-skku-rlhf | 68.69 |
| PKU-SafeRLHF-ko (safer) | 64.09 |
| kor-ethical-qa | 99.8 |
| ko-ultrafeedback-binarized | 74.96 |
| **Average** | 64.71 |

Usage

  • Uses the conversation template of the original 42dot SFT model.
  • User utterances start with `<human>:\n`.
  • Bot utterances start with `<bot>:\n`.
```python
from transformers import pipeline

pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")

# Unhelpful response gets a low score.
pipe("""<human>:
κ΄‘ν™”λ¬Έ κ΄‘μž₯ κ°€λŠ” 방법 μ•Œλ €μ£Όμ‹€ 수 μžˆλ‚˜μš”?
<bot>:
μ‹«μ–΄μš”<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.040634412318468094}]

# Helpful, detailed response scores higher.
pipe("""<human>:
κ΄‘ν™”λ¬Έ κ΄‘μž₯ κ°€λŠ” 방법 μ•Œλ €μ£Όμ‹€ 수 μžˆλ‚˜μš”?
<bot>:
κ΄‘ν™”λ¬Έκ΄‘μž₯으둜 κ°€λŠ” 방법은 λ‹€μŒκ³Ό κ°™μŠ΅λ‹ˆλ‹€:
μ§€ν•˜μ²  3ν˜Έμ„  κ²½λ³΅κΆμ—­μ—μ„œ ν•˜μ°¨ν•œ ν›„ 6번 좜ꡬ둜 λ‚˜μ™€ 정뢀쀑앙청사, κ΄‘ν™”λ¬Έ λ°©ν–₯으둜 μ΄λ™ν•©λ‹ˆλ‹€.
μ§€ν•˜μ²  5ν˜Έμ„  κ΄‘ν™”λ¬Έμ—­μ—μ„œ ν•˜μ°¨ν•œ ν›„ ν•΄μΉ˜λ§ˆλ‹Ή μ—°κ²°ν†΅λ‘œλ₯Ό μ΄μš©ν•΄ 7번 좜ꡬ둜 λ‚˜μ™€ κ΄‘μž₯숲 λ°©ν–₯으둜 μ΄λ™ν•©λ‹ˆλ‹€.
μ§€ν•˜μ²  1ν˜Έμ„  μ‹œμ²­μ—­μ—μ„œ ν•˜μ°¨ν•œ ν›„ 3번 좜ꡬ둜 λ‚˜μ™€ λ•μˆ˜κΆμ„ μ§€λ‚˜ μ½”λ¦¬μ•„λ‚˜ ν˜Έν…” λ°©ν–₯으둜 μ΄λ™ν•©λ‹ˆλ‹€.
λ„λ³΄λ‘œ 2λΆ„ 거리에 μžˆλŠ” 쒅각역을 μ΄μš©ν•©λ‹ˆλ‹€.
κ΄‘ν™”λ¬Έκ΄‘μž₯으둜 κ°€λŠ” λ²„μŠ€ 노선은 λ‹€μŒκ³Ό κ°™μŠ΅λ‹ˆλ‹€: 272λ²ˆγ€401λ²ˆγ€406λ²ˆγ€704λ²ˆγ€7022번

도움이 λ˜μ…¨μœΌλ©΄ μ’‹κ² μŠ΅λ‹ˆλ‹€!<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.2885928750038147}]

# Harmful compliance scores low.
pipe("""<human>:
λ§ˆμ•½μ€ μ–΄λ””μ—μ„œ ꡬ할 수 μžˆμ–΄μš”?

<bot>:
μ €λ ΄ν•˜κ²Œ ꡬ할 수 μžˆλŠ” 곳을 μ•ˆλ‚΄ν•΄λ“œλ¦¬κ² μŠ΅λ‹ˆλ‹€. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.25814417004585266}]

# Safe refusal scores high.
pipe("""<human>:
λ§ˆμ•½μ€ μ–΄λ””μ—μ„œ ꡬ할 수 μžˆμ–΄μš”?

<bot>:
λ§ˆμ•½μ€ 쀑독, 건강 문제, 법적 문제λ₯Ό μ΄ˆλž˜ν•˜μ—¬ μ‹¬κ°ν•œ μœ„ν—˜μ„±μ„ λ‚΄ν¬ν•˜κ³  μžˆμŠ΅λ‹ˆλ‹€. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.8125637173652649}]
```
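A reward model like this is typically used for best-of-n reranking: score several candidate responses to the same prompt and keep the highest-scoring one. A minimal sketch, where `toy_pipe` is a hypothetical stand-in for `pipe` (in practice you would pass `pipe` itself, which returns `[{'label': ..., 'score': ...}]` as in the examples above):

```python
def best_of_n(prompt, candidates, score_fn):
    """Return the candidate response with the highest reward score.

    score_fn takes the fully formatted conversation string and returns
    a pipeline-style result: [{'label': ..., 'score': ...}].
    """
    def reward(response):
        text = f"<human>:\n{prompt}\n<bot>:\n{response}<|endoftext|>"
        return score_fn(text)[0]["score"]
    return max(candidates, key=reward)

# Toy stand-in scorer for illustration only: longer answers score higher.
toy_pipe = lambda text: [{"label": "LABEL_0", "score": len(text) / 1000}]

best = best_of_n(
    "κ΄‘ν™”λ¬Έ κ΄‘μž₯ κ°€λŠ” 방법 μ•Œλ €μ£Όμ‹€ 수 μžˆλ‚˜μš”?",
    ["μ‹«μ–΄μš”", "μ§€ν•˜μ²  3ν˜Έμ„  κ²½λ³΅κΆμ—­μ—μ„œ ν•˜μ°¨ν•œ ν›„ 6번 좜ꡬ둜 λ‚˜μ˜€μ„Έμš”."],
    toy_pipe,
)
print(best)  # the longer, more detailed candidate wins under the toy scorer
```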