EEVE-Instruct-Math-10.8B

This model is part of the EEVE-Math project. It is a DARE-TIES merge of EEVE-Math and EEVE-Instruct. The project is a proof of concept showing that this kind of merge can keep the usability of the Instruct model without giving up much of the performance of the specialized EEVE-Math model.

| Model | gsm8k-ko (pass@1) |
|---|---|
| EEVE (Base) | 0.4049 |
| EEVE-Math (epoch 1) | 0.508 |
| EEVE-Math (epoch 2) | 0.539 |
| EEVE-Instruct | 0.4511 |
| EEVE-Instruct + Math | 0.4845 |

Merge Details

This model was merged using the DARE TIES merge method, with yanolja/EEVE-Korean-10.8B-v1.0 as the base model.

Models Merged

The following models were included in the merge:

- yanolja/EEVE-Korean-Instruct-10.8B-v1.0
- kuotient/EEVE-Math-10.8B

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: yanolja/EEVE-Korean-10.8B-v1.0
    # no parameters necessary for base model
  - model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
    parameters:
      density: 0.53
      weight: 0.6
  - model: kuotient/EEVE-Math-10.8B
    parameters:
      density: 0.53
      weight: 0.4
merge_method: dare_ties
base_model: yanolja/EEVE-Korean-10.8B-v1.0
parameters:
  int8_mask: true
dtype: bfloat16
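
To give a rough intuition for what the `density` and `weight` values in the configuration control, here is a minimal pure-Python sketch of a DARE-TIES-style merge on toy parameter lists. It is not mergekit's implementation: the function names (`dare_delta`, `ties_sign_merge`, `dare_ties`), the toy values, and the choice to skip any weight normalization are assumptions made for illustration only.

```python
import random


def dare_delta(base, tuned, density, rng):
    """DARE step: drop each delta entry with probability 1 - density and
    rescale the survivors by 1 / density so the expected delta is unchanged."""
    sparse = []
    for b, t in zip(base, tuned):
        delta = t - b
        sparse.append(delta / density if rng.random() < density else 0.0)
    return sparse


def ties_sign_merge(deltas, weights):
    """TIES-style step: per parameter, take the sign of the weighted sum and
    add up only the weighted deltas that agree with that sign."""
    merged = []
    for column in zip(*deltas):
        weighted = [w * d for w, d in zip(weights, column)]
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        merged.append(sum(v for v in weighted if v * sign > 0))
    return merged


def dare_ties(base, tuned_models, densities, weights, seed=0):
    """Sparsify each model's delta from the base, merge, and add back to the base."""
    rng = random.Random(seed)
    deltas = [dare_delta(base, t, d, rng) for t, d in zip(tuned_models, densities)]
    merged_delta = ties_sign_merge(deltas, weights)
    return [b + m for b, m in zip(base, merged_delta)]


# Toy 4-parameter "models"; the values are made up for illustration.
base = [0.0, 0.0, 0.0, 0.0]
instruct = [1.0, 1.0, -2.0, 0.5]
math_model = [1.0, -1.0, -1.0, 0.5]

merged = dare_ties(
    base,
    [instruct, math_model],
    densities=[0.53, 0.53],  # same density as in the configuration above
    weights=[0.6, 0.4],      # same weights as in the configuration above
)
print(merged)
```

With `density: 0.53`, roughly half of each model's differences from the base are dropped at random before merging, and the remaining ones are scaled up to compensate.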

Evaluation

gsm8k-ko, kobest

git clone https://github.com/kuotient/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
lm_eval --model hf \
    --model_args pretrained=yanolja/EEVE-Korean-Instruct-2.8B-v1.0 \
    --tasks gsm8k-ko \
    --device cuda:0 \
    --batch_size auto:4

| Model | gsm8k-ko (pass@1) | boolq (acc) | copa (acc) | hellaswag (acc) | Overall |
|---|---|---|---|---|---|
| yanolja/EEVE-Korean-10.8B-v1.0 | 0.4049 | - | - | - | - |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 0.4511 | 0.8668 | 0.7450 | 0.4940 | 0.6392 |
| EEVE-Math-10.8B | 0.5390 | 0.8027 | 0.7260 | 0.4760 | 0.6359 |
| EEVE-Instruct-Math-10.8B | 0.4845 | 0.8519 | 0.7410 | 0.4980 | 0.6439 |
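
The Overall column appears to be the unweighted mean of the four per-task scores. The short check below reproduces it under that assumption; the card itself does not state how Overall was computed, and the `scores` dictionary is simply the table rows retyped.

```python
from statistics import mean

# Per-model scores in table order: gsm8k-ko, boolq, copa, hellaswag.
scores = {
    "yanolja/EEVE-Korean-Instruct-10.8B-v1.0": [0.4511, 0.8668, 0.7450, 0.4940],
    "EEVE-Math-10.8B": [0.5390, 0.8027, 0.7260, 0.4760],
    "EEVE-Instruct-Math-10.8B": [0.4845, 0.8519, 0.7410, 0.4980],
}

# Assumption: Overall is the plain average of the four task scores.
overall = {name: mean(values) for name, values in scores.items()}

for name, value in overall.items():
    print(f"{name}: {value:.4f}")
```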