KEval-7b

keval is an evaluation model for Korean language models. Of the various methods that use ChatGPT to evaluate models, it was trained on the prompts and dataset of one such benchmark for Korean language models, with the aim of compensating for the shortcomings of the existing lm-evaluation-harness.

The current model is private.

Now that the new version (keval-9b) has been released, the previous version will be opened up so that anyone can use it.

Evaluation

| | model | acc | wrong | diff-0 | diff-1 | diff-2 | diff-3 | diff-4 | diff-5 | diff-6 | diff-7 | diff-8 | diff-9 | length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Meta-Llama-3-8B-Instruct-keval_datasets_small.jsonl | 0.45 | 0.38 | 0.06 | 0.18 | 0.17 | 0.09 | 0.05 | 0.04 | 0.02 | 0 | 0 | 0 | 100 |
| 1 | Mistral-7B-Instruct-v0.2-keval_datasets_small.jsonl | 0.55 | 0.27 | 0.18 | 0.2 | 0.09 | 0.06 | 0.05 | 0.03 | 0.04 | 0.04 | 0.04 | 0 | 100 |
| 2 | Mistral-7B-Instruct-v0.3-keval_datasets_small.jsonl | 0.71 | 0.05 | 0.26 | 0.26 | 0.13 | 0.08 | 0.07 | 0.04 | 0.03 | 0.03 | 0.04 | 0 | 100 |
| 3 | aya-23-8B-keval_datasets_small.jsonl | 0.7 | 0.02 | 0.17 | 0.24 | 0.16 | 0.1 | 0.13 | 0.06 | 0.06 | 0.04 | 0.01 | 0 | 100 |
| 4 | gemma-2-27b-it-keval_datasets_small.jsonl | 0.76 | 0.11 | 0.2 | 0.35 | 0.18 | 0.1 | 0.03 | 0.01 | 0.02 | 0 | 0 | 0 | 100 |
| 5 | gemma-2-9b-it-keval_datasets_small.jsonl | 0.83 | 0.04 | 0.26 | 0.42 | 0.15 | 0.05 | 0.02 | 0.05 | 0.01 | 0 | 0 | 0 | 100 |
| 6 | keval-7b-keval_datasets_small.jsonl | 0.84 | 0 | 0.28 | 0.41 | 0.11 | 0.06 | 0.05 | 0.03 | 0.02 | 0.03 | 0.01 | 0 | 100 |
| 7 | keval-9b-keval_datasets_small.jsonl | 0.91 | 0 | 0.43 | 0.38 | 0.1 | 0.05 | 0.03 | 0.01 | 0 | 0 | 0 | 0 | 100 |
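
To compare the evaluated models at a glance, the results can be ranked by the acc column. The following is a minimal sketch, not part of keval itself: the `acc` and `wrong` values are copied from the results above, the `-keval_datasets_small.jsonl` suffix is dropped from the model names for readability, and `rank_by_acc` is a hypothetical helper introduced only for this illustration.

```python
# Rows copied from the evaluation results: (model, acc, wrong).
# The "-keval_datasets_small.jsonl" suffix is dropped for readability.
RESULTS = [
    ("Meta-Llama-3-8B-Instruct", 0.45, 0.38),
    ("Mistral-7B-Instruct-v0.2", 0.55, 0.27),
    ("Mistral-7B-Instruct-v0.3", 0.71, 0.05),
    ("aya-23-8B", 0.70, 0.02),
    ("gemma-2-27b-it", 0.76, 0.11),
    ("gemma-2-9b-it", 0.83, 0.04),
    ("keval-7b", 0.84, 0.00),
    ("keval-9b", 0.91, 0.00),
]


def rank_by_acc(rows):
    """Return the rows sorted by the acc column, highest first.

    Hypothetical helper for illustration; not part of keval.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_acc(RESULTS)
for model, acc, wrong in ranked:
    # Print one aligned line per model.
    print(f"{model:<26} acc={acc:.2f} wrong={wrong:.2f}")
```

Sorting this way puts keval-9b and keval-7b at the top of the list, followed by gemma-2-9b-it.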
Safetensors
Model size: 7.24B params
Tensor type: BF16