EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval

About the Model

This model has been fine-tuned to judge, with a yes/no answer, whether the context retrieved for a question in a RAG (retrieval-augmented generation) pipeline is sufficient to answer that question.

The base model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.
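In a RAG pipeline, a grader like this typically sits between retrieval and generation, filtering out contexts judged insufficient before they reach the answering model. A minimal sketch of that routing step (the `grade` callable is a hypothetical placeholder standing in for a call to this model):

```python
from typing import Callable, List

def filter_contexts(question: str, contexts: List[str],
                    grade: Callable[[str, str], bool]) -> List[str]:
    """Keep only the retrieved contexts the grader judges sufficient.

    `grade(question, context)` should return True for "예" (yes)
    and False for "아니오" (no).
    """
    return [ctx for ctx in contexts if grade(question, ctx)]

# Toy grader standing in for the model: "sufficient" if the context
# mentions a month-like token.
toy_grade = lambda q, c: "6월" in c

kept = filter_contexts(
    "동아리 종강총회가 언제인가요?",
    ["종강총회 날짜는 6월 21일입니다.", "동아리 회비는 1만원입니다."],
    toy_grade,
)
print(kept)  # only the first context survives
```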

Prompt Template

The template (in Korean) asks the model to judge whether the given context is sufficient to answer the question, responding with "예" (yes) or "아니오" (no):

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### ์งˆ๋ฌธ: 
{question}

### ์ •๋ณด: 
{context}

### ํ‰๊ฐ€: 

How to Use It

import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

prompt_template = '์ฃผ์–ด์ง„ ์งˆ๋ฌธ๊ณผ ์ •๋ณด๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ์ •๋ณด์ธ์ง€ ํ‰๊ฐ€ํ•ด์ค˜.\n์ •๋ณด๊ฐ€ ์ถฉ๋ถ„ํ•œ์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด "์˜ˆ" ๋˜๋Š” "์•„๋‹ˆ์˜ค"๋กœ ๋‹ตํ•ด์ค˜.\n\n### ์งˆ๋ฌธ:\n{question}\n\n### ์ •๋ณด:\n{context}\n\n### ํ‰๊ฐ€:\n'
query = {
    "question": "๋™์•„๋ฆฌ ์ข…๊ฐ•์ดํšŒ๊ฐ€ ์–ธ์ œ์ธ๊ฐ€์š”?",
    "context": "์ข…๊ฐ•์ดํšŒ ๋‚ ์งœ๋Š” 6์›” 21์ผ์ž…๋‹ˆ๋‹ค."
}

model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to(model.device)
output_ids = model.generate(**model_inputs, max_new_tokens=100)
print(tokenizer.decode(output_ids[0]))
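To turn the generated text into a boolean grade, one option (a sketch, not part of the model card) is to split the decoded output on the final "### 평가:" marker and check whether the verdict starts with "예":

```python
def parse_grade(decoded: str) -> bool:
    """Return True if the model graded the context as sufficient ("예")."""
    verdict = decoded.rsplit("### 평가:", 1)[-1]
    verdict = verdict.replace("<|end_of_text|>", "").strip()
    return verdict.startswith("예")

print(parse_grade("... ### 평가:\n예<|end_of_text|>"))      # True
print(parse_grade("... ### 평가:\n아니오<|end_of_text|>"))  # False
```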

Example Output

์ฃผ์–ด์ง„ ์งˆ๋ฌธ๊ณผ ์ •๋ณด๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ์ •๋ณด์ธ์ง€ ํ‰๊ฐ€ํ•ด์ค˜.
์ •๋ณด๊ฐ€ ์ถฉ๋ถ„ํ•œ์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด "์˜ˆ" ๋˜๋Š” "์•„๋‹ˆ์˜ค"๋กœ ๋‹ตํ•ด์ค˜.

### ์งˆ๋ฌธ:
๋™์•„๋ฆฌ ์ข…๊ฐ•์ดํšŒ๊ฐ€ ์–ธ์ œ์ธ๊ฐ€์š”?

### ์ •๋ณด:
์ข…๊ฐ•์ดํšŒ ๋‚ ์งœ๋Š” 6์›” 21์ผ์ž…๋‹ˆ๋‹ค.

### ํ‰๊ฐ€:
์˜ˆ<|end_of_text|>

Training Data

Metrics

Korean LLM Benchmark

| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |

Generated Dataset

| Model | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
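Metrics like those above can be computed from the grader's yes/no predictions against gold labels; a stdlib-only sketch (the toy `gold`/`pred` lists are illustrative, not the actual evaluation data):

```python
def binary_metrics(gold, pred):
    """Accuracy, precision, recall, F1 for boolean labels (True = "예")."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

gold = [True, True, False, False]
pred = [True, False, False, False]
acc, prec, rec, f1 = binary_metrics(gold, pred)
print(acc, prec, rec)  # 0.75 1.0 0.5
```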