---
library_name: transformers
tags: []
---

# Model Card for TwinDoc/RedWhale-2-12B-Instruct

์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์ธ TwinDoc/RedWhale-2-12B๋ฅผ SFT(Supervised Finetuning)ํ•œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. SFT๋Š” ContextQA ๋ฐ ์š”์•ฝ task์— ๋งž์ถฐ ํ•™์Šตํ•˜์˜€์Šต๋‹ˆ๋‹ค.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** AgileSoda
- **Model type:** Llama
- **Language(s) (NLP):** Korean
- **License:** [More Information Needed]
- **Foundation Model:** RedWhale-2-12B

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

RedWhale-2-12B-Instruct ๋ชจ๋ธ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•์€ meta-llama/Llama-3.1-8B-Instruct ๋ชจ๋ธ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•๊ณผ ๋™์ผํ•ฉ๋‹ˆ๋‹ค.
์‚ฌ์šฉํ•˜๊ณ ์ž ํ•˜๋Š” ์„œ๋น™ ์—”์ง„์˜ ๊ณต์‹ ๋ฌธ์„œ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”. ๋‹ค์Œ์€ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค.

#### Usage with Transformers

The example code below was written with transformers == 4.48.1.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

loading_args = {"torch_dtype": torch.bfloat16, "device_map": "auto"}  # for multi-GPU loading
model = AutoModelForCausalLM.from_pretrained("TwinDoc/RedWhale-2-12B-Instruct", **loading_args)
tokenizer = AutoTokenizer.from_pretrained("TwinDoc/RedWhale-2-12B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "대한민국의 수도는?"},
]

# Build the Llama 3.1 chat prompt and move it to the model's device
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device))
print(tokenizer.decode(outputs[0]))
```

Expected output:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

대한민국의 수도는?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

대한민국의 수도는 서울입니다.<|eot_id|>
```

#### Usage with vLLM

The example code below was written with vllm == 0.6.6.

```python
from vllm import LLM, SamplingParams
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # arrange GPU devices starting from 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

repo_id = "TwinDoc/RedWhale-2-12B-Instruct"
tensor_parallel_size = 8  # number of GPUs

llm = LLM(
    model=repo_id,
    tensor_parallel_size=tensor_parallel_size,
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "대한민국의 수도는?"},
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.9,
    max_tokens=8192,
)

outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

Expected output:

```
대한민국의 수도는 서울입니다.
```

## Training Details

### Training Data

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- 200 examples from the allganize/rag-ko test set
- 100 Mirae Asset Context QA examples
- 140 AIA Context QA examples
- 63 BNK Context QA examples

#### Metrics

LLM as a Judge์„ ํ™œ์šฉํ•˜์—ฌ ์„ฑ๋Šฅ์„ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. Prompt,์ธก์ • ๋ชจ๋ธ ๊ทธ๋ฆฌ๊ณ  ํ‰๊ฐ€ ๊ฒฐ๊ณผ๋Š” Our Leaderboard๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”. Our Leaderboard์˜ ๋ชจ๋ธ๋ช… "RedWhale2 12B 0.98 SFT v4 M"์ด "TwinDoc/RedWhale-2-12B-Instruct" ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.