davidkim205 committed on
Commit
9fa844e
1 Parent(s): a63fc36

Update README.md

Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -37,20 +37,26 @@ korean multi-task instruction dataset
 - CUDA Version: 12.2
 
 ## Training
-Refer github
+Refer https://github.com/davidkim205/komt
 
 ## Evaluation
 
 For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT, a widely used model, as described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).
 
-| model                          | score   | average score | %          |
-| ------------------------------ | ------- | ------------- | ---------- |
-| gpt-3.5-turbo                  | 147     | 3.97          | 79.45%     |
-| WizardLM-13B-V1.2              | 96      | 2.59          | 51.89%     |
-| Llama-2-7b-chat-hf             | 67      | 1.81          | 36.21%     |
-| Llama-2-13b-chat-hf            | 73      | 1.91          | 38.37%     |
-| **komt-llama2-7b-v1 (ours)**   | **117** | **3.16**      | **63.24%** |
-| **komt-llama2-13b-v1 (ours)**  | **129** | **3.48**      | **69.72%** |
+| model                                   | score   | average(0~5) | percentage |
+| --------------------------------------- | ------- | ------------ | ---------- |
+| gpt-3.5-turbo(close)                    | 147     | 3.97         | 79.45%     |
+| naver Cue(close)                        | 140     | 3.78         | 75.67%     |
+| clova X(close)                          | 136     | 3.67         | 73.51%     |
+| WizardLM-13B-V1.2(open)                 | 96      | 2.59         | 51.89%     |
+| Llama-2-7b-chat-hf(open)                | 67      | 1.81         | 36.21%     |
+| Llama-2-13b-chat-hf(open)               | 73      | 1.91         | 38.37%     |
+| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70      | 1.89         | 37.83%     |
+| kfkas/Llama-2-ko-7b-Chat(open)          | 96      | 2.59         | 51.89%     |
+| beomi/KoAlpaca-Polyglot-12.8B(open)     | 100     | 2.70         | 54.05%     |
+| **komt-llama2-7b-v1 (open)(ours)**      | **117** | **3.16**     | **63.24%** |
+| **komt-llama2-13b-v1 (open)(ours)**     | **129** | **3.48**     | **69.72%** |
+
 
 ------------------------------------------------
 # Original model card: Meta's Llama 2 7B-chat
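
The derived columns in the evaluation table follow directly from the raw score. A minimal sketch of that arithmetic, assuming an evaluation set of 37 prompts each rated 0~5 by ChatGPT (the prompt count is inferred from 147 / 3.97 and is not stated in the README; `summarize` is a hypothetical helper name):

```python
# Reproduce the "average(0~5)" and "percentage" columns from a raw score.
# NUM_PROMPTS is an assumption inferred from the gpt-3.5-turbo row
# (147 / 3.97 ≈ 37); MAX_PER_PROMPT follows the 0~5 rating scale.
NUM_PROMPTS = 37
MAX_PER_PROMPT = 5

def summarize(total_score: int) -> tuple[float, float]:
    """Return (average rating per prompt, percentage of maximum possible score)."""
    average = total_score / NUM_PROMPTS
    percentage = total_score / (NUM_PROMPTS * MAX_PER_PROMPT) * 100
    return average, percentage

avg, pct = summarize(147)  # gpt-3.5-turbo row: average ≈ 3.97, ≈ 79.45%
```

The published table appears to truncate rather than round the last digit, so recomputed values can differ from it by up to 0.01.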