stjohn2007 committed
Commit 33413d0
1 Parent(s): 0a9104c

Update README.md

Files changed (1)
  1. README.md +19 -5
README.md CHANGED
@@ -32,27 +32,41 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 
 ### MT-Bench JA
 
-* We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
-* We will add the scores of existing models soon.
+#### Turn-Wise Performance
 
-#### Overall
+We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
+
+##### Overall
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
 
-#### First Turn
+##### First Turn
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-MS-7b-instruct-v0.1 |0.3699|0.4880|0.4260|0.3900|0.1080|0.2364|0.3780|0.4500|0.4800|
 
-#### Second Turn
+##### Second Turn
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-MS-7b-instruct-v0.1 |0.3130|0.2624|0.4320|0.2996|0.1000|0.2430|0.3564|0.3291|0.4700|
 
+#### Comparison to the past model
+
+We only provide the overall score in this section.
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
+| ELYZA-japanese-Llama-2-7b-fast-instruct |0.2827|0.3289|0.3907|0.2424|0.1480|0.1584|0.3511|0.3053|0.3365|
+| calm2-7b-chat |0.3204|0.4657|0.4898|0.1837|0.1005|0.1414|0.3927|0.3601|0.4293|
+| calm2-7b-chat-dpo-experimental |0.3493|0.5312|0.5237|0.1857|0.1000|0.1813|0.3355|0.4320|0.5051|
+| RakutenAI-7B-instruct |0.2994|0.3623|0.3711|0.3333|0.1763|0.1581|0.4215|0.2824|0.2901|
+| RakutenAI-7B-chat |0.3667|0.4229|0.4644|0.3990|0.2161|0.2390|0.3416|0.3904|0.4601|
+
 
 ## Evaluation Benchmarks
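The README text in this diff defines the overall score as the average of the first- and second-turn scores. A minimal sketch of that arithmetic, using values copied from the turn-wise tables above (this is an illustration, not the MT-Bench evaluation harness; the dictionary layout is an assumption). Note that some categories reproduce the published overall value exactly (e.g., Math, Roleplay), while others differ slightly (e.g., Writing: 0.3752 vs. the reported 0.3770), presumably because the published overall averages per-question judgments before rounding:

```python
# Turn-wise scores for Swallow-MS-7b-instruct-v0.1, copied from the tables
# above (subset of categories; the dict layout is an assumption for this sketch).
first = {"Writing": 0.4880, "Roleplay": 0.4260, "Math": 0.1080}
second = {"Writing": 0.2624, "Roleplay": 0.4320, "Math": 0.1000}

# Overall = element-wise mean of the two turns, rounded to the table's precision.
overall = {k: round((first[k] + second[k]) / 2, 4) for k in first}

print(overall["Math"])      # matches the 0.1040 in the Overall table
print(overall["Roleplay"])  # matches the 0.4290 in the Overall table
```

Rounding to four decimals mirrors the precision used in the tables and avoids spurious floating-point digits.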