Taishi-N324 committed on
Commit 00d6595
1 Parent(s): f178cbc

Upload README.md

Files changed (1): README.md (+30 -4)

![logo](./logo.png)

This repository provides large language models developed by [TokyoTech-LLM](https://tokyotech-llm.github.io/).
Read our [blog post](https://zenn.dev/tokyotech_lm/articles/d6cb3a8fdfc907) or our [paper](https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/A8-5.pdf) for more details!

## Model Details
 
 
## Base Model Performance

### Japanese tasks

|Model|Size|JCommonsenseQA|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|
|---|---|---|---|---|---|---|---|---|---|
| Llama 2 | 70B | 0.8686 | 0.4656 | 0.5256 | 0.9080 | 0.2361 | 0.3560 | 0.2643 | **0.2398** |
| Swallow | 70B | 0.9348 | **0.6290** | 0.6960 | 0.9176 | 0.2266 | **0.4840** | **0.3043** | 0.2298 |
| Swallow-NVE | 70B | **0.9410** | 0.5759 | **0.7024** | **0.9254** | **0.2758** | 0.4720 | 0.3042 | 0.2322 |

### English tasks

|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|GSM8K|
|---|---|---|---|---|---|---|---|
| Swallow | 70B | 0.4220 | 0.7756 | 0.6458 | 0.3745 | 0.9204 | 0.4867 |
| Swallow-NVE | 70B | 0.4240 | 0.7817 | 0.6439 | 0.3451 | 0.9256 | 0.4943 |

## Evaluation Benchmarks

### Japanese evaluation benchmarks

We used llm-jp-eval (v1.0.0) and the JP Language Model Evaluation Harness (commit #9b42d41). The evaluated tasks are as follows:

- Multiple-choice question answering (JCommonsenseQA [Kurihara+, 2022])
- Open-ended question answering (JEMHopQA [Ishii+, 2023])
- Open-ended question answering (NIILC [Sekine, 2003])
- Machine reading comprehension (JSQuAD [Kurihara+, 2022])
- Automatic summarization (XL-Sum [Hasan+, 2021])
- Machine translation (WMT2020 ja-en [Barrault+, 2020])
- Machine translation (WMT2020 en-ja [Barrault+, 2020])
- Mathematical reasoning (MGSM [Shi+, 2023])
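
Both harnesses score multiple-choice tasks such as JCommonsenseQA by comparing the model's log-likelihood of each candidate answer and picking the highest. The sketch below illustrates that scoring idea only; it is not llm-jp-eval's code, and the model ID and toy question are placeholder assumptions.

```python
# Illustration of log-likelihood scoring for multiple-choice QA.
# NOT llm-jp-eval code; the model ID and example question are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-7b-hf"  # assumed Hub ID; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def choice_loglikelihood(question: str, choice: str) -> float:
    """Sum of log-probs of the tokens of `choice`, conditioned on `question`.

    Approximation: assumes tokenizing question+choice splits cleanly at the
    question/choice boundary, which can be off by one merge for some inputs.
    """
    prefix_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full = tokenizer(question + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full).logits
    # position i of the logits predicts token i+1 of the input
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont = full[0, prefix_len:]  # the answer tokens
    return sum(
        logprobs[pos, tok].item()
        for pos, tok in zip(range(prefix_len - 1, full.shape[1] - 1), cont)
    )

question = "質問:日本の首都はどこですか? 回答:"  # toy example, not a JCommonsenseQA item
choices = ["東京", "大阪"]
print(max(choices, key=lambda c: choice_loglikelihood(question, c)))
```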

### English evaluation benchmarks

We used the Language Model Evaluation Harness (v0.3.0). The evaluated tasks are as follows:

- Multiple-choice question answering (OpenBookQA [Mihaylov+, 2018])
- Open-ended question answering (TriviaQA [Joshi+, 2017])
- Machine reading comprehension (SQuAD 2.0 [Rajpurkar+, 2018])
- Commonsense reasoning (XWINO [Tikhonov & Ryabinin, 2021])
- Natural language inference (HellaSwag [Zellers+, 2019])
- Mathematical reasoning (GSM8K [Cobbe+, 2021])
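
The harness can also be driven from Python. The following is a hedged sketch: `evaluator.simple_evaluate`, the `hf-causal` backend name, and the task identifiers exist in the EleutherAI harness around v0.3.0 but vary across versions, and the model ID and shot count are assumptions rather than the settings behind the tables above.

```python
# Hedged sketch of driving EleutherAI's lm-evaluation-harness from Python.
# Backend and task names differ across versions; check the v0.3.0 docs.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                                    # HF causal-LM backend (name varies by version)
    model_args="pretrained=tokyotech-llm/Swallow-7b-hf",  # assumed model ID
    tasks=["hellaswag", "openbookqa"],                    # illustrative subset of the tasks above
    num_fewshot=0,                                        # shot count here is arbitrary
)
print(results["results"])
```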

## Usage

First install additional dependencies in [requirements.txt](./requirements.txt):
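
A minimal sketch of that install plus a first generation call, assuming the Hub ID `tokyotech-llm/Swallow-70b-hf` and arbitrary sampling settings (not the repository's documented example):

```python
# Assumed setup sketch: install dependencies, then load the model via transformers.
#   pip install -r requirements.txt
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-70b-hf"  # assumed Hub ID for the base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "東京工業大学の主なキャンパスは、"  # "Tokyo Tech's main campuses are..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.9, top_p=0.95
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```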