sukuya commited on
Commit
c687b10
1 Parent(s): 85e7e34

update scores for Nejumi leaderboard and date

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -46,7 +46,7 @@ Our model achieves the highest average score, more than 3 points ahead of the ne
46
 
47
  Our model achieves the highest average score, more than 4 points ahead of the next best model. We use the following commit for English LM-Harness https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463.
48
 
49
- An independent evaluation by Kamata et.al. for [Nejumi LLMリーダーボード Neo](https://wandb.ai/wandb-japan/llm-leaderboard/reports/Nejumi-LLM-Neo--Vmlldzo2MTkyMTU0#総合評価) using a weighted average of [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) and [Japanese MT-bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge) also confirms the highest performance of instruct/chat versions of RakutenAI-7B.
50
  ## Usage
51
 
52
  ```python
 
46
 
47
  Our model achieves the highest average score, more than 4 points ahead of the next best model. We use the following commit for English LM-Harness https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463.
48
 
49
+ An independent evaluation by Kamata et.al. for [Nejumi LLMリーダーボード Neo](https://wandb.ai/wandb-japan/llm-leaderboard/reports/Nejumi-LLM-Neo--Vmlldzo2MTkyMTU0#総合評価) using a weighted average of [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) and [Japanese MT-bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge) also confirms the highest performance of chat/instruct versions of RakutenAI-7B among Open LLMs of similar sizes, with a score of 0.393/0.331 respectively, as of 22nd March 2024.
50
  ## Usage
51
 
52
  ```python