viethoangtranduong committed
Commit 7b7b07b
1 Parent(s): 5db967d

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -44,7 +44,7 @@ On [**Alpaca-Eval 2.0**](https://tatsu-lab.github.io/alpaca_eval/):
  - The base model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) scored **14.72**.
  After applying the above methodology:
  - This model scored **30.2** - ranked 3rd and the highest for an open-source base model at the time of publication.
- - Utilizing the model with PairRM, which involved generating 16 responses and submitting the highest-scoring response by PairRM, we scored **34.86** - ranked 2nd.
+ - When post-processing the model outputs with PairRM-best-of-16, which involved generating 16 responses and select the highest-scoring response by PairRM, we scored **34.86** - ranked 2nd.
  The best model on the leaderboard is "gpt-4-turbo", which is also the judge of optimal responses.
 
  We recognize that the Alpaca-Eval 2.0 benchmark does not entirely capture the full range of capabilities and performances of LLMs.
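
The best-of-16 post-processing named in the changed line can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `generate` and `compare` are hypothetical placeholders for the model's sampling call and for a pairwise ranker such as PairRM, and summing pairwise margins is an assumed way to turn pairwise preferences into a single pick.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    compare: Callable[[str, str, str], float],
    n: int = 16,
) -> str:
    """Sample n candidates and return the one a pairwise ranker prefers most.

    `compare(prompt, a, b)` is assumed to return a positive number when
    candidate `a` is preferred over `b`, and a negative number otherwise.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # Total each candidate's preference margin against every other candidate.
    totals = [
        sum(compare(prompt, a, b) for j, b in enumerate(candidates) if j != i)
        for i, a in enumerate(candidates)
    ]
    # On ties, max() keeps the earliest candidate.
    best_index = max(range(n), key=totals.__getitem__)
    return candidates[best_index]


# Hypothetical stand-ins so the sketch runs without loading any model.
def toy_generate(prompt: str) -> str:
    return prompt + " " + "x" * random.randint(1, 20)


def toy_compare(prompt: str, a: str, b: str) -> float:
    # Toy preference only: treats the longer response as better.
    return float(len(a) - len(b))


if __name__ == "__main__":
    print(best_of_n("Explain best-of-n sampling.", toy_generate, toy_compare))
```

In practice the two toy functions would be replaced by a real sampling call and a real ranker; only the selected response is then submitted for evaluation.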