Commit ab31e5e (1 parent: cdcabe6), committed by Dongfu Jiang
Update README.md

README.md CHANGED
@@ -27,6 +27,30 @@ which is trained on [mixinstruct](https://huggingface.co/datasets/llm-blender/mi
 - Github: [https://github.com/yuchenlin/LLM-Blender](https://github.com/yuchenlin/LLM-Blender)
 - Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
 
+
+| **Methods** | BERTScore | BARTScore | BLEURT | GPT-Rank | Beat Vic(%) | Beat OA(%) | Top-1(%) | Top-2(%) | Top-3(%) |
+|:-----------------:|:---------:|:---------:|:---------:|:--------:|:----------:|:----------:|:----------:|:----------:|:----------:|
+| Open Assistant | **74.68** | -3.45 | **-0.39** | **3.90** | **62.78** | N/A | 17.35 | 35.67 | 51.98 |
+| Vicuna | 69.60 | **-3.44** | -0.61 | 4.13 | N/A | **64.77** | **25.47** | **41.23** | **52.88** |
+| Alpaca | 71.46 | -3.57 | -0.53 | 4.62 | 56.70 | 61.35 | 15.41 | 29.81 | 44.46 |
+| Baize | 65.57 | -3.53 | -0.66 | 4.86 | 52.76 | 56.40 | 14.23 | 26.91 | 38.80 |
+| moss | 64.85 | -3.65 | -0.73 | 5.09 | 51.62 | 51.79 | 15.93 | 27.52 | 38.27 |
+| ChatGLM | 70.38 | -3.52 | -0.62 | 5.63 | 44.04 | 45.67 | 9.41 | 19.37 | 28.78 |
+| Koala | 63.96 | -3.85 | -0.84 | 6.76 | 39.93 | 39.01 | 8.15 | 15.72 | 22.55 |
+| Dolly v2 | 62.26 | -3.83 | -0.87 | 6.90 | 33.33 | 31.44 | 5.16 | 10.06 | 16.45 |
+| Mosaic MPT | 63.21 | -3.72 | -0.82 | 7.19 | 30.87 | 30.16 | 5.39 | 10.61 | 16.24 |
+| StableLM | 62.47 | -4.12 | -0.98 | 8.71 | 21.55 | 19.87 | 2.33 | 4.74 | 7.96 |
+| Flan-T5 | 64.92 | -4.57 | -1.23 | 8.81 | 23.89 | 19.93 | 1.30 | 2.87 | 5.32 |
+| Oracle(BERTScore) | **77.67** | -3.17 | -0.27 | 3.88 | 54.41 | 38.84 | 20.16 | 38.11 | 53.49 |
+| Oracle(BLEURT) | 75.02 | -3.15 | **-0.15** | 3.77 | 55.61 | 45.80 | 21.48 | 39.84 | 55.36 |
+| Oracle(BARTScore) | 73.23 | **-2.87** | -0.38 | 3.69 | 50.32 | 57.01 | 26.10 | 43.70 | 57.33 |
+| Oracle(ChatGPT) | 70.32 | -3.33 | -0.51 | **1.00** | **100.00** | **100.00** | **100.00** | **100.00** | **100.00** |
+| Random | 66.36 | -3.76 | -0.77 | 6.14 | 37.75 | 36.91 | 11.28 | 20.69 | 29.05 |
+| MLM-Scoring | 64.77 | -4.03 | -0.88 | 7.00 | 33.87 | 30.39 | 7.29 | 14.09 | 21.46 |
+| SimCLS | **73.14** | -3.22 | -0.38 | 3.50 | 52.11 | 49.93 | 26.72 | 46.24 | 60.72 |
+| SummaReranker | 71.60 | -3.25 | -0.41 | 3.66 | **55.63** | 48.46 | 23.89 | 42.44 | 57.54 |
+| [PairRanker](https://huggingface.co/llm-blender/pair-ranker) (This model) | 72.97 | **-3.14** | **-0.37** | **3.20** | 54.76 | **57.79** | **30.08** | **50.68** | **65.12** |
+
 ## Usage Example
 Since PairRanker contains some custom layers and tokens, we recommend using it with our llm-blender Python repo.
 Otherwise, loading it directly with the Hugging Face `from_pretrained()` API will raise errors.
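
The GPT-Rank and Top-k(%) columns in the table added by this commit are not defined in the README itself. The sketch below shows one plausible way such numbers could be computed, assuming GPT-Rank is the mean rank (1 = best) of the candidate a method selects and Top-k(%) is the share of examples where that rank is at most k. The function names and the sample ranks are hypothetical and are not taken from the llm-blender code base.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the selected candidate across examples (lower is better)."""
    return sum(ranks) / len(ranks)


def top_k_percent(ranks: Sequence[int], k: int) -> float:
    """Percentage of examples whose selected candidate is ranked within the top k."""
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


# Hypothetical ranks (1 = best) of the candidate a ranker picked on five examples.
selected_ranks = [1, 3, 2, 1, 5]

print(mean_rank(selected_ranks))         # 2.4
print(top_k_percent(selected_ranks, 1))  # 40.0
print(top_k_percent(selected_ranks, 3))  # 80.0
```

Under these assumed definitions, a method that always picks the top-ranked candidate gets a mean rank of 1.00 and 100.00 in every Top-k column, which matches the Oracle(ChatGPT) row.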