which is trained on [mixinstruct](https://huggingface.co/datasets/llm-blender/mi
- GitHub: [https://github.com/yuchenlin/LLM-Blender](https://github.com/yuchenlin/LLM-Blender)
- Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)

| **Methods** | BERTScore | BARTScore | BLEURT | GPT-Rank | Beat Vicuna (%) | Beat Open Assistant (%) | Top-1 (%) | Top-2 (%) | Top-3 (%) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Open Assistant | **74.68** | -3.45 | **-0.39** | **3.90** | **62.78** | N/A | 17.35 | 35.67 | 51.98 |
| Vicuna | 69.60 | **-3.44** | -0.61 | 4.13 | N/A | **64.77** | **25.47** | **41.23** | **52.88** |
| Alpaca | 71.46 | -3.57 | -0.53 | 4.62 | 56.70 | 61.35 | 15.41 | 29.81 | 44.46 |
| Baize | 65.57 | -3.53 | -0.66 | 4.86 | 52.76 | 56.40 | 14.23 | 26.91 | 38.80 |
| moss | 64.85 | -3.65 | -0.73 | 5.09 | 51.62 | 51.79 | 15.93 | 27.52 | 38.27 |
| ChatGLM | 70.38 | -3.52 | -0.62 | 5.63 | 44.04 | 45.67 | 9.41 | 19.37 | 28.78 |
| Koala | 63.96 | -3.85 | -0.84 | 6.76 | 39.93 | 39.01 | 8.15 | 15.72 | 22.55 |
| Dolly v2 | 62.26 | -3.83 | -0.87 | 6.90 | 33.33 | 31.44 | 5.16 | 10.06 | 16.45 |
| Mosaic MPT | 63.21 | -3.72 | -0.82 | 7.19 | 30.87 | 30.16 | 5.39 | 10.61 | 16.24 |
| StableLM | 62.47 | -4.12 | -0.98 | 8.71 | 21.55 | 19.87 | 2.33 | 4.74 | 7.96 |
| Flan-T5 | 64.92 | -4.57 | -1.23 | 8.81 | 23.89 | 19.93 | 1.30 | 2.87 | 5.32 |
| Oracle(BERTScore) | **77.67** | -3.17 | -0.27 | 3.88 | 54.41 | 38.84 | 20.16 | 38.11 | 53.49 |
| Oracle(BLEURT) | 75.02 | -3.15 | **-0.15** | 3.77 | 55.61 | 45.80 | 21.48 | 39.84 | 55.36 |
| Oracle(BARTScore) | 73.23 | **-2.87** | -0.38 | 3.69 | 50.32 | 57.01 | 26.10 | 43.70 | 57.33 |
| Oracle(ChatGPT) | 70.32 | -3.33 | -0.51 | **1.00** | **100.00** | **100.00** | **100.00** | **100.00** | **100.00** |
| Random | 66.36 | -3.76 | -0.77 | 6.14 | 37.75 | 36.91 | 11.28 | 20.69 | 29.05 |
| MLM-Scoring | 64.77 | -4.03 | -0.88 | 7.00 | 33.87 | 30.39 | 7.29 | 14.09 | 21.46 |
| SimCLS | **73.14** | -3.22 | -0.38 | 3.50 | 52.11 | 49.93 | 26.72 | 46.24 | 60.72 |
| SummaReranker | 71.60 | -3.25 | -0.41 | 3.66 | **55.63** | 48.46 | 23.89 | 42.44 | 57.54 |
| [PairRanker](https://huggingface.co/llm-blender/pair-ranker) (This model) | 72.97 | **-3.14** | **-0.37** | **3.20** | 54.76 | **57.79** | **30.08** | **50.68** | **65.12** |
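
The ranking-style columns above can be read as simple aggregates over per-example judgments. The sketch below assumes definitions that are our reading of the linked paper, not something this card states: GPT-Rank is the average rank (1 = best, from a ChatGPT-based ranking) of the output a method selects, a Top-k column is the share of examples where that output is ranked k or better, and a "Beat" column is the share of examples where it is judged better than the named baseline's output. The function names are hypothetical.

```python
from typing import Sequence


def mean_rank(picked_ranks: Sequence[float]) -> float:
    """GPT-Rank-style score: average rank (1 = best) of the output a method picked."""
    return sum(picked_ranks) / len(picked_ranks)


def top_k_percent(picked_ranks: Sequence[float], k: int) -> float:
    """Top-k (%): share of examples whose picked output is ranked k or better."""
    hits = sum(1 for r in picked_ranks if r <= k)
    return 100.0 * hits / len(picked_ranks)


def beat_percent(beats_baseline: Sequence[bool]) -> float:
    """Beat-X (%): share of examples where the picked output was judged better than baseline X's."""
    return 100.0 * sum(beats_baseline) / len(beats_baseline)


# Hypothetical toy data: reference rank of the picked output on five examples.
ranks = [1, 3, 2, 1, 5]
print(mean_rank(ranks))          # 2.4
print(top_k_percent(ranks, 1))   # 40.0
print(top_k_percent(ranks, 3))   # 80.0
print(beat_percent([True, False, True, True]))  # 75.0
```

Under this reading, Oracle(ChatGPT) always selects the rank-1 output, which matches its GPT-Rank of 1.00 and its 100.00 in every Top-k column.
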
## Usage Example
Because PairRanker contains custom layers and tokens, we recommend using it through our LLM-Blender Python repo.
Loading it directly with the Hugging Face `from_pretrained()` API, rather than through the repo, will raise errors.
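
For intuition about what happens after loading: PairRanker judges candidate outputs for the same input two at a time (see the paper linked above), and those pairwise judgments then have to be aggregated into a ranking. The sketch below uses plain win counting as one simple aggregation. It is not the LLM-Blender repo's API and not necessarily its default aggregation; `prefer` is a hypothetical stand-in for the model, and `rank_by_wins` and `longer_is_better` are illustrative names.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def rank_by_wins(candidates: Sequence[str],
                 prefer: Callable[[str, str], bool]) -> List[int]:
    """Return a 1-based rank per candidate (1 = best) by counting pairwise wins.

    `prefer(a, b)` is a hypothetical stand-in for a pairwise ranker:
    it returns True when `a` is judged better than `b`.
    """
    n = len(candidates)
    wins = [0] * n
    # Compare every unordered pair once and credit the winner.
    for i, j in combinations(range(n), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Most wins first; Python's sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda idx: -wins[idx])
    ranks = [0] * n
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


# Toy comparator (hypothetical): treat the longer answer as better.
def longer_is_better(a: str, b: str) -> bool:
    return len(a) > len(b)


candidates = ["ok", "a longer answer", "mid answer"]
ranks = rank_by_wins(candidates, longer_is_better)
print(ranks)                        # [3, 1, 2]
print(candidates[ranks.index(1)])   # a longer answer
```

Win counting needs n(n-1)/2 comparisons for n candidates; a real pairwise model may also be queried in both orders to reduce position bias, which this sketch does not do.
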