Text Classification
Transformers
Safetensors
mistral
feature-extraction
reward_model
custom_code
text-generation-inference
hanbin committed on
Commit
d352c2d
1 Parent(s): 4110700

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -53,6 +53,14 @@ test("openbmb/Eurus-RM-7b")
  # Output 2: 0.7317184507846832
  ```
 
+ ## Evaluation
+ - Eurus-RM-7B stands out as the best 7B RM overall and achieves similar or better performance than much larger baselines. In particular, it outperforms GPT-4 on certain tasks.
+ - Our training objective improves RM performance on hard problems and reasoning.
+ - UltraInteract is compatible with other datasets like UltraFeedback and UltraSafety, and mixing these datasets can balance different RM abilities.
+ - Eurus-RM-7B improves LLMs’ reasoning performance by a large margin through reranking.
+ <img src="./figures/rm_exp.png" alt="stats" style="zoom: 40%;" />
+
+
  ## Citation
  ```
  @misc{yuan2024advancing,