kushal-tri committed
Commit: 166b53c
Parent: 8b78dfa

Update README.md

Files changed (1)
  1. README.md +7 -10
README.md CHANGED
@@ -62,26 +62,23 @@ Here are the evaluation results for DCLM-1B models on various tasks (using [llm-
 
 Note: All scores are presented as decimal values between 0 and 1, representing the proportion of correct answers or the model's performance on each task.
 
-Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
+Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
 
 | Model | AlpacaEval2.0 LC Win-rate (%) |
 |------------------------------------|------------------------------:|
 | **Our runs** | |
-| DCLM-IT-1B | 8.6 |
+| DCLM-IT-1B | **8.6** |
 | DCLM-IT-7B | 16.6 |
-| Mistral-7B w/ OpenHermes 2.5 | 15.4 |
-| DCLM-Baseline-7B w/ OpenHermes 2.5 | 13.8 |
-| **Reported from the leaderboard** | |
-| LLaMA-3-Instruct-8B | **22.9** |
-| Mistral-v0.2-7B | 17.1 |
-| Mistral-7B w/ OpenHermes 2.5 | 16.2 |
-| Zephyr-Beta-7B | 13.2 |
-| Vicuna-v1.3-13B | 10.8 |
+| **Reported from the leaderboard** | |
 | Gemma-Instruct-7B | 10.4 |
 | Nous-Hermes-13B | 9.7 |
 | DaVinci001 | 9.0 |
 | LLaMA-2-Chat-13B | 8.4 |
 | Alpaca-7B | 5.9 |
+| Gemma-Instruct-2B | 5.4 |
+| Phi-2 SFT | 5.9 |
+| Qwen1.5 1.8B Chat | 2.6 |
+|--------------------------------------------------------------------|
 
 ## Example Code
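For context on how the Length-Controlled AlpacaEval 2.0 numbers above are produced: the instruction-tuned model generates a response to each instruction in the benchmark set, and a judge model then compares those responses against a reference, with a correction for response length. The sketch below covers only the generation step, using the standard `transformers` API; the model id, prompt format, and output-file layout are illustrative assumptions, not the repository's official usage (see the README's "Example Code" section for that).

```python
# Minimal sketch of the generation step behind an AlpacaEval-style evaluation.
# Assumptions: the Hub id below is a placeholder, and the raw instruction is fed
# to the model without the instruction-tuning chat template.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORG/DCLM-IT-7B"  # placeholder id, replace with the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# In practice this would be the full AlpacaEval instruction set.
instructions = [
    "Explain the difference between a list and a tuple in Python.",
    "Write a short poem about autumn.",
]

records = []
for instruction in instructions:
    inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
    generated = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens so only the model's reply is kept.
    reply = tokenizer.decode(
        generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    records.append(
        {"instruction": instruction, "output": reply, "generator": "DCLM-IT-7B"}
    )

# These records are what an AlpacaEval annotator scores to produce the
# length-controlled win rates reported in the table above.
with open("model_outputs.json", "w") as f:
    json.dump(records, f, indent=2)
```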