SultanR committed on
Commit cef7eef
1 Parent(s): 59b5dc8

Update README.md

Files changed (1): README.md (+7 -5)
README.md CHANGED
@@ -126,7 +126,7 @@ This model scores the highest current score in both IFEval and GSM8k while maint
 
 Something important to note: this model has only undergone SFT and DPO; the RLVR (reinforcement learning with verifiable rewards) stage was too computationally expensive to run properly.
 
-# Evaluation
+## Evaluation
 
 I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a fairer comparison.
 
@@ -140,7 +140,7 @@ I ran these evaluations using [SmolLM2's evaluation code](https://github.com/hug
 | HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 |
 | MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 |
 
-# Usage
+## Usage
 
 Just like any Hugging Face model, you can run it using the transformers library:
 
@@ -159,7 +159,7 @@ print(tokenizer.decode(outputs[0]))
 
 You can also use the model in llama.cpp through the [gguf version](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct-GGUF)!
 
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SultanR__SmolTulu-1.7b-Instruct)
 
@@ -177,8 +177,9 @@ As of writing this, the number 1 ranking model in IFEval for any model under 2 b
 |MuSR (0-shot) | 1.92|
 |MMLU-PRO (5-shot) | 7.89|
 
-# Citation
+## Citation
 
+```
 @misc{alrashed2024smoltuluhigherlearningrate,
 title={SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs},
 author={Sultan Alrashed},
@@ -187,4 +188,5 @@ As of writing this, the number 1 ranking model in IFEval for any model under 2 b
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2412.08347},
-}
+}
+```
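For context on the Usage section touched above: the README's own snippet is only partially visible in this diff (just the `print(tokenizer.decode(outputs[0]))` context line), so here is a minimal sketch of loading the model with transformers. The chat-template call and generation arguments are assumptions, not the README's verbatim code:

```python
# Minimal sketch, not the README's verbatim snippet: load SmolTulu-1.7b-Instruct
# with transformers and generate a short reply. The chat-template usage and
# generation arguments below are assumptions based on standard practice.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "SultanR/SmolTulu-1.7b-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Build a chat-formatted prompt and decode the generated continuation.
messages = [{"role": "user", "content": "What is gravity?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```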
 
 
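And for the llama.cpp mention: one way to run the GGUF build from Python is through the llama-cpp-python bindings. This is an illustration under assumptions only; the README points at llama.cpp itself, and the quantization filename below is hypothetical:

```python
# Sketch using the llama-cpp-python bindings (an assumption; the README only
# mentions llama.cpp). The GGUF filename is hypothetical: substitute any file
# downloaded from the SultanR/SmolTulu-1.7b-Instruct-GGUF repo.
from llama_cpp import Llama

llm = Llama(model_path="SmolTulu-1.7b-Instruct-Q8_0.gguf")
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is gravity?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```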