benjaminzwhite
/

Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit

@@ -30,7 +30,7 @@ This is an early experiment using the `GRPOTrainer` and training reasoning model
 To use this with standard HuggingFace code, I recommend starting with this code (based 95% on the default code shown at the base model page : [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct))
-```
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "benjaminzwhite/Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit"

 To use this with standard HuggingFace code, I recommend starting with this code (based 95% on the default code shown at the base model page : [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct))
+```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "benjaminzwhite/Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit"