How to use Edward1239/DL_hw3 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter from this repository.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base_model, "Edward1239/DL_hw3")
```
DL HW3 GRPO LoRA Adapter
This repository contains the final LoRA adapter for DL HW3: Reasoning LLM Step 3 with GRPO.
Base Model
Qwen/Qwen2.5-14B-Instruct
Method
The adapter was initialized from my Step 2 SFT LoRA adapter and further optimized with GRPO (Group Relative Policy Optimization).
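For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows only that group-normalized advantage. The reward values are hypothetical, and the use of the population standard deviation and the `eps` constant are assumptions. The reward function and hyperparameters actually used for this adapter are not stated here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of completions sampled for one prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is pushed up only if it beats its own group's average.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to a single question
# (1.0 = correct choice, 0.0 = wrong choice).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.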
Final Adapter
outputs/grpo_hw2best_balanced_30steps_lr2e8
Training Data
The final GRPO run used a balanced version of HW2_.csv:
- A: 261
- B: 261
- C: 261
- D: 261
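One way to obtain equal class counts like these is to downsample every answer class to the size of the smallest one. The following is a minimal sketch under that assumption. The `answer` column name, the seed, and the downsampling strategy itself are hypothetical and may differ from how the balanced file was actually built.

```python
import random
from collections import defaultdict


def balance_by_label(rows, label_key="answer", seed=0):
    """Downsample each label group to the size of the smallest group."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)  # avoid long runs of a single label
    return balanced


# Toy data with uneven label counts (hypothetical, not the HW data).
toy_rows = [{"id": i, "answer": "ABCDAABC"[i % 8]} for i in range(40)]
balanced_rows = balance_by_label(toy_rows)
```

Balancing the gold answers keeps the model from being rewarded for a bias toward one letter.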
Inference
Final inference uses score-only A/B/C/D log-softmax scoring.
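A plausible reading of "score-only" is that no text is generated: the logits for the four answer letters at the answer position are compared directly. The sketch below illustrates only the scoring step with a log-softmax restricted to A/B/C/D. The logit values are hypothetical, and how the logits are extracted from the model (prompt format, token IDs for each letter) is not shown and is an assumption of this reading.

```python
import math

CHOICES = ("A", "B", "C", "D")


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def score_choices(choice_logits):
    """Return the best letter and per-letter log-probabilities.

    choice_logits maps each letter to its next-token logit.
    """
    logps = log_softmax([choice_logits[c] for c in CHOICES])
    scores = dict(zip(CHOICES, logps))
    prediction = max(scores, key=scores.get)
    return prediction, scores


# Hypothetical logits for one question; "B" has the largest logit.
prediction, scores = score_choices({"A": 1.2, "B": 3.4, "C": 0.3, "D": -0.5})
```

Restricting the softmax to the four letters guarantees the prediction is always a valid option, which free-form generation does not.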
Public Leaderboard
Public LB score: around 0.71