DL HW3 GRPO LoRA Adapter

This repository contains the final LoRA adapter for DL HW3: Reasoning LLM Step 3 with GRPO.

Base Model

Qwen/Qwen2.5-14B-Instruct

Method

The model was initialized from my Step 2 SFT LoRA adapter and further optimized with GRPO (Group Relative Policy Optimization).
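GRPO has no learned value model. It samples a group of completions for each prompt and scores each completion against the rest of its group. The sketch below shows only that group-relative advantage step: reward minus group mean, divided by group standard deviation. It is an illustration under stated assumptions, not the training code used for this adapter. The function name and the `eps` parameter are hypothetical, and it uses the population standard deviation, although implementations differ on this detail.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of completions.

    All rewards belong to completions sampled for the SAME prompt.
    Each advantage is (reward - group mean) / (group std + eps).
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population standard deviation over the group (some implementations use the sample std).
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every completion received the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: a group of 4 sampled answers, with reward 1.0 for a correct letter and 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct completions get a positive advantage and incorrect ones a negative advantage. If every completion in a group receives the same reward, all advantages are zero and that group contributes no policy-gradient signal.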

Final Adapter

outputs/grpo_hw2best_balanced_30steps_lr2e8

Training Data

The final GRPO run used a class-balanced version of HW2_.csv, with 261 examples per answer label (1,044 examples in total):

  • A: 261
  • B: 261
  • C: 261
  • D: 261
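This card does not say how the balanced split was built. One common approach is to downsample every answer label to the size of the rarest label. The sketch below shows that approach only as an assumption. The function name, the `answer` column name, and the fixed seed are all hypothetical and are not taken from HW2_.csv.

```python
import random
from collections import defaultdict
from typing import Dict, List


def balance_by_downsampling(
    rows: List[Dict[str, str]], label_key: str = "answer", seed: int = 0
) -> List[Dict[str, str]]:
    """Hypothetical helper: keep the same number of rows for every label.

    Each label is randomly downsampled to the count of the rarest label.
    `label_key` is an assumed column name, not the real HW2_.csv schema.
    """
    if not rows:
        return []
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    target = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    balanced: List[Dict[str, str]] = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    rng.shuffle(balanced)  # avoid long runs of the same label
    return balanced
```

For real data, the rows could come from `list(csv.DictReader(f))`. Under this approach, a result of 261 rows per label would mean the rarest label in the source file had 261 examples. The card does not confirm that.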

Inference

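Final inference uses score-only scoring. The model does not generate a free-form answer. Instead, the predicted option is the one among A/B/C/D with the highest log-softmax score.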
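A minimal sketch of the scoring step follows. It assumes the next-token logits for the four answer-letter tokens have already been read from the model's final position. How those tokens are chosen (for example "A" vs " A") is not specified here and is left as an assumption. The function name is hypothetical. The sketch normalizes over the four letters only, which gives the same ranking as a full-vocabulary log-softmax restricted to those tokens.

```python
import math
from typing import Dict, Tuple


def score_choices(choice_logits: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Hypothetical helper: pick an option by log-softmax over candidate-letter logits."""
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = max(choice_logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in choice_logits.values()))
    log_probs = {k: v - log_z for k, v in choice_logits.items()}
    # The prediction is the letter with the highest log-probability.
    best = max(log_probs, key=log_probs.get)
    return best, log_probs


best, log_probs = score_choices({"A": 1.0, "B": 3.0, "C": 0.5, "D": -2.0})
print(best)  # B
```

Scoring this way always yields one of the four valid letters. Parsing generated text has no such guarantee, because the model may ramble or omit the answer.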

Public Leaderboard

Public leaderboard score: approximately 0.71

Model Tree

Repository: Edward1239/DL_hw3
Upstream base model: Qwen/Qwen2.5-14B (Qwen/Qwen2.5-14B-Instruct is fine-tuned from it)
This model: LoRA adapter