Support Ticket GRPO Agent

Qwen/Qwen2.5-0.5B-Instruct, fine-tuned with GRPO (Group Relative Policy Optimization) and LoRA on a multi-step support ticket environment.
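
For readers new to GRPO, the "group relative" part can be illustrated with a minimal sketch: several completions are sampled for the same prompt, and each completion's reward is standardized against the other rewards in its group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; implementations differ in these details (for example, using the sample standard deviation changes the scale of the advantages but not their sign).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Standardize each reward against the other rewards in its group.

    `rewards` holds one scalar reward per completion sampled for the SAME
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # With a group of two completions, the better one gets an advantage
    # close to +1 and the worse one close to -1.
    print(group_relative_advantages([1.0, 0.0]))
    # If both completions score the same, both advantages are zero and
    # the group contributes no learning signal.
    print(group_relative_advantages([0.5, 0.5]))
```

One consequence worth noting for small groups: with only two completions per prompt, the standardized advantage carries only the sign of the comparison, not the size of the reward gap.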

Training Setup

  • Algorithm: GRPO via trl.GRPOTrainer + LoRA (PEFT)
  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Dataset: 1000 prompts over 50 support tickets
  • Environment: algocore-support-ticket-env
  • Group size G: 2
  • KL beta: 0.04
  • Final loss: 0.0008
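
A setup like the one listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the training script used for this model: only the base model, group size, and KL beta come from the list. The LoRA rank, alpha, and target modules, the output directory, and the `build_trainer` helper are hypothetical, and the reward function is left as a caller-supplied argument because the interface of algocore-support-ticket-env is not described here.

```python
# Values taken from the Training Setup list.
BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
GRPO_HYPERPARAMS = {
    "num_generations": 2,  # group size G
    "beta": 0.04,          # KL coefficient
}


def build_trainer(train_dataset, reward_fn, output_dir="grpo-support-ticket"):
    """Build a GRPO trainer with a LoRA adapter (hypothetical helper).

    train_dataset: a dataset with a "prompt" column.
    reward_fn: a callable taking `completions` (plus keyword arguments)
        and returning one float per completion. How it scores a
        completion against the ticket environment is up to the caller.
    """
    # Imported lazily so the constants above can be inspected without
    # trl and peft installed.
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    # ASSUMPTION: rank, alpha, and target modules are illustrative values,
    # not the ones used to train this model.
    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    args = GRPOConfig(output_dir=output_dir, **GRPO_HYPERPARAMS)
    return GRPOTrainer(
        model=BASE_MODEL,
        reward_funcs=reward_fn,
        args=args,
        train_dataset=train_dataset,
        peft_config=lora,
    )
```

Calling `build_trainer(dataset, reward_fn).train()` would then start training, provided the effective batch size is compatible with the group size.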

Results

Task                   Before  After   Delta
Task 1 (Classify)      0.667   1.000   +0.333
Task 2 (Action)        0.117   0.450   +0.333
Task 3 (Full Resolve)  0.083   0.258   +0.175
Overall                0.289   0.569   +0.280
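
The Overall row is numerically consistent with an unweighted mean of the three per-task scores. The card does not state how Overall was computed, so treat this as an inference from the numbers; the short check below reproduces it, with dictionary keys chosen here for illustration.

```python
# Per-task scores copied from the Results table.
before = {"classify": 0.667, "action": 0.117, "full_resolve": 0.083}
after = {"classify": 1.000, "action": 0.450, "full_resolve": 0.258}


def unweighted_mean(scores):
    """Average the per-task scores with equal weight per task."""
    return sum(scores.values()) / len(scores)


overall_before = round(unweighted_mean(before), 3)
overall_after = round(unweighted_mean(after), 3)
overall_delta = round(overall_after - overall_before, 3)

if __name__ == "__main__":
    print(overall_before, overall_after, overall_delta)
```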

[Figure: GRPO training results]

