junaid0600 committed on
Commit a1514f8 · verified · 1 Parent(s): 32b93fa

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -26,7 +26,6 @@ An OpenEnv-compliant reinforcement learning environment where AI agents learn to
 
 ---
 
-## 🔗 Quick Links
 
 ## 🔗 Quick Links
 
@@ -77,7 +76,7 @@ Trained **Qwen2.5-7B-Instruct** with **GRPO** using **Unsloth** (only 0.53% of p
 
 ### GRPO Training Curves — 200 Steps
 
-![Demo](assets/loss_curve_demo.png)
+![Demo](assests/loss_curve_demo.png)
 
 | Metric | Value |
 |---|---|
@@ -93,7 +92,7 @@ Trained **Qwen2.5-7B-Instruct** with **GRPO** using **Unsloth** (only 0.53% of p
 
 ### Evaluation — Trained vs Random Agent (15 Scenarios)
 
-![Demo](assets/reward_curve_demo.png)
+![Demo](assests/reward_curve_demo.png)
 
 | Agent | Avg Improvement | Best Scenario | Worst Scenario |
 |---|---|---|---|