Update README.md
README.md
CHANGED
@@ -26,7 +26,6 @@ An OpenEnv-compliant reinforcement learning environment where AI agents learn to
 ---
-## Quick Links
 ## Quick Links
@@ -77,7 +76,7 @@ Trained **Qwen2.5-7B-Instruct** with **GRPO** using **Unsloth** (only 0.53% of p
 ### GRPO Training Curves - 200 Steps
-
-
 | Metric | Value |
 |---|---|
 ### Evaluation - Trained vs Random Agent (15 Scenarios)
+
 | Agent | Avg Improvement | Best Scenario | Worst Scenario |
 |---|---|---|---|