laterabhi committed
Commit 4585c0d · verified · 1 Parent(s): 9a6ccae

Update index.html

Files changed (1)
  1. index.html +204 -68
index.html CHANGED
@@ -1,77 +1,213 @@
  <!DOCTYPE html>
- <html>
  <head>
- <meta charset="utf-8">
  <title>GRPO SQL Optimizer</title>
  <style>
- body { font-family: sans-serif; max-width: 900px; margin: 40px auto; padding: 0 20px; }
- img { max-width: 100%; }
- table { border-collapse: collapse; width: 100%; }
- th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
- th { background: #f4f4f4; }
- code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; }
  </style>
  </head>
  <body>
- <h1>GRPO Training for SQL Query Optimization</h1>
-
- <h2>Overview</h2>
- <p>Fine-tuned <code>Qwen/Qwen2.5-0.5B-Instruct</code> using GRPO (Group Relative Policy Optimization)
- reinforcement learning to optimize SQL queries using a DuckDB execution environment.</p>
-
- <h2>Results</h2>
- <img src="grpo_results.png" alt="Training Curve"/>
-
- <h3>Training Progress</h3>
- <table>
- <tr><th>Metric</th><th>Value</th></tr>
- <tr><td>Start avg (ep1-10)</td><td>0.3090</td></tr>
- <tr><td>End avg (ep91-100)</td><td>0.5962</td></tr>
- <tr><td>Improvement</td><td>+93%</td></tr>
- </table>
-
- <h3>Final Evaluation</h3>
- <table>
- <tr><th>Task</th><th>Difficulty</th><th>Score</th></tr>
- <tr><td>task_1_basic_antipatterns</td><td>easy</td><td>0.7500 ✅</td></tr>
- <tr><td>task_2_correlated_subqueries</td><td>medium</td><td>0.8313 ✅</td></tr>
- <tr><td>task_3_wildcard_scan</td><td>medium-hard</td><td>0.9250 ✅</td></tr>
- <tr><td>task_4_implicit_join</td><td>hard</td><td>0.6438 ✅</td></tr>
- <tr><td>task_5_window_functions</td><td>expert</td><td>0.6250 ⚠️</td></tr>
- <tr><td><strong>Average</strong></td><td></td><td><strong>0.7550</strong></td></tr>
- </table>
- <p><strong>Baseline: 0.63 &nbsp;|&nbsp; Improvement: +0.1250 (+12.5%)</strong></p>
-
- <h2>Approach</h2>
- <h3>GRPO Training</h3>
- <ul>
- <li><strong>Algorithm:</strong> GRPO (Group Relative Policy Optimization)</li>
- <li><strong>Base Model:</strong> Qwen/Qwen2.5-0.5B-Instruct</li>
- <li><strong>Episodes:</strong> 100 × 4 completions per prompt</li>
- <li><strong>Hardware:</strong> Kaggle GPU T4 x2</li>
- </ul>
-
- <h3>Reward Function</h3>
- <ul>
- <li><code>execution_speedup</code>: How much faster the optimized query runs</li>
- <li><code>result_correctness</code>: Whether results are identical</li>
- <li><code>issue_detection</code>: Whether SQL anti-patterns were identified</li>
- <li><code>approval_correctness</code>: Whether approval flag is correct</li>
- <li><code>summary_quality</code>: Quality of the explanation</li>
- </ul>
-
- <h2>Key Findings</h2>
- <ol>
- <li><strong>Reward variance is critical</strong> Early runs had flat 0.08 rewards. Fixing the prompt to include schema information created reward variance needed for GRPO to learn.</li>
- <li><strong>Prompt engineering matters</strong> — Telling the model to use only columns from the schema was the single most impactful fix.</li>
- <li><strong>Partial credit helps</strong> — Adding issue detection bonus gave the model a learning signal even when SQL execution failed.</li>
- </ol>
-
- <h2>Links</h2>
- <ul>
- <li><a href="https://huggingface.co/laterabhi/grpo-sql-optimizer">Model on HuggingFace</a></li>
- <li><a href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-">SQL Environment</a></li>
- <li><a href="https://arxiv.org/abs/2402.03300">GRPO Paper</a></li>
- </ul>
  </body>
  </html>
 
  <!DOCTYPE html>
+ <html lang="en">
  <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>GRPO SQL Optimizer</title>
  <style>
+ :root{
+ --bg:#0d1117; --fg:#e6edf3; --muted:#8b949e; --acc:#58a6ff;
+ --card:#161b22; --bd:#30363d; --ok:#3fb950; --warn:#d29922;
+ }
+ *{box-sizing:border-box}
+ body{
+ margin:0;
+ font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, Arial, sans-serif;
+ background:var(--bg);
+ color:var(--fg);
+ }
+ a{color:var(--acc); text-decoration:none}
+ a:hover{text-decoration:underline}
+ code{background:#1f2630; padding:2px 6px; border-radius:6px; border:1px solid var(--bd)}
+ .wrap{max-width:980px; margin:0 auto; padding:18px 18px 40px}
+
+ /* Top bar */
+ .hfTopbar{
+ position: sticky; top: 0; z-index: 9999;
+ display:flex; gap:10px; align-items:center; justify-content:space-between;
+ padding:10px 14px;
+ background: rgba(13,17,23,.92);
+ backdrop-filter: blur(8px);
+ border-bottom: 1px solid rgba(48,54,61,.9);
+ }
+ .hfTopbar .left{display:flex; gap:10px; align-items:center; flex-wrap:wrap}
+ .hfTopbar .brand{font-weight:800; color:var(--fg)}
+ .hfTopbar .pill{
+ color:var(--fg); text-decoration:none;
+ border:1px solid rgba(48,54,61,.9);
+ background:var(--card);
+ padding:6px 10px;
+ border-radius:999px;
+ font-size:12px;
+ display:inline-flex; align-items:center; gap:6px;
+ }
+ .hfTopbar .pill:hover{background:#1f2630; text-decoration:none}
+ .hfTopbar .muted{color:var(--muted); font-size:12px}
+
+ /* Content */
+ h1{font-size:28px; margin:18px 0 8px}
+ h2{font-size:18px; margin:22px 0 10px; color:var(--acc)}
+ p{color:var(--fg); line-height:1.6}
+ .sub{color:var(--muted); margin-top:0}
+
+ .grid{display:grid; grid-template-columns: 1fr 1fr; gap:14px}
+ @media (max-width: 860px){ .grid{grid-template-columns:1fr} }
+
+ .card{
+ background:var(--card);
+ border:1px solid var(--bd);
+ border-radius:12px;
+ padding:14px;
+ }
+ .card h3{margin:0 0 10px; font-size:14px; color:var(--muted); font-weight:700}
+
+ img{max-width:100%; border-radius:12px; border:1px solid var(--bd); background:#000}
+
+ table{border-collapse:collapse; width:100%; overflow:hidden; border-radius:12px; border:1px solid var(--bd)}
+ th,td{border-bottom:1px solid var(--bd); padding:10px; text-align:left; font-size:13px}
+ th{background:#0b1320; color:var(--muted); font-weight:700}
+ tr:last-child td{border-bottom:none}
+ .ok{color:var(--ok); font-weight:800}
+ .note{color:var(--muted); font-size:13px}
+
+ .kpiRow{display:flex; flex-wrap:wrap; gap:10px; margin-top:10px}
+ .kpi{
+ background:#0b1320;
+ border:1px solid var(--bd);
+ border-radius:12px;
+ padding:10px 12px;
+ min-width: 170px;
+ }
+ .kpi .v{font-weight:900; font-size:16px}
+ .kpi .l{color:var(--muted); font-size:12px}
+
+ footer{margin-top:26px; color:var(--muted); font-size:12px}
  </style>
  </head>
+
  <body>
+ <div class="hfTopbar">
+ <div class="left">
+ <span class="brand">🗄️ GRPO SQL Optimizer</span>
+ <a class="pill" href="https://huggingface.co/spaces/laterabhi/grpo-sql-optimizer/blob/main/README.md" target="_blank" rel="noopener">📝 Writeup</a>
+ <a class="pill" href="https://huggingface.co/spaces/laterabhi/grpo-sql-optimizer/blob/main/Blog.md" target="_blank" rel="noopener">📄 Blog.md</a>
+ <a class="pill" href="https://huggingface.co/laterabhi/grpo-sql-optimizer" target="_blank" rel="noopener">🤗 Model</a>
+ <a class="pill" href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-" target="_blank" rel="noopener">💻 GitHub</a>
+ </div>
+ <div class="muted">DuckDB-verifiable rewards · OpenEnv</div>
+ </div>
+
+ <div class="wrap">
+ <h1>GRPO Training for SQL Query Optimization</h1>
+ <p class="sub">
+ Fine-tuned <code>Qwen/Qwen2.5-0.5B-Instruct</code> using GRPO (Group Relative Policy Optimization)
+ to optimize SQL queries using a DuckDB execution environment.
+ </p>
+
+ <div class="grid">
+ <div class="card">
+ <h3>Overview</h3>
+ <p>
+ This project trains/evaluates SQL optimization with <strong>execution-grounded</strong> rewards:
+ the environment executes both original and rewritten SQL on real DuckDB data, and scores
+ speedup + correctness + structured diagnostics.
+ </p>
+ <div class="kpiRow">
+ <div class="kpi">
+ <div class="v">5</div>
+ <div class="l">tasks (easy → expert)</div>
+ </div>
+ <div class="kpi">
+ <div class="v">DuckDB</div>
+ <div class="l">verifiable execution</div>
+ </div>
+ <div class="kpi">
+ <div class="v">GRPO</div>
+ <div class="l">group-relative RL</div>
+ </div>
+ </div>
+ </div>
+
+ <div class="card">
+ <h3>Training curve</h3>
+
+ <!-- Prefer local image if you uploaded it to this Space -->
+ <img src="grpo_results.png" alt="GRPO training curve" />
+
+ <p class="note" style="margin-top:10px">
+ If this image ever breaks, the canonical plot is also in the GitHub repo:
+ <a href="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/grpo_reward_curve.png" target="_blank" rel="noopener">
+ results/grpo_reward_curve.png
+ </a>.
+ </p>
+ </div>
+ </div>
+
146
+ <h2>Training progress (100 episodes)</h2>
147
+ <table>
148
+ <tr><th>Metric</th><th>Value</th></tr>
149
+ <tr><td>Start avg (ep 1–10)</td><td>0.3090</td></tr>
150
+ <tr><td>End avg (ep 91–100)</td><td>0.5962</td></tr>
151
+ <tr><td><strong>Improvement</strong></td><td><strong>+93%</strong></td></tr>
152
+ </table>
153
+
154
+ <h2>Final evaluation (per task)</h2>
155
+ <p class="note">
156
+ These task scores are aligned to the GitHub repo README (source of truth). Task 5 is the expert scenario,
157
+ so it is expected to be the lowest — that is not an error.
158
+ </p>
159
+ <table>
160
+ <tr><th>Task</th><th>Difficulty</th><th>Score</th></tr>
161
+ <tr><td>task_1_basic_antipatterns</td><td>easy</td><td><span class="ok">0.7500 ✅</span></td></tr>
162
+ <tr><td>task_2_correlated_subqueries</td><td>medium</td><td><span class="ok">0.8313 ✅</span></td></tr>
163
+ <tr><td>task_3_wildcard_scan</td><td>medium-hard</td><td><span class="ok">0.6563 ✅</span></td></tr>
164
+ <tr><td>task_4_implicit_join</td><td>hard</td><td><span class="ok">0.6563 ✅</span></td></tr>
165
+ <tr><td>task_5_window_functions</td><td>expert</td><td><span class="ok">0.6500 ✅</span></td></tr>
166
+ </table>
167
+
168
+ <h2>Before / After (environment-only, reproducible)</h2>
169
+ <p class="note">
170
+ To avoid hand-wavy baselines, we provide a reproducible before/after contrast in the GitHub repo:
171
+ “before” = analysis-only (no optimized SQL), “after” = deterministic fallback with a real optimized query.
172
+ Chart:
173
+ <a href="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/before_after_chart.png" target="_blank" rel="noopener">
174
+ results/before_after_chart.png
175
+ </a>
176
+ </p>
177
+ <img
178
+ src="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/before_after_chart.png"
179
+ alt="Before/after chart"
180
+ />
181
+
182
+ <h2>Approach</h2>
183
+ <div class="card">
184
+ <h3>GRPO setup</h3>
185
+ <ul class="note">
186
+ <li><strong>Algorithm:</strong> GRPO (Group Relative Policy Optimization)</li>
187
+ <li><strong>Base model:</strong> Qwen/Qwen2.5-0.5B-Instruct</li>
188
+ <li><strong>Group size:</strong> 4 completions per prompt</li>
189
+ <li><strong>Hardware:</strong> Kaggle GPU T4 x2 (see repo links)</li>
190
+ </ul>
191
+
192
+ <h3 style="margin-top:12px">Reward components</h3>
193
+ <ul class="note">
194
+ <li><code>execution_speedup</code>: measured DuckDB timing ratio</li>
195
+ <li><code>result_correctness</code>: results match check (order-independent for large sets)</li>
196
+ <li><code>issue_detection</code>: anti-pattern detection vs ground truth keywords</li>
197
+ <li><code>approval_correctness</code>, <code>summary_quality</code>, <code>severity_labels</code></li>
198
+ </ul>
199
+ </div>
200
+
201
+ <h2>Links</h2>
202
+ <ul class="note">
203
+ <li><a href="https://huggingface.co/laterabhi/grpo-sql-optimizer" target="_blank" rel="noopener">Model on Hugging Face</a></li>
204
+ <li><a href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-" target="_blank" rel="noopener">GitHub repo (source of truth)</a></li>
205
+ <li><a href="https://arxiv.org/abs/2402.03300" target="_blank" rel="noopener">GRPO paper</a></li>
206
+ </ul>
207
+
208
+ <footer>
209
+ Tip: If you want all images to load instantly (no GitHub raw), upload the PNGs into this Space repo and change the <code>src</code> to local file paths.
210
+ </footer>
211
+ </div>
212
  </body>
213
  </html>
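
Note on the numbers in this page: the page reports a +93% improvement and a group size of 4 completions per prompt, and the removed "Key Findings" text mentions that flat 0.08 rewards gave GRPO nothing to learn from. The Python sketch below illustrates both points under stated assumptions. It is not the repository's training code: the function names, the `eps` constant, the use of the population standard deviation, and the "varied" reward values are all hypothetical choices made for illustration. Only the 0.3090 and 0.5962 averages and the flat 0.08 reward come from the page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (mean 0, roughly unit scale).

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; real GRPO implementations may differ in both details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def relative_improvement(start, end):
    """Relative change from start to end, as a fraction of start."""
    return (end - start) / start


# Flat rewards (as in the early runs described on the page): every advantage
# is zero, so the policy gradient carries no signal.
flat = group_relative_advantages([0.08, 0.08, 0.08, 0.08])

# Hypothetical group of 4 completions with differing rewards: advantages are
# non-zero and rank the completions against each other.
varied = group_relative_advantages([0.10, 0.35, 0.60, 0.95])

# Start and end averages from the "Training progress" table.
gain = relative_improvement(0.3090, 0.5962)

print("flat advantages:  ", flat)
print("varied advantages:", [round(a, 3) for a in varied])
print(f"relative improvement: {gain:.1%}")
```

The relative-improvement line reproduces the table's rounded +93% figure from the two episode averages.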