Upload folder using huggingface_hub
- README.md +21 -21
- demo.py +1 -1
- inference.py +2 -3
- judge.py +49 -17
- models.py +1 -1
- openenv.yaml +7 -3
- playbook.py +10 -9
- pyproject.toml +4 -0
- server/app.py +1 -6
- server/requirements.txt +1 -2
- tasks.py +49 -57
README.md
CHANGED
|
@@ -143,30 +143,30 @@ A semantically correct but O(N²) query re-executes `AVG(salary)` for every empl
|
|
| 143 |
**Schema:** `departments`, `employees` – 9 employees across 3 departments
|
| 144 |
**Goal:** Employees who earn strictly above their department average, ordered by dept/salary
|
| 145 |
|
| 146 |
-
### Expert – Fix the Tie-Breaking Window Function
|
| 147 |
-
`ROW_NUMBER()`
|
| 148 |
|
| 149 |
**Schema:** `sales_reps(id, name, region, revenue)` – 6 reps across 2 regions with ties
|
| 150 |
**Goal:** All reps whose revenue is the highest in their region
|
| 151 |
|
| 152 |
-
### Expert – Traverse Org Chart with Recursive CTE
|
| 153 |
-
|
| 154 |
|
| 155 |
**Schema:** `employees(id, name, manager_id)` – 14 employees, 4 levels deep
|
| 156 |
-
**Goal:** All 8 subordinates of VP Eng at any depth, ordered by id
|
| 157 |
|
| 158 |
-
### Expert – Fix
|
| 159 |
-
|
| 160 |
|
| 161 |
-
**Schema:** `quarterly_sales(region, quarter, revenue)` – 8 rows across 2 regions
|
| 162 |
-
**Goal:** Per-region running total (`ORDER BY quarter`) and within-region revenue rank (`ORDER BY revenue DESC`)
|
| 163 |
|
| 164 |
> **Structural penalties** are enforced per task level/id to prevent gaming:
|
| 165 |
> - `hard`: requires `WITH` clause (−0.30 if absent)
|
| 166 |
> - `medium`: requires explicit `JOIN` (−0.20 if absent)
|
| 167 |
-
> - `task_expert_recursive`: requires `WITH RECURSIVE` (−0.30
|
| 168 |
-
> - `task_expert_rank`: penalises `ROW_NUMBER()` (−0.20
|
| 169 |
-
> - `task_expert_window`: requires `PARTITION BY` in both window functions (−0.20 if absent)
|
| 170 |
|
| 171 |
---
|
| 172 |
|
|
@@ -249,19 +249,19 @@ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
|
|
| 249 |
|
| 250 |
## Baseline Results
|
| 251 |
|
| 252 |
-
The following scores were produced by running `
|
| 253 |
|
| 254 |
| Task | Level | Steps Used | Best Score |
|
| 255 |
|---|---|---|---|
|
| 256 |
| Fix the Syntax Errors | easy | 1 | **1.000** |
|
| 257 |
| Fix the Cartesian JOIN | medium | 1 | **0.900** |
|
| 258 |
-
| Rewrite Correlated Subquery as CTE | hard | 1 | **0.
|
| 259 |
-
|
| 260 |
|
| 261 |
-
|
| 262 |
-
- The reward pipeline returns meaningful signal immediately
|
| 263 |
-
- The environment terminates cleanly when the done threshold (≥ 0.90) is met
|
| 264 |
-
- A stronger model or a harder task set would produce more training-relevant trajectories
|
| 265 |
|
| 266 |
---
|
| 267 |
|
|
@@ -348,13 +348,13 @@ queryforge/
|
|
| 348 |
├── playbook.py # Local test runner (no server required)
|
| 349 |
├── inference.py # Baseline inference script (any OpenAI-compatible LLM)
|
| 350 |
├── demo.py # Gradio interactive demo (mounted at /demo)
|
|
|
|
| 351 |
├── openenv.yaml # OpenEnv manifest
|
| 352 |
├── pyproject.toml # Project metadata and dependencies
|
| 353 |
├── uv.lock # Locked dependencies
|
| 354 |
└── server/
|
| 355 |
    ├── app.py # FastAPI app – core + /tasks REST endpoints + Gradio mount
|
| 356 |
    ├── queryforge_environment.py # Environment class (reset, step, state)
|
| 357 |
-
    ├── Dockerfile # Container image
|
| 358 |
    └── requirements.txt # Server dependencies
|
| 359 |
```
|
| 360 |
|
|
@@ -373,7 +373,7 @@ Add `ANTHROPIC_API_KEY` as a Space secret after deployment. Without it, the envi
|
|
| 373 |
### Docker
|
| 374 |
|
| 375 |
```bash
|
| 376 |
-
docker build -t queryforge:latest
|
| 377 |
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY queryforge:latest
|
| 378 |
```
|
| 379 |
|
|
|
|
| 143 |
**Schema:** `departments`, `employees` – 9 employees across 3 departments
|
| 144 |
**Goal:** Employees who earn strictly above their department average, ordered by dept/salary
|
| 145 |
|
| 146 |
+
### Expert – Fix the Tie-Breaking Window Function (2 bugs)
|
| 147 |
+
Two layered bugs: `ROW_NUMBER()` drops tied reps AND `ORDER BY revenue ASC` picks the lowest earners instead of the highest. Agent must fix the sort order AND switch to `RANK()`/`DENSE_RANK()` – fixing only one still produces wrong results.
|
| 148 |
|
| 149 |
**Schema:** `sales_reps(id, name, region, revenue)` – 6 reps across 2 regions with ties
|
| 150 |
**Goal:** All reps whose revenue is the highest in their region
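The tie-handling bug described above is easy to reproduce outside the environment. A minimal sketch using Python's built-in sqlite3 (the environment itself grades with DuckDB; the rep names and revenues here are illustrative, not the task's real data, and window functions need SQLite ≥ 3.25):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales_reps (id INTEGER, name TEXT, region TEXT, revenue REAL);
INSERT INTO sales_reps VALUES
  (1,'Ann','East',90),(2,'Bob','East',90),(3,'Cy','East',70),
  (4,'Dee','West',80),(5,'Eli','West',80),(6,'Flo','West',60);
""")

# ROW_NUMBER() breaks ties arbitrarily, so exactly one rep per region gets rn = 1
rn = con.execute("""
    SELECT name FROM (
      SELECT name,
             ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rn
      FROM sales_reps
    ) AS t WHERE rn = 1
""").fetchall()

# RANK() gives every tied top earner rank 1, so all four winners survive the filter
rk = con.execute("""
    SELECT name FROM (
      SELECT name,
             RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rk
      FROM sales_reps
    ) AS t WHERE rk = 1
""").fetchall()
```

With two reps tied at the top of each region, the `ROW_NUMBER()` variant returns 2 rows while the `RANK()` variant returns all 4.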
|
| 151 |
|
| 152 |
+
### Expert – Traverse Org Chart with Recursive CTE (2 bugs)
|
| 153 |
+
Two layered bugs: the anchor uses `WHERE id = 3` (includes VP Eng himself in results) AND the query is a hardcoded two-level CTE that misses deeper employees. Agent must fix the anchor to `WHERE manager_id = 3` AND convert to `WITH RECURSIVE`.
|
| 154 |
|
| 155 |
**Schema:** `employees(id, name, manager_id)` – 14 employees, 4 levels deep
|
| 156 |
+
**Goal:** All 8 subordinates of VP Eng at any depth (excluding VP Eng), ordered by id
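The fixed traversal pattern can be sketched with sqlite3, which also supports `WITH RECURSIVE`. The rows below mirror the org-chart ids described in the task (CEO id=1, VP Eng id=3, Leads 5–6, Devs 8–11, Juniors 13–14), though the full task data has 14 employees:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1,'CEO',NULL),(3,'VP Eng',1),
  (5,'Lead A',3),(6,'Lead B',3),
  (8,'Dev 1',5),(9,'Dev 2',5),(10,'Dev 3',6),(11,'Dev 4',6),
  (13,'Junior 1',8),(14,'Junior 2',8);
""")

# Anchor: the VP's direct reports (manager_id = 3, NOT id = 3, which would
# return the VP himself). Recursive member: keep joining children until the
# tree is exhausted, whatever its depth.
subs = con.execute("""
    WITH RECURSIVE subs AS (
      SELECT id, name, manager_id FROM employees WHERE manager_id = 3
      UNION ALL
      SELECT e.id, e.name, e.manager_id
      FROM employees e JOIN subs s ON e.manager_id = s.id
    )
    SELECT id FROM subs ORDER BY id
""").fetchall()

ids = [row[0] for row in subs]  # all 8 subordinates, VP Eng excluded
```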
|
| 157 |
|
| 158 |
+
### Expert – Fix Broken Window Functions (3 bugs)
|
| 159 |
+
Three layered bugs: both `SUM` and `RANK` window functions are missing `PARTITION BY`, they need different `ORDER BY` clauses, AND the data contains tied revenue values (West Q3=Q4=16000) that must be ranked correctly.
|
| 160 |
|
| 161 |
+
**Schema:** `quarterly_sales(region, quarter, revenue)` – 8 rows across 2 regions with ties
|
| 162 |
+
**Goal:** Per-region running total (`ORDER BY quarter`) and within-region revenue rank (`ORDER BY revenue DESC`) with correct tie handling
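A corrected version of the dual-window query, sketched with sqlite3. The West rows match the documented tie (Q3=Q4=16000); the East Q3/Q4 revenues are made-up filler, not the task's real data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE quarterly_sales (region TEXT, quarter INTEGER, revenue REAL);
INSERT INTO quarterly_sales VALUES
  ('East',1,15000),('East',2,18000),('East',3,21000),('East',4,12000),
  ('West',1,11000),('West',2,14000),('West',3,16000),('West',4,16000);
""")

# Both windows are partitioned per region but ordered differently:
# the running total accumulates along quarters, the rank orders by revenue.
rows = con.execute("""
    SELECT region, quarter, revenue,
           SUM(revenue) OVER (PARTITION BY region ORDER BY quarter)      AS running_total,
           RANK()       OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank
    FROM quarterly_sales
    ORDER BY region, quarter
""").fetchall()

west = [r for r in rows if r[0] == 'West']
# West Q3 and Q4 are tied at 16000, so RANK() assigns both rank 1 (and skips 2)
```

Within West the ranks come out as 4, 3, 1, 1 by quarter, and the running total reaches 57000 at Q4 without leaking into East's partition.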
|
| 163 |
|
| 164 |
> **Structural penalties** are enforced per task level/id to prevent gaming:
|
| 165 |
> - `hard`: requires `WITH` clause (−0.30 if absent)
|
| 166 |
> - `medium`: requires explicit `JOIN` (−0.20 if absent)
|
| 167 |
+
> - `task_expert_recursive`: requires `WITH RECURSIVE` (−0.30) + correct anchor via `manager_id` (−0.15)
|
| 168 |
+
> - `task_expert_rank`: penalises `ROW_NUMBER()` (−0.20) + penalises `ASC` ordering without `DESC` (−0.15)
|
| 169 |
+
> - `task_expert_window`: requires `PARTITION BY` in both window functions (−0.20 if absent, −0.10 if only one)
|
| 170 |
|
| 171 |
---
|
| 172 |
|
|
|
|
| 249 |
|
| 250 |
## Baseline Results
|
| 251 |
|
| 252 |
+
The following scores were produced by running `meta-llama/Llama-3.1-8B-Instruct` (via HuggingFace router) as the agent against all 6 tasks with the full AI judge active.
|
| 253 |
|
| 254 |
| Task | Level | Steps Used | Best Score |
|
| 255 |
|---|---|---|---|
|
| 256 |
| Fix the Syntax Errors | easy | 1 | **1.000** |
|
| 257 |
| Fix the Cartesian JOIN | medium | 1 | **0.900** |
|
| 258 |
+
| Rewrite Correlated Subquery as CTE | hard | 1 | **0.900** |
|
| 259 |
+
| Fix the Tie-Breaking Window Function | expert | 1 | **1.000** |
|
| 260 |
+
| Traverse Org Chart with Recursive CTE | expert | 2 | **0.900** |
|
| 261 |
+
| Fix Two Broken Window Functions | expert | 3 | **0.900** |
|
| 262 |
+
| **Average** | | | **0.933** |
|
| 263 |
|
| 264 |
+
The easy–hard tasks and the rank/recursive expert tasks were solved in 1–2 steps. The dual-window expert task required 3 steps, demonstrating that the feedback loop produces training-relevant multi-step trajectories for harder tasks.
|
| 265 |
|
| 266 |
---
|
| 267 |
|
|
|
|
| 348 |
├── playbook.py # Local test runner (no server required)
|
| 349 |
├── inference.py # Baseline inference script (any OpenAI-compatible LLM)
|
| 350 |
├── demo.py # Gradio interactive demo (mounted at /demo)
|
| 351 |
+
├── Dockerfile # Container image
|
| 352 |
├── openenv.yaml # OpenEnv manifest
|
| 353 |
├── pyproject.toml # Project metadata and dependencies
|
| 354 |
├── uv.lock # Locked dependencies
|
| 355 |
└── server/
|
| 356 |
    ├── app.py # FastAPI app – core + /tasks REST endpoints + Gradio mount
|
| 357 |
    ├── queryforge_environment.py # Environment class (reset, step, state)
|
|
|
|
| 358 |
    └── requirements.txt # Server dependencies
|
| 359 |
```
|
| 360 |
|
|
|
|
| 373 |
### Docker
|
| 374 |
|
| 375 |
```bash
|
| 376 |
+
docker build -t queryforge:latest .
|
| 377 |
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY queryforge:latest
|
| 378 |
```
|
| 379 |
|
demo.py
CHANGED
|
@@ -115,7 +115,7 @@ Fix broken or slow SQL queries and get instant graded feedback.
|
|
| 115 |
)
|
| 116 |
)
|
| 117 |
|
| 118 |
-
with gr.Blocks(title="QueryForge"
|
| 119 |
|
| 120 |
state = gr.State(None)
|
| 121 |
|
|
|
|
| 115 |
)
|
| 116 |
)
|
| 117 |
|
| 118 |
+
with gr.Blocks(title="QueryForge") as demo:
|
| 119 |
|
| 120 |
state = gr.State(None)
|
| 121 |
|
inference.py
CHANGED
|
@@ -188,9 +188,8 @@ def run_task(task_id: str, llm: OpenAI, env_client) -> dict:
|
|
| 188 |
|
| 189 |
def main() -> None:
|
| 190 |
# ── Validate required config ──────────────────────────────────────────────
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
print(f"ERROR: missing required env vars: {', '.join(missing)}")
|
| 194 |
sys.exit(1)
|
| 195 |
|
| 196 |
if not API_KEY:
|
|
|
|
| 188 |
|
| 189 |
def main() -> None:
|
| 190 |
# ── Validate required config ──────────────────────────────────────────────
|
| 191 |
+
if not MODEL_NAME:
|
| 192 |
+
print("ERROR: MODEL_NAME env var is not set.")
|
|
|
|
| 193 |
sys.exit(1)
|
| 194 |
|
| 195 |
if not API_KEY:
|
judge.py
CHANGED
|
@@ -16,7 +16,7 @@ Grading pipeline for each submitted SQL query:
|
|
| 16 |
Partial credit for correct row count or partial row matches.
|
| 17 |
|
| 18 |
Stage 4 – AI Quality (≤ 1.0)
|
| 19 |
-
Anthropic claude-
|
| 20 |
semantic correctness vs. the reference solution.
|
| 21 |
The AI score can move the final score up to 1.0 when rows are correct,
|
| 22 |
or provide nuanced feedback even when rows are partially wrong.
|
|
@@ -183,16 +183,24 @@ def rows_match(
|
|
| 183 |
|
| 184 |
projected = [_project(row) for row in actual]
|
| 186 |
if len(projected) != len(expected):
|
| 187 |
-
|
| 188 |
-
|
| 189 |
return score, (
|
| 190 |
f"Row count mismatch: got {len(projected)}, expected {len(expected)}. "
|
| 191 |
-
f"
|
| 192 |
)
|
| 193 |
|
| 194 |
-
actual_sorted = sorted(
|
| 195 |
-
expected_sorted = sorted(
|
| 196 |
|
| 197 |
matches = sum(1 for a, e in zip(actual_sorted, expected_sorted) if a == e)
|
| 198 |
row_accuracy = matches / len(expected)
|
|
@@ -289,7 +297,6 @@ Respond with ONLY valid JSON (no markdown fences):
|
|
| 289 |
{"role": "assistant", "content": "{"}, # prefill forces JSON-only reply
|
| 290 |
],
|
| 291 |
)
|
| 292 |
-
print("Anthropic judge response:", message.content)
|
| 293 |
# Prepend the prefilled "{" back before parsing
|
| 294 |
raw = "{" + message.content[0].text.strip()
|
| 295 |
|
|
@@ -381,15 +388,35 @@ def grade(
|
|
| 381 |
elif task.level == "medium" and "JOIN " not in query_upper:
|
| 382 |
structural_penalty = 0.20 # medium task demands explicit JOINs
|
| 383 |
row_feedback += " (Penalty: no explicit JOIN β task requires JOIN β¦ ON syntax.)"
|
| 384 |
-
elif task.id == "task_expert_recursive"
|
| 385 |
-
|
| 386 |
-
|
| 387 |
-
|
| 388 |
-
|
| 389 |
-
|
| 390 |
-
|
| 391 |
-
|
| 392 |
-
|
| 393 |
|
| 394 |
details["structural_penalty"] = structural_penalty
|
| 395 |
|
|
@@ -405,9 +432,14 @@ def grade(
|
|
| 405 |
details["ai_hint"] = ai_hint
|
| 406 |
|
| 407 |
# Final blending:
|
|
|
|
| 408 |
# rows fully correct β trust AI score (can reach 1.0)
|
| 409 |
# rows partially wrong β clamp AI score to not exceed deterministic
|
| 410 |
-
|
| 411 |
final_score = ai_score
|
| 412 |
elif row_score >= 0.5:
|
| 413 |
# Blend: AI provides nuance but can't exceed deterministic ceiling
|
|
|
|
| 16 |
Partial credit for correct row count or partial row matches.
|
| 17 |
|
| 18 |
Stage 4 – AI Quality (≤ 1.0)
|
| 19 |
+
Anthropic claude-haiku-4-5 evaluates optimization, code style, and
|
| 20 |
semantic correctness vs. the reference solution.
|
| 21 |
The AI score can move the final score up to 1.0 when rows are correct,
|
| 22 |
or provide nuanced feedback even when rows are partially wrong.
|
|
|
|
| 183 |
|
| 184 |
projected = [_project(row) for row in actual]
|
| 185 |
|
| 186 |
+
actual_norm = [_normalize(r) for r in projected]
|
| 187 |
+
expected_norm = [_normalize(r) for r in expected]
|
| 188 |
+
|
| 189 |
if len(projected) != len(expected):
|
| 190 |
+
# Count how many returned rows are actually in the expected set
|
| 191 |
+
expected_set = [tuple(sorted(r.items())) for r in expected_norm]
|
| 192 |
+
correct_rows = sum(1 for r in actual_norm if tuple(sorted(r.items())) in expected_set)
|
| 193 |
+
# Score based on fraction of expected rows correctly returned
|
| 194 |
+
coverage = correct_rows / len(expected)
|
| 195 |
+
# Base 0.10 for count mismatch, up to 0.45 for high coverage of correct rows
|
| 196 |
+
score = 0.10 + 0.35 * coverage
|
| 197 |
return score, (
|
| 198 |
f"Row count mismatch: got {len(projected)}, expected {len(expected)}. "
|
| 199 |
+
f"{correct_rows}/{len(expected)} expected rows present."
|
| 200 |
)
|
| 201 |
|
| 202 |
+
actual_sorted = sorted(actual_norm, key=lambda r: _sort_key(r, order_by))
|
| 203 |
+
expected_sorted = sorted(expected_norm, key=lambda r: _sort_key(r, order_by))
|
| 204 |
|
| 205 |
matches = sum(1 for a, e in zip(actual_sorted, expected_sorted) if a == e)
|
| 206 |
row_accuracy = matches / len(expected)
|
|
|
|
| 297 |
{"role": "assistant", "content": "{"}, # prefill forces JSON-only reply
|
| 298 |
],
|
| 299 |
)
|
|
|
|
| 300 |
# Prepend the prefilled "{" back before parsing
|
| 301 |
raw = "{" + message.content[0].text.strip()
|
| 302 |
|
|
|
|
| 388 |
elif task.level == "medium" and "JOIN " not in query_upper:
|
| 389 |
structural_penalty = 0.20 # medium task demands explicit JOINs
|
| 390 |
row_feedback += " (Penalty: no explicit JOIN β task requires JOIN β¦ ON syntax.)"
|
| 391 |
+
elif task.id == "task_expert_recursive":
|
| 392 |
+
# Two bugs: anchor uses WHERE id=3 (includes VP Eng) + non-recursive CTE (misses deep levels)
|
| 393 |
+
if "RECURSIVE" not in query_upper:
|
| 394 |
+
structural_penalty += 0.30
|
| 395 |
+
row_feedback += " (Penalty: WITH RECURSIVE required β hardcoded levels won't scale.)"
|
| 396 |
+
if "MANAGER_ID = 3" not in query_upper and "MANAGER_ID=3" not in query_upper:
|
| 397 |
+
structural_penalty += 0.15
|
| 398 |
+
row_feedback += " (Penalty: anchor should select subordinates via manager_id, not the VP themselves.)"
|
| 399 |
+
structural_penalty = min(structural_penalty, 0.40)
|
| 400 |
+
elif task.id == "task_expert_rank":
|
| 401 |
+
# Two bugs: ROW_NUMBER (drops ties) + ASC ordering (picks lowest instead of highest)
|
| 402 |
+
if "ROW_NUMBER" in query_upper:
|
| 403 |
+
structural_penalty += 0.20
|
| 404 |
+
row_feedback += " (Penalty: ROW_NUMBER() drops tied rows β use RANK() or DENSE_RANK().)"
|
| 405 |
+
if "ASC" in query_upper and "DESC" not in query_upper:
|
| 406 |
+
structural_penalty += 0.15
|
| 407 |
+
row_feedback += " (Penalty: ordering by revenue ASC picks lowest earners, not highest.)"
|
| 408 |
+
structural_penalty = min(structural_penalty, 0.35)
|
| 409 |
+
elif task.id == "task_expert_window":
|
| 410 |
+
# Three bugs: missing PARTITION BY on both windows + tied revenues need correct ranking
|
| 411 |
+
if "PARTITION BY" not in query_upper:
|
| 412 |
+
structural_penalty += 0.20
|
| 413 |
+
row_feedback += " (Penalty: missing PARTITION BY β both SUM and RANK must be partitioned per region.)"
|
| 414 |
+
# Count PARTITION BY occurrences β need at least 2 (one per window function)
|
| 415 |
+
partition_count = query_upper.count("PARTITION BY")
|
| 416 |
+
if 0 < partition_count < 2:
|
| 417 |
+
structural_penalty += 0.10
|
| 418 |
+
row_feedback += " (Penalty: only one window function has PARTITION BY β both need it.)"
|
| 419 |
+
structural_penalty = min(structural_penalty, 0.30)
|
| 420 |
|
| 421 |
details["structural_penalty"] = structural_penalty
|
| 422 |
|
|
|
|
| 432 |
details["ai_hint"] = ai_hint
|
| 433 |
|
| 434 |
# Final blending:
|
| 435 |
+
# AI judge offline (fallback) β use deterministic score directly
|
| 436 |
# rows fully correct β trust AI score (can reach 1.0)
|
| 437 |
# rows partially wrong β clamp AI score to not exceed deterministic
|
| 438 |
+
ai_is_fallback = abs(ai_score - deterministic_score) < 0.001
|
| 439 |
+
if ai_is_fallback:
|
| 440 |
+
# AI judge was unavailable β use deterministic score as-is
|
| 441 |
+
final_score = deterministic_score
|
| 442 |
+
elif row_score >= 0.95:
|
| 443 |
final_score = ai_score
|
| 444 |
elif row_score >= 0.5:
|
| 445 |
# Blend: AI provides nuance but can't exceed deterministic ceiling
|
models.py
CHANGED
|
@@ -24,7 +24,7 @@ class SQLObservation(Observation):
|
|
| 24 |
# ── Task context ─────────────────────────────────────────────────────────
|
| 25 |
task_id: str = Field(default="", description="Active task identifier")
|
| 26 |
task_level: str = Field(
|
| 27 |
-
default="", description="Difficulty: easy | medium | hard"
|
| 28 |
)
|
| 29 |
task_title: str = Field(default="", description="Human-readable task title")
|
| 30 |
task_description: str = Field(
|
|
|
|
| 24 |
# ── Task context ─────────────────────────────────────────────────────────
|
| 25 |
task_id: str = Field(default="", description="Active task identifier")
|
| 26 |
task_level: str = Field(
|
| 27 |
+
default="", description="Difficulty: easy | medium | hard | expert"
|
| 28 |
)
|
| 29 |
task_title: str = Field(default="", description="Human-readable task title")
|
| 30 |
task_description: str = Field(
|
openenv.yaml
CHANGED
|
@@ -11,16 +11,20 @@ description: |
|
|
| 11 |
An agent receives a broken or slow SQL query together with the schema and an
|
| 12 |
error/performance warning. It must produce a working, optimised query.
|
| 13 |
|
| 14 |
-
Tasks (
|
| 15 |
easy – fix three misspelled SQL keywords (SELECT / FROM / WHERE)
|
| 16 |
medium – fix a missing JOIN condition that causes a cartesian product
|
| 17 |
hard – rewrite a correlated subquery (O(N²)) as a CTE (O(N))
|
|
|
| 18 |
|
| 19 |
Reward signal (0.0 → 1.0):
|
| 20 |
0.00 syntax error
|
| 21 |
0.15 syntax valid, runtime error
|
| 22 |
0.30 executes, wrong / empty results
|
| 23 |
0.30–0.80 partial row correctness (deterministic, DuckDB)
|
| 24 |
-
0.80–1.00 correct results + AI quality score (Anthropic claude-
|
| 25 |
|
| 26 |
-
|
|
|
|
|
|
| 11 |
An agent receives a broken or slow SQL query together with the schema and an
|
| 12 |
error/performance warning. It must produce a working, optimised query.
|
| 13 |
|
| 14 |
+
Tasks (6 tasks across 4 difficulty levels):
|
| 15 |
easy – fix three misspelled SQL keywords (SELECT / FROM / WHERE)
|
| 16 |
medium – fix a missing JOIN condition that causes a cartesian product
|
| 17 |
hard – rewrite a correlated subquery (O(N²)) as a CTE (O(N))
|
| 18 |
+
expert – fix tie-breaking window function (2 bugs: ROW_NUMBER + ASC ordering)
|
| 19 |
+
expert – traverse org chart with recursive CTE (2 bugs: wrong anchor + hardcoded levels)
|
| 20 |
+
expert – fix broken window functions (3 bugs: missing PARTITION BY + tied revenues)
|
| 21 |
|
| 22 |
Reward signal (0.0 → 1.0):
|
| 23 |
0.00 syntax error
|
| 24 |
0.15 syntax valid, runtime error
|
| 25 |
0.30 executes, wrong / empty results
|
| 26 |
0.30–0.80 partial row correctness (deterministic, DuckDB)
|
| 27 |
+
0.80–1.00 correct results + AI quality score (Anthropic claude-haiku-4-5)
|
| 28 |
|
| 29 |
+
Optional env var: ANTHROPIC_API_KEY (enables AI judge for scores up to 1.0;
|
| 30 |
+
without it, scoring is fully deterministic and capped at 0.80)
|
playbook.py
CHANGED
|
@@ -3,13 +3,14 @@ QueryForge Client Playbook
|
|
| 3 |
──────────────────────────
|
| 4 |
Tests the environment through the HTTP server using the QueryforgeEnv client.
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
Then run:
|
| 10 |
python playbook.py
|
| 11 |
|
| 12 |
-
|
| 13 |
If not set, the judge falls back to deterministic scoring (capped at 0.80).
|
| 14 |
"""
|
| 15 |
|
|
@@ -23,7 +24,7 @@ from client import QueryforgeEnv
|
|
| 23 |
from models import SQLAction, TaskSpec
|
| 24 |
from tasks import REGISTRY, task_from_dict
|
| 25 |
|
| 26 |
-
BASE_URL = "https://prithvigg-queryforge.hf.space"
|
| 27 |
|
| 28 |
# ── Formatting helpers ──────────────────────────────────────────────────────
|
| 29 |
|
|
@@ -239,10 +240,10 @@ if __name__ == "__main__":
|
|
| 239 |
_hr("─")
|
| 240 |
|
| 241 |
with QueryforgeEnv(base_url=BASE_URL).sync() as client:
|
| 242 |
-
|
| 243 |
run_medium(client)
|
| 244 |
run_hard(client)
|
| 245 |
-
|
| 246 |
|
| 247 |
_section("DONE")
|
| 248 |
-
print(" All
|
|
|
|
| 3 |
──────────────────────────
|
| 4 |
Tests the environment through the HTTP server using the QueryforgeEnv client.
|
| 5 |
|
| 6 |
+
Usage:
|
| 7 |
+
# Against the live HF Space:
| 8 |
python playbook.py
|
| 9 |
|
| 10 |
+
# Against a local server:
|
| 11 |
+
ENV_URL=http://localhost:8000 python playbook.py
|
| 12 |
+
|
| 13 |
+
If ANTHROPIC_API_KEY is set on the server, Stage 4 AI scoring is live.
|
| 14 |
If not set, the judge falls back to deterministic scoring (capped at 0.80).
|
| 15 |
"""
|
| 16 |
|
|
|
|
| 24 |
from models import SQLAction, TaskSpec
|
| 25 |
from tasks import REGISTRY, task_from_dict
|
| 26 |
|
| 27 |
+
BASE_URL = os.environ.get("ENV_URL", "https://prithvigg-queryforge.hf.space")
|
| 28 |
|
| 29 |
# ── Formatting helpers ──────────────────────────────────────────────────────
|
| 30 |
|
|
|
|
| 240 |
_hr("─")
|
| 241 |
|
| 242 |
with QueryforgeEnv(base_url=BASE_URL).sync() as client:
|
| 243 |
+
run_easy(client)
|
| 244 |
run_medium(client)
|
| 245 |
run_hard(client)
|
| 246 |
+
run_custom(client)
|
| 247 |
|
| 248 |
_section("DONE")
|
| 249 |
+
print(" All tasks completed.\n")
|
pyproject.toml
CHANGED
|
@@ -22,6 +22,10 @@ dependencies = [
|
|
| 22 |
"duckdb>=0.10.0",
|
| 23 |
# AI judge β quality scoring via Anthropic API
|
| 24 |
"anthropic>=0.25.0",
|
| 25 |
]
|
| 26 |
|
| 27 |
[project.optional-dependencies]
|
|
|
|
| 22 |
"duckdb>=0.10.0",
|
| 23 |
# AI judge β quality scoring via Anthropic API
|
| 24 |
"anthropic>=0.25.0",
|
| 25 |
+
# Interactive demo UI (mounted at /demo on the FastAPI server)
|
| 26 |
+
"gradio>=4.0.0",
|
| 27 |
+
# Inference script uses the OpenAI client
|
| 28 |
+
"openai>=1.0.0",
|
| 29 |
]
|
| 30 |
|
| 31 |
[project.optional-dependencies]
|
server/app.py
CHANGED
|
@@ -124,9 +124,4 @@ def main(host: str = "0.0.0.0", port: int = 8000):
|
|
| 124 |
|
| 125 |
|
| 126 |
if __name__ == "__main__":
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
parser = argparse.ArgumentParser()
|
| 130 |
-
parser.add_argument("--port", type=int, default=8000)
|
| 131 |
-
args = parser.parse_args()
|
| 132 |
-
main(port=args.port)
|
|
|
|
| 124 |
|
| 125 |
|
| 126 |
if __name__ == "__main__":
|
| 127 |
+
main()
|
server/requirements.txt
CHANGED
|
@@ -3,6 +3,5 @@ fastapi>=0.115.0
|
|
| 3 |
uvicorn>=0.24.0
|
| 4 |
duckdb>=0.10.0
|
| 5 |
anthropic>=0.25.0
|
| 6 |
-
|
| 7 |
-
|
| 8 |
|
|
|
|
| 3 |
uvicorn>=0.24.0
|
| 4 |
duckdb>=0.10.0
|
| 5 |
anthropic>=0.25.0
|
| 6 |
+
gradio>=4.0.0
|
|
|
|
| 7 |
|
tasks.py
CHANGED
|
@@ -270,9 +270,8 @@ _TASK_EXPERT_RANK = SQLTask(
|
|
| 270 |
level="expert",
|
| 271 |
title="Fix the Tie-Breaking Window Function",
|
| 272 |
description="""\
|
| 273 |
-
TASK: The query below
|
| 274 |
-
|
| 275 |
-
tied at rank 1 are returned.
|
| 276 |
|
| 277 |
SCHEMA:
|
| 278 |
sales_reps(id INTEGER, name VARCHAR, region VARCHAR, revenue DECIMAL)
|
|
@@ -281,19 +280,18 @@ BROKEN QUERY:
|
|
| 281 |
SELECT name, region, revenue
|
| 282 |
FROM (
|
| 283 |
SELECT name, region, revenue,
|
| 284 |
-
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue
|
| 285 |
FROM sales_reps
|
| 286 |
) ranked
|
| 287 |
WHERE rn = 1
|
| 288 |
ORDER BY region, name
|
| 289 |
|
| 290 |
PROBLEM:
|
| 291 |
-
|
| 292 |
-
|
| 293 |
-
|
| 294 |
|
| 295 |
GOAL: Return ALL reps whose revenue is the highest in their region.
|
| 296 |
-
Use RANK() or DENSE_RANK() instead of ROW_NUMBER().
|
| 297 |
Order by region ASC, name ASC.""",
|
| 298 |
schema_ddl="""\
|
| 299 |
CREATE TABLE sales_reps (id INTEGER, name VARCHAR, region VARCHAR, revenue DECIMAL);
|
|
@@ -309,16 +307,16 @@ INSERT INTO sales_reps VALUES
|
|
| 309 |
SELECT name, region, revenue
|
| 310 |
FROM (
|
| 311 |
SELECT name, region, revenue,
|
| 312 |
-
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue
|
| 313 |
FROM sales_reps
|
| 314 |
) ranked
|
| 315 |
WHERE rn = 1
|
| 316 |
ORDER BY region, name""",
|
| 317 |
error_message=(
|
| 318 |
-
"Query runs but returns only 2 rows
|
| 319 |
-
"
|
| 320 |
),
|
| 321 |
-
hint="
|
| 322 |
test_cases=[
|
| 323 |
TestCase(
|
| 324 |
description="All reps tied at rank 1 per region",
|
|
@@ -350,21 +348,21 @@ _TASK_EXPERT_RECURSIVE = SQLTask(
|
|
| 350 |
title="Traverse Org Chart with Recursive CTE",
|
| 351 |
description="""\
|
| 352 |
TASK: The query below attempts to find all subordinates of the VP of Engineering
|
| 353 |
-
(id=3)
|
| 354 |
-
deep. Rewrite it using a recursive CTE that traverses all levels.
|
| 355 |
|
| 356 |
SCHEMA:
|
| 357 |
employees(id INTEGER, name VARCHAR, manager_id INTEGER)
|
| 358 |
|
| 359 |
DATA (partial):
|
| 360 |
-
|
| 361 |
-
|
| 362 |
-
Lead
|
| 363 |
-
Dev 1 (id=8)
|
|
|
|
| 364 |
|
| 365 |
BROKEN QUERY:
|
| 366 |
WITH direct AS (
|
| 367 |
-
SELECT id, name, manager_id FROM employees WHERE
|
| 368 |
),
|
| 369 |
level2 AS (
|
| 370 |
SELECT e.id, e.name, e.manager_id
|
|
@@ -377,12 +375,13 @@ BROKEN QUERY:
|
|
| 377 |
ORDER BY id
|
| 378 |
|
| 379 |
PROBLEM:
|
| 380 |
-
|
| 381 |
-
|
| 382 |
-
|
| 383 |
|
| 384 |
-
GOAL:
|
| 385 |
-
|
|
|
|
| 386 |
schema_ddl="""\
|
| 387 |
CREATE TABLE employees (id INTEGER, name VARCHAR, manager_id INTEGER);
|
| 388 |
INSERT INTO employees VALUES
|
|
@@ -403,7 +402,7 @@ INSERT INTO employees VALUES
|
|
| 403 |
""",
|
| 404 |
broken_query="""\
|
| 405 |
WITH direct AS (
|
| 406 |
-
SELECT id, name, manager_id FROM employees WHERE
|
| 407 |
),
|
| 408 |
level2 AS (
|
| 409 |
SELECT e.id, e.name, e.manager_id
|
|
@@ -415,11 +414,10 @@ UNION ALL
|
|
| 415 |
SELECT id, name, manager_id FROM level2
|
| 416 |
ORDER BY id""",
|
| 417 |
error_message=(
|
| 418 |
-
"Query returns
|
| 419 |
-
"
|
| 420 |
-
"A hardcoded level3 CTE would fix this instance but not scale to deeper trees."
|
| 421 |
),
|
| 422 |
-
hint="
|
| 423 |
test_cases=[
|
| 424 |
TestCase(
|
| 425 |
description="All 8 subordinates of VP Eng at any depth",
|
|
@@ -456,34 +454,33 @@ ORDER BY id""",
|
|
| 456 |
_TASK_EXPERT_WINDOW = SQLTask(
|
| 457 |
id="task_expert_window",
|
| 458 |
level="expert",
|
| 459 |
-
title="Fix
|
| 460 |
description="""\
|
| 461 |
-
TASK: The query below computes a cumulative running total and a
|
| 462 |
-
|
| 463 |
-
are broken β neither has a PARTITION BY, so they treat all rows as one
|
| 464 |
-
giant partition instead of computing independently per region.
|
| 465 |
|
| 466 |
SCHEMA:
|
| 467 |
quarterly_sales(region VARCHAR, quarter INTEGER, revenue DECIMAL)
|
| 468 |
|
|
| 469 |
BROKEN QUERY:
|
| 470 |
SELECT region, quarter, revenue,
|
| 471 |
SUM(revenue) OVER (ORDER BY region, quarter) AS running_total,
|
| 472 |
-
RANK() OVER (ORDER BY revenue DESC)
|
| 473 |
FROM quarterly_sales
|
| 474 |
ORDER BY region, quarter
|
| 475 |
|
| 476 |
PROBLEM:
|
| 477 |
-
|
| 478 |
-
|
| 479 |
-
|
| 480 |
-
|
| 481 |
-
|
| 482 |
-
|
| 483 |
-
|
| 484 |
-
- running_total must reset to 0 at the start of each region (ORDER BY quarter).
|
| 485 |
-
- revenue_rank must rank revenue within each region (ORDER BY revenue DESC).
|
| 486 |
-
Both OVER clauses need PARTITION BY region, but with different ORDER BY columns.
|
| 487 |
Final output: ORDER BY region ASC, quarter ASC.""",
|
| 488 |
schema_ddl="""\
|
| 489 |
CREATE TABLE quarterly_sales (region VARCHAR, quarter INTEGER, revenue DECIMAL);
|
|
@@ -495,7 +492,7 @@ INSERT INTO quarterly_sales VALUES
|
|
| 495 |
('West', 1, 11000),
|
| 496 |
('West', 2, 14000),
|
| 497 |
('West', 3, 16000),
|
| 498 |
-
('West', 4,
|
| 499 |
""",
|
| 500 |
broken_query="""\
|
| 501 |
SELECT region, quarter, revenue,
|
|
@@ -504,19 +501,14 @@ SELECT region, quarter, revenue,
|
|
| 504 |
FROM quarterly_sales
|
| 505 |
ORDER BY region, quarter""",
|
| 506 |
error_message=(
|
| 507 |
-
"Query runs but both
|
| 508 |
-
"
|
| 509 |
-
"revenue_rank is a global ranking across all
|
| 510 |
-
"Both SUM and RANK are missing PARTITION BY region."
|
| 511 |
-
),
|
| 512 |
-
hint=(
|
| 513 |
-
"Add PARTITION BY region to BOTH window functions, but with different ORDER BY: "
|
| 514 |
-
"SUM(revenue) OVER (PARTITION BY region ORDER BY quarter) for running total, "
|
| 515 |
-
"RANK() OVER (PARTITION BY region ORDER BY revenue DESC) for within-region rank."
|
| 516 |
),
|
|
|
|
| 517 |
test_cases=[
|
| 518 |
TestCase(
|
| 519 |
-
description="Per-region running total and within-region revenue rank",
|
| 520 |
expected_rows=[
|
| 521 |
{"region": "East", "quarter": 1, "revenue": 15000.0, "running_total": 15000.0, "revenue_rank": 3},
|
| 522 |
{"region": "East", "quarter": 2, "revenue": 18000.0, "running_total": 33000.0, "revenue_rank": 2},
|
|
@@ -525,7 +517,7 @@ ORDER BY region, quarter""",
|
|
| 525 |
{"region": "West", "quarter": 1, "revenue": 11000.0, "running_total": 11000.0, "revenue_rank": 4},
|
| 526 |
{"region": "West", "quarter": 2, "revenue": 14000.0, "running_total": 25000.0, "revenue_rank": 3},
|
| 527 |
{"region": "West", "quarter": 3, "revenue": 16000.0, "running_total": 41000.0, "revenue_rank": 1},
|
| 528 |
-
{"region": "West", "quarter": 4, "revenue":
|
| 529 |
],
|
| 530 |
order_by="region,quarter",
|
| 531 |
)
|
|
|
|
| 270 |
level="expert",
|
| 271 |
title="Fix the Tie-Breaking Window Function",
|
| 272 |
description="""\
|
| 273 |
+
TASK: The query below attempts to find the top-earning sales rep per region,
|
| 274 |
+
but it returns wrong results. Debug it.
|
|
|
|
| 275 |
|
| 276 |
SCHEMA:
|
| 277 |
sales_reps(id INTEGER, name VARCHAR, region VARCHAR, revenue DECIMAL)
|
|
|
|
| 280 |
SELECT name, region, revenue
|
| 281 |
FROM (
|
| 282 |
SELECT name, region, revenue,
|
| 283 |
+
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue ASC) AS rn
|
| 284 |
FROM sales_reps
|
| 285 |
) ranked
|
| 286 |
WHERE rn = 1
|
| 287 |
ORDER BY region, name
|
| 288 |
|
| 289 |
PROBLEM:
|
| 290 |
+
The query returns 2 rows but the expected answer has 4.
|
| 291 |
+
The output values are also wrong β it seems to pick the lowest revenue per region
|
| 292 |
+
instead of the highest.
|
| 293 |
|
| 294 |
GOAL: Return ALL reps whose revenue is the highest in their region.
|
|
|
|
| 295 |
Order by region ASC, name ASC.""",
|
| 296 |
schema_ddl="""\
|
| 297 |
CREATE TABLE sales_reps (id INTEGER, name VARCHAR, region VARCHAR, revenue DECIMAL);
|
|
|
|
| 307 |
SELECT name, region, revenue
|
| 308 |
FROM (
|
| 309 |
SELECT name, region, revenue,
|
| 310 |
+
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue ASC) AS rn
|
| 311 |
FROM sales_reps
|
| 312 |
) ranked
|
| 313 |
WHERE rn = 1
|
| 314 |
ORDER BY region, name""",
|
| 315 |
error_message=(
|
| 316 |
+
"Query runs but returns wrong results: only 2 rows (one per region) "
|
| 317 |
+
"with the LOWEST revenue instead of the HIGHEST. Expected 4 rows."
|
| 318 |
),
|
| 319 |
+
hint="There are two bugs. Think about both the ranking function and the sort order.",
|
| 320 |
test_cases=[
|
| 321 |
TestCase(
|
| 322 |
description="All reps tied at rank 1 per region",
|
|
|
|
...
    title="Traverse Org Chart with Recursive CTE",
    description="""\
TASK: The query below attempts to find all subordinates of the VP of Engineering
(id=3), but it returns wrong results. Debug and fix it.

SCHEMA:
employees(id INTEGER, name VARCHAR, manager_id INTEGER)

DATA (partial):
CEO (id=1)
VP Eng (id=3, reports to CEO)
Lead A (id=5), Lead B (id=6) report to VP Eng
Dev 1..4 (id=8..11) report to Leads
Junior 1..2 (id=13..14) report to Dev 1

BROKEN QUERY:
WITH direct AS (
    SELECT id, name, manager_id FROM employees WHERE id = 3
),
level2 AS (
    SELECT e.id, e.name, e.manager_id
...
ORDER BY id

PROBLEM:
The query returns some results, but the row count and values don't match
the expected output. Inspect what the anchor condition selects and whether
the query reaches all depths of the org tree.

GOAL: Return ALL 8 subordinates of VP Eng (id=3) at any depth.
Do NOT include VP Eng himself, only his reports.
Return id, name, manager_id columns, ordered by id ASC.""",
    schema_ddl="""\
CREATE TABLE employees (id INTEGER, name VARCHAR, manager_id INTEGER);
INSERT INTO employees VALUES
...
""",
    broken_query="""\
WITH direct AS (
    SELECT id, name, manager_id FROM employees WHERE id = 3
),
level2 AS (
    SELECT e.id, e.name, e.manager_id
...
SELECT id, name, manager_id FROM level2
ORDER BY id""",
    error_message=(
        "Query returns wrong results. Check carefully: does the anchor condition "
        "select the right starting rows? Does the query traverse all depths?"
    ),
    hint="There are multiple issues. Think about what the anchor selects and how deep the query reaches.",
    test_cases=[
        TestCase(
            description="All 8 subordinates of VP Eng at any depth",
            ...
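The shape of the expected fix, replacing the fixed-depth CTE chain with `WITH RECURSIVE` anchored on manager_id = 3, can be sketched with `sqlite3`. The org chart below follows the task's "DATA (partial)" notes; which Devs report to which Lead is not specified there, so the 5/6 split is an assumption (it does not change the id set returned).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Org chart from the task description; Dev-to-Lead assignment is assumed.
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'CEO', NULL),
    (3, 'VP Eng', 1),
    (5, 'Lead A', 3), (6, 'Lead B', 3),
    (8, 'Dev 1', 5), (9, 'Dev 2', 5), (10, 'Dev 3', 6), (11, 'Dev 4', 6),
    (13, 'Junior 1', 8), (14, 'Junior 2', 8);
""")

# Candidate fix: anchor on the direct reports of id=3 (not id=3 itself),
# then recurse so every depth of the subtree is reached.
rows = conn.execute("""
    WITH RECURSIVE subordinates AS (
        SELECT id, name, manager_id FROM employees WHERE manager_id = 3
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employees e
        JOIN subordinates s ON e.manager_id = s.id
    )
    SELECT id, name, manager_id FROM subordinates
    ORDER BY id
""").fetchall()
print([r[0] for r in rows])  # [5, 6, 8, 9, 10, 11, 13, 14]
```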
_TASK_EXPERT_WINDOW = SQLTask(
    id="task_expert_window",
    level="expert",
    title="Fix Broken Window Functions: Running Total and Revenue Rank",
    description="""\
TASK: The query below computes a cumulative running total and a within-region
revenue rank for each quarter, but the results are wrong. Debug and fix it.

SCHEMA:
quarterly_sales(region VARCHAR, quarter INTEGER, revenue DECIMAL)

DATA:
East: Q1=15000, Q2=18000, Q3=12000, Q4=20000
West: Q1=11000, Q2=14000, Q3=16000, Q4=16000 (note: Q3 and Q4 are tied)

BROKEN QUERY:
SELECT region, quarter, revenue,
       SUM(revenue) OVER (ORDER BY region, quarter) AS running_total,
       RANK() OVER (ORDER BY revenue DESC) AS revenue_rank
FROM quarterly_sales
ORDER BY region, quarter

PROBLEM:
The query returns wrong values for both running_total and revenue_rank.
Compare your output against the expected results carefully.

GOAL: running_total should be a cumulative sum per region (reset each region,
ordered by quarter). revenue_rank should rank revenue within each region
(ordered by revenue DESC), handling ties correctly (tied values must get
the same rank).
Final output: ORDER BY region ASC, quarter ASC.""",
    schema_ddl="""\
CREATE TABLE quarterly_sales (region VARCHAR, quarter INTEGER, revenue DECIMAL);
...
('West', 1, 11000),
('West', 2, 14000),
('West', 3, 16000),
('West', 4, 16000);
""",
    broken_query="""\
SELECT region, quarter, revenue,
...
FROM quarterly_sales
ORDER BY region, quarter""",
    error_message=(
        "Query runs but both computed columns are wrong. "
        "running_total does not reset per region. "
        "revenue_rank is a global ranking across all rows instead of per-region."
    ),
    hint="Multiple issues exist. Think about partitioning and how tied values should be ranked.",
    test_cases=[
        TestCase(
            description="Per-region running total and within-region revenue rank with ties",
            expected_rows=[
                {"region": "East", "quarter": 1, "revenue": 15000.0, "running_total": 15000.0, "revenue_rank": 3},
                {"region": "East", "quarter": 2, "revenue": 18000.0, "running_total": 33000.0, "revenue_rank": 2},
                ...
                {"region": "West", "quarter": 1, "revenue": 11000.0, "running_total": 11000.0, "revenue_rank": 4},
                {"region": "West", "quarter": 2, "revenue": 14000.0, "running_total": 25000.0, "revenue_rank": 3},
                {"region": "West", "quarter": 3, "revenue": 16000.0, "running_total": 41000.0, "revenue_rank": 1},
                {"region": "West", "quarter": 4, "revenue": 16000.0, "running_total": 57000.0, "revenue_rank": 1},
            ],
            order_by="region,quarter",
        )
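Because this task's description spells out the full data set, the `expected_rows` above can be cross-checked end to end with `sqlite3` (a sketch; the real harness runs its own engine). Adding `PARTITION BY region` to both window functions reproduces the per-region running totals and the tied rank-1 for West Q3/Q4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data exactly as given in the task description.
conn.executescript("""
CREATE TABLE quarterly_sales (region TEXT, quarter INTEGER, revenue REAL);
INSERT INTO quarterly_sales VALUES
    ('East', 1, 15000), ('East', 2, 18000), ('East', 3, 12000), ('East', 4, 20000),
    ('West', 1, 11000), ('West', 2, 14000), ('West', 3, 16000), ('West', 4, 16000);
""")

# Candidate fix: both window functions need PARTITION BY region, and
# RANK() gives the tied West Q3/Q4 revenues the same rank.
rows = conn.execute("""
    SELECT region, quarter, revenue,
           SUM(revenue) OVER (PARTITION BY region ORDER BY quarter) AS running_total,
           RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank
    FROM quarterly_sales
    ORDER BY region, quarter
""").fetchall()
for row in rows:
    print(row)
```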