OpenEnv Challenge: Deliverables & Status

Required Deliverables

1. HuggingFace Space
Live at: https://huggingface.co/spaces/hjerpe/sql_env
Docker image: registry.hf.space/hjerpe-sql_env:latest
Published via uv run openenv push on 2026-03-29 (see specs/F007-DEMO.md).

Endpoints: /health, /docs, /web, /reset, /step, /ws, exposed by the FastAPI server in envs/sql_env/server/.
Python client: SQLEnv(base_url="https://hjerpe-sql-env.hf.space").

Status: Live
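A quick way to confirm the Space is reachable is to request the /health endpoint listed above. The sketch below uses only the standard library. The helper names endpoint_url and check_health are illustrative and not part of the project, and the sketch assumes /health answers a plain GET with HTTP 200 when the server is up.

```python
from urllib.request import urlopen

# Base URL taken from the Python client line above.
BASE_URL = "https://hjerpe-sql-env.hf.space"


def endpoint_url(base_url: str, path: str) -> str:
    """Join the Space base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def check_health(base_url: str = BASE_URL, timeout: float = 10.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed contract)."""
    try:
        with urlopen(endpoint_url(base_url, "/health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling check_health() from a notebook cell before starting an evaluation run gives an early signal if the Space is sleeping or rebuilding.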

2. Training notebooks/scripts (GitHub)

Colab-ready notebooks:

notebooks/train_grpo.ipynb — Full SFT + GRPO pipeline, Colab L4, ~7h
notebooks/compare_methods.ipynb — Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2)
notebooks/showcase_sqlenv.ipynb — Interactive environment demo with Random and Oracle baselines

Status: Complete

3. Blog post (HuggingFace)

Covers the analyst-exploration framing, the reward architecture and the theory behind it, training results (0% to ~30%), failure analysis, and lessons learned.

Draft: docs/blog-post-v1.md

Status: Draft v1 complete, not yet published

Additional Deliverables

4. GitHub repo

Clean codebase: zero ruff errors, typed Pydantic models, 280 passing tests, architecture docs, training artifacts.

Status: Complete (F016 quality sweep done)

5. Trained checkpoints (HuggingFace Hub)

hjerpe/sqlenv-qwen3-0.6b-grpo (v1)
hjerpe/sqlenv-qwen3-0.6b-grpo-v2 (v2)

Status: Uploaded

6. Green Agent wrapper

OpenEnv evaluation wrapper pattern. A Policy protocol with evaluate(env, policy, n_episodes, seed) that reports success rate, average reward, and average steps. Includes RandomPolicy and OraclePolicy baselines for standardized comparison.

Implementation: evaluation/policies.py, evaluation/oracle_policy.py
Tests: tests/test_evaluation.py (17 tests, all passing)
Used by: notebooks/showcase_sqlenv.ipynb, notebooks/compare_methods.ipynb
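To illustrate the wrapper pattern, here is a minimal sketch of a Policy protocol and an evaluate(env, policy, n_episodes, seed) loop that reports the three metrics named above. Only that call signature and the metric names come from this document. The toy GuessEnv, the act() method name, the EvalResult container, and the reset/step return shapes are assumptions made for the example and may differ from the code in evaluation/policies.py.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Any, Protocol


class Policy(Protocol):
    """Anything that maps an observation to an action (method name assumed)."""

    def act(self, observation: Any) -> Any: ...


@dataclass
class EvalResult:
    success_rate: float
    avg_reward: float
    avg_steps: float


class GuessEnv:
    """Toy stand-in environment: guess a hidden digit within a step budget."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps

    def reset(self, seed: int) -> str:
        self._target = random.Random(seed).randrange(10)
        self._steps = 0
        return "guess a digit 0-9"

    def step(self, action: int) -> tuple[str, float, bool]:
        """Return (observation, reward, done); reward is 1.0 only on a hit."""
        self._steps += 1
        hit = action == self._target
        done = hit or self._steps >= self.max_steps
        return ("correct" if hit else "wrong", 1.0 if hit else 0.0, done)


class RandomPolicy:
    """Baseline that guesses uniformly at random."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def act(self, observation: Any) -> int:
        return self._rng.randrange(10)


class OraclePolicy:
    """Baseline that peeks at the hidden target, giving an upper bound."""

    def __init__(self, env: GuessEnv) -> None:
        self._env = env

    def act(self, observation: Any) -> int:
        return self._env._target


def evaluate(env: GuessEnv, policy: Policy, n_episodes: int, seed: int) -> EvalResult:
    """Run n_episodes seeded episodes and aggregate the three metrics."""
    successes, total_reward, total_steps = 0, 0.0, 0
    for episode in range(n_episodes):
        obs = env.reset(seed=seed + episode)  # one deterministic seed per episode
        done, episode_reward = False, 0.0
        while not done:
            obs, reward, done = env.step(policy.act(obs))
            episode_reward += reward
            total_steps += 1
        total_reward += episode_reward
        successes += episode_reward > 0
    return EvalResult(
        success_rate=successes / n_episodes,
        avg_reward=total_reward / n_episodes,
        avg_steps=total_steps / n_episodes,
    )
```

Seeding each episode from the base seed means that two policies evaluated with the same arguments face the same episodes, which is what makes the random and oracle numbers comparable.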

Status: Complete

7. TRL `environment_factory` adapter

HuggingFace TRL's native OpenEnv integration: pass a class with reset() + named tool methods as environment_factory= and GRPOTrainer runs the multi-turn tool-calling loop automatically (no custom rollout_func).

Implementation: training/trl_adapter.py — class SQLEnvTRL exposing describe(), sample(), query(), answer() as tool methods plus sql_env_reward_func. Used by notebooks/train_grpo.ipynb (cell 16: environment_factory=SQLEnvTRL).

Note: the adapter instantiates a local in-process SQLEnvironment, not a WebSocket client to the hosted HF Space. This is intentional: training needs N parallel sessions (one per generation), and running locally is faster and avoids the Space's default 1-session concurrency limit.
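The shape of such an adapter can be sketched as a plain class with reset() plus named tool methods over an in-process database. This is not the SQLEnvTRL class from training/trl_adapter.py: the ToySQLEnv class, its schema, the reward attribute, and toy_reward_func are hypothetical. The sketch assumes that the trainer builds tool schemas from method names and docstrings and that a reward function can read per-episode state from the environment instances it is handed. Check both assumptions against the TRL version in use.

```python
import sqlite3


class ToySQLEnv:
    """Hypothetical in-process environment: reset() plus named tool methods."""

    def __init__(self) -> None:
        self._conn = None
        self.reward = 0.0

    def reset(self, **kwargs) -> str:
        """Start a fresh episode with a new in-memory database."""
        if self._conn is not None:
            self._conn.close()
        self._conn = sqlite3.connect(":memory:")
        self._conn.executescript(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);"
            "INSERT INTO orders (amount) VALUES (10.0), (25.5), (4.5);"
        )
        self._gold = "40.0"
        self.reward = 0.0
        return "Question: what is the total order amount?"

    def _tables(self) -> set:
        rows = self._conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        return {name for (name,) in rows}

    def describe(self, table: str) -> str:
        """Show the column names and types of a table."""
        if table not in self._tables():
            return f"Error: unknown table {table}"
        rows = self._conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ).fetchall()
        return ", ".join(f"{name} {ctype}" for name, ctype in rows)

    def sample(self, table: str, limit: int = 3) -> str:
        """Return up to `limit` rows from a table."""
        if table not in self._tables():  # also guards the f-string below
            return f"Error: unknown table {table}"
        rows = self._conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        return repr(rows.fetchall())

    def query(self, sql: str) -> str:
        """Run a read-only SELECT statement and return at most 20 rows."""
        if not sql.lstrip().lower().startswith("select"):
            return "Error: only SELECT statements are allowed"
        try:
            return repr(self._conn.execute(sql).fetchmany(20))
        except sqlite3.Error as exc:
            return f"Error: {exc}"

    def answer(self, value: str) -> str:
        """Submit the final answer; an exact string match earns reward 1.0."""
        self.reward = 1.0 if value.strip() == self._gold else 0.0
        return "Episode finished."


def toy_reward_func(environments, **kwargs) -> list:
    """Assumed reward hook: read the reward stored on each environment."""
    return [env.reward for env in environments]
```

Because every instance owns its own in-memory database, creating one environment per generation needs no shared server state, which matches the reason given in the note above for keeping training local.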

Status: Complete

Our Position

To our knowledge, no other interactive SQL exploration environment exists. SQL Repair (WALKMAN303) is a single-turn fix-it task. Calendar Gym (Turing) is real-world but not SQL. That leaves ours as the only multi-turn strategy-discovery environment for database exploration.

Key narrative: "The environment is the product." The trained agent demonstrates that the environment works, but the contribution is the action space, reward architecture, and episode structure.

Open Items

Deploy HuggingFace Space: done (live at https://huggingface.co/spaces/hjerpe/sql_env since 2026-03-29)
Publish blog post on HuggingFace (planned 2026-04-12)
Final review of blog-post-v1.md
Verify notebooks run clean on fresh Colab
Post-launch: enable SUPPORTS_CONCURRENT_SESSIONS=True + max_concurrent_envs=64 on the Space for external users who want to retrain against the hosted endpoint

Resources

OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb
OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv
OpenEnv docs: https://meta-pytorch.org/OpenEnv/
Environment hub: https://huggingface.co/openenv
Discord: https://discord.com/invite/YsTYBh6PD9
