OpenEnv Challenge — Deliverables & Status

Competition

OpenEnv Challenge: SOTA Environments to drive general intelligence

Sponsors: PyTorch team at Meta, HuggingFace, Unsloth

Prizes:

  • $10K in HuggingFace credits
  • Invitation to publish on PyTorch.org blog

Judging Criteria

Submissions are evaluated primarily on the submission blog post. The judging panel grades on:

  1. Creative and robust use of OpenEnv
  2. Technical excellence
  3. Storytelling
  4. Open-source demo
  5. Green Agent wrapper for the environment

Required Deliverables

1. HuggingFace Space

The environment is hosted on the HF Hub, where judges can drive the action space (DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases.

Live at: https://huggingface.co/spaces/hjerpe/sql_env
Docker image: registry.hf.space/hjerpe-sql_env:latest
Published via uv run openenv push on 2026-03-29 (see specs/F007-DEMO.md).

Status: Live. The FastAPI server in envs/sql_env/server/ exposes the /health, /docs, /web, /reset, /step, and /ws endpoints.
Python client: SQLEnv(base_url="https://hjerpe-sql-env.hf.space").
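The four actions can be pictured as a small typed vocabulary. The sketch below is hypothetical and for illustration only: ActionType, SQLAction, parse_action, and the single argument field are our own names, the per-action comments are our reading of the action names rather than documented behavior, and the real Pydantic models in envs/sql_env may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    """The four actions listed for the environment (semantics below are assumed)."""

    DESCRIBE = "DESCRIBE"  # inspect a table's columns and types
    SAMPLE = "SAMPLE"      # view a few example rows from a table
    QUERY = "QUERY"        # run an exploratory SQL query
    ANSWER = "ANSWER"      # submit the final answer, ending the episode


@dataclass(frozen=True)
class SQLAction:
    """Hypothetical action record; field names are illustrative only."""

    action_type: ActionType
    argument: str  # table name, SQL text, or answer value depending on the type

    def __post_init__(self) -> None:
        # Reject empty arguments up front so the environment never sees them.
        if not self.argument.strip():
            raise ValueError(f"{self.action_type.value} requires a non-empty argument")

    @property
    def ends_episode(self) -> bool:
        # In this sketch only ANSWER is terminal; a step budget could also end an episode.
        return self.action_type is ActionType.ANSWER


def parse_action(action_type: str, argument: str) -> SQLAction:
    """Build an action from raw strings, accepting the type name in any case."""
    return SQLAction(ActionType(action_type.upper()), argument)
```

Using a str-valued Enum means an unknown action name such as "DROP" fails at parse time with a ValueError instead of reaching the environment.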

2. Training notebooks/scripts (GitHub)

Colab-ready notebooks:

  • notebooks/train_grpo.ipynb: full SFT + GRPO pipeline; runs in about 7 hours on a Colab L4 GPU
  • notebooks/compare_methods.ipynb: base vs. GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, GRPO v2)
  • notebooks/showcase_sqlenv.ipynb: interactive environment demo with Random and Oracle baselines

Status: Complete

3. Blog post (HuggingFace)

Analyst exploration framing, reward architecture with theory, training results (0% to ~30%), failure analysis, lessons learned.

Draft: docs/blog-post-v1.md

Status: Draft v1 complete, not yet published

Additional Deliverables

4. GitHub repo

Clean codebase: zero ruff errors, typed Pydantic models, 280 passing tests, architecture docs, training artifacts.

Status: Complete (F016 quality sweep done)

5. Trained checkpoints (HuggingFace Hub)

  • hjerpe/sqlenv-qwen3-0.6b-grpo (v1)
  • hjerpe/sqlenv-qwen3-0.6b-grpo-v2 (v2)

Status: Uploaded

6. Green Agent wrapper

Evaluation wrapper following the OpenEnv pattern: a Policy protocol plus an evaluate(env, policy, n_episodes, seed) function that reports success rate, average reward, and average steps per episode. Includes RandomPolicy and OraclePolicy baselines for standardized comparison.
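A minimal sketch of this evaluation pattern, assuming a gym-style reset()/step() environment and a policy with a single act(observation) method. The Env and Policy protocols, StepResult, EvalReport, and the GuessEnv toy environment are hypothetical stand-ins, not the code in evaluation/policies.py; only the evaluate(env, policy, n_episodes, seed) signature and the three reported metrics come from the description above.

```python
import random
from dataclasses import dataclass
from typing import Protocol


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool
    success: bool = False


class Env(Protocol):
    def reset(self, seed: int) -> str: ...
    def step(self, action: str) -> StepResult: ...


class Policy(Protocol):
    def act(self, observation: str) -> str: ...


@dataclass
class EvalReport:
    success_rate: float
    avg_reward: float
    avg_steps: float


def evaluate(env: Env, policy: Policy, n_episodes: int, seed: int) -> EvalReport:
    """Run n_episodes and report success rate, average reward, and average steps."""
    successes, total_reward, total_steps = 0, 0.0, 0
    for episode in range(n_episodes):
        obs = env.reset(seed=seed + episode)  # reproducible, distinct seed per episode
        done, success = False, False
        while not done:
            result = env.step(policy.act(obs))
            obs, done, success = result.observation, result.done, result.success
            total_reward += result.reward
            total_steps += 1
        successes += int(success)  # success is judged on the final step
    return EvalReport(successes / n_episodes, total_reward / n_episodes, total_steps / n_episodes)


class GuessEnv:
    """Toy stand-in for the SQL environment: name a hidden digit within a step budget."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps, self.secret, self.steps = max_steps, 0, 0

    def reset(self, seed: int) -> str:
        self.secret, self.steps = random.Random(seed).randint(0, 9), 0
        return "guess a digit"

    def step(self, action: str) -> StepResult:
        self.steps += 1
        correct = action == str(self.secret)
        done = correct or self.steps >= self.max_steps
        return StepResult("correct" if correct else "wrong", float(correct), done, correct)


class RandomPolicy:
    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)

    def act(self, observation: str) -> str:
        return str(self.rng.randint(0, 9))


class OraclePolicy:
    """Reads the hidden answer, giving an upper-bound baseline."""

    def __init__(self, env: GuessEnv) -> None:
        self.env = env

    def act(self, observation: str) -> str:
        return str(self.env.secret)
```

Seeding each episode with seed + episode keeps runs reproducible while still varying the task, so two policies evaluated with the same seed face the same sequence of episodes. The toy oracle reads the hidden answer directly, which is what makes it an upper bound next to the random baseline.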

Implementation: evaluation/policies.py, evaluation/oracle_policy.py
Tests: tests/test_evaluation.py (17 tests, all passing)
Used by: notebooks/showcase_sqlenv.ipynb, notebooks/compare_methods.ipynb

Status: Complete

7. TRL environment_factory adapter

Uses HuggingFace TRL's native OpenEnv integration: passing a class with reset() and named tool methods as environment_factory= lets GRPOTrainer run the multi-turn tool-calling loop automatically, with no custom rollout_func.

Implementation: training/trl_adapter.py, which defines class SQLEnvTRL (exposing describe(), sample(), query(), and answer() as tool methods) and sql_env_reward_func.
Used by: notebooks/train_grpo.ipynb (cell 16: environment_factory=SQLEnvTRL).
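The environment_factory shape can be sketched without TRL itself. This toy class is hypothetical and is not training/trl_adapter.py: it replaces the Spider database with a three-row in-memory SQLite table, and its return strings, the reward attribute, and toy_reward_func are our own assumptions. What TRL requires reset() to return and how it hands environment instances to reward functions should be checked against the TRL documentation; we also assume, as is common for tool calling in the Hugging Face stack, that type hints and Google-style docstrings on each tool method are used to build the tool schema.

```python
import sqlite3


class ToySQLEnvTRL:
    """Hypothetical environment_factory-style class: reset() plus one public method per tool.

    Each instance owns its own in-memory database, so concurrent generations never share state.
    """

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            "CREATE TABLE singer (name TEXT, age INTEGER);"
            "INSERT INTO singer VALUES ('Ann', 31), ('Bo', 45), ('Cy', 27);"
        )
        self.gold_answer = "3"  # gold answer for the toy question returned by reset()
        self.reward = 0.0
        self.done = False

    def _tables(self) -> set[str]:
        rows = self.conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        return {name for (name,) in rows}

    def reset(self, **kwargs) -> str:
        """Start a new episode and return the question shown to the model."""
        self.reward, self.done = 0.0, False
        return "How many singers are there?"

    def describe(self, table: str) -> str:
        """List the columns of a table.

        Args:
            table: Name of the table to describe.
        """
        if table not in self._tables():
            return f"error: no such table: {table}"
        cols = self.conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        return ", ".join(f"{col[1]} {col[2]}" for col in cols)

    def sample(self, table: str) -> str:
        """Show up to three rows from a table.

        Args:
            table: Name of the table to sample.
        """
        if table not in self._tables():
            return f"error: no such table: {table}"
        return str(self.conn.execute(f'SELECT * FROM "{table}" LIMIT 3').fetchall())

    def query(self, sql: str) -> str:
        """Run a SQL query and return up to 20 result rows.

        Args:
            sql: The SQL query to execute.
        """
        try:
            return str(self.conn.execute(sql).fetchmany(20))
        except sqlite3.Error as exc:  # surface SQL errors to the model instead of crashing
            return f"error: {exc}"

    def answer(self, value: str) -> str:
        """Submit the final answer and end the episode.

        Args:
            value: The final answer.
        """
        self.done = True
        self.reward = 1.0 if value.strip() == self.gold_answer else 0.0
        return "correct" if self.reward == 1.0 else "incorrect"


def toy_reward_func(environments: list[ToySQLEnvTRL], **kwargs) -> list[float]:
    """Hypothetical reward hook: one reward per environment instance (one per generation)."""
    return [env.reward for env in environments]
```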

Note: the adapter instantiates a local, in-process SQLEnvironment rather than a WebSocket client to the hosted HF Space. This is intentional: training needs N parallel sessions (one per generation), and running locally is both faster and free of the Space's default limit of one concurrent session.
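The reason for one session per generation can be made concrete. GRPO samples a group of completions for each prompt and scores every completion relative to the rest of its group, so each completion needs its own episode state. In this sketch, ToyEpisode, score_group, and group_relative_advantages are hypothetical names, and the advantage uses plain mean and standard-deviation normalization; TRL's implementation may differ in details such as the choice of standard deviation or optional reward scaling.

```python
import statistics
from typing import Callable


class ToyEpisode:
    """Stand-in for one environment session; the episode ends after a single ANSWER."""

    def __init__(self, gold: str = "3") -> None:
        self.gold = gold
        self.done = False

    def answer(self, value: str) -> float:
        if self.done:
            # A shared session would hit this on the second generation in the group.
            raise RuntimeError("episode already finished; each generation needs its own session")
        self.done = True
        return 1.0 if value.strip() == self.gold else 0.0


def score_group(make_env: Callable[[], ToyEpisode], answers: list[str]) -> list[float]:
    """Score each generation's final answer in a fresh session created by make_env."""
    return [make_env().answer(a) for a in answers]


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward by its group's mean and (population) standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


rewards = score_group(ToyEpisode, ["3", "4", "3", "five"])
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))
```

Here the correct answers receive a positive advantage and the incorrect ones an equal negative advantage. A group in which every completion earns the same reward yields zero advantage for every completion, so such a group contributes no policy-gradient signal.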

Status: Complete

Our Position

No other interactive SQL exploration environment exists. SQL Repair (WALKMAN303) is a single-turn fix-it task, and Calendar Gym (Turing) is a real-world environment but not a SQL one. We are the only multi-turn strategy-discovery environment for database exploration.

Key narrative: "The environment is the product." The trained agent demonstrates that the environment works, but the contribution is the action space, reward architecture, and episode structure.

Open Items

  • Deploy HuggingFace Space (done: live at https://huggingface.co/spaces/hjerpe/sql_env since 2026-03-29)
  • Publish blog post on HuggingFace (planned 2026-04-12)
  • Final review of blog-post-v1.md
  • Verify notebooks run clean on fresh Colab
  • Post-launch: enable SUPPORTS_CONCURRENT_SESSIONS=True and max_concurrent_envs=64 on the Space so that external users can retrain against the hosted endpoint

Resources