OpenEnv documentation
Tutorials
New to OpenEnv? Start Here
The Getting Started Series walks you from zero to deploying your own environment in five short parts. No GPU required.
| Part | What it covers | Notebook |
|---|---|---|
| 1 — Introduction & Quick Start | What OpenEnv is, why it exists, and your first environment in under 10 minutes | |
| 2 — Using Environments | Connect to environments, create policies, run evaluations | |
| 3 — Building Environments | Create a custom environment from scratch | |
| 4 — Packaging & Deploying | Package with Docker and deploy to Hugging Face | — |
| 5 — Contributing to Hugging Face | Publish, fork, and share environments on the Hub | — |
Topic Tutorials
Already familiar with the basics? These tutorials cover specific workflows in depth.
| Tutorial | What it covers | GPU | Notebook |
|---|---|---|---|
| OpenEnv Tutorial | Full introduction to OpenEnv: install, connect to a hosted environment, step through an episode, define a reward function, and run a basic training loop. | No | |
| End-to-end walkthrough | The full pipeline: connect to reasoning_gym, wire it into TRL via environment_factory, fine-tune with GRPO, and push the checkpoint to the Hub. | Yes | |
| Building and using MCP environments | Consume and build MCP-backed environments: list and call tools through step(), register Python functions as tools with FastMCP. | No | |
| Rubrics | Compose reward functions from reusable pieces using Gate, WeightedSum, LLMJudge, and TrajectoryRubric. | No | |
| Wordle GRPO | Train an agent to play Wordle using GRPO via TRL’s environment_factory. | Yes | |
| RL Training with 2048 | Train a language model to play 2048 using GRPO. Covers game-state representation and reward shaping. | Yes | — |
| Evaluating agents with Inspect AI | Wrap an OpenEnv environment in an Inspect AI Task, run it via InspectAIHarness, and get a structured EvalResult. | No | |
| BrowserGym Harness Rollouts | Drive BrowserGym through the OpenEnv harness runtime when a trainer needs token sampling, logprobs, and reward assignment inside the training loop. | Yes | — |
| Collecting rollouts for supervised training | Run a teacher model to collect reward-labeled rollouts, filter them, and fine-tune a student with TRL’s SFTTrainer as a warm-start for GRPO. | Yes | |
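
To give a feel for what "composing reward functions from reusable pieces" means in the Rubrics tutorial, here is a minimal plain-Python sketch. It does not use OpenEnv's actual `Gate` or `WeightedSum` classes: the helpers `gate` and `weighted_sum`, the `RewardFn` type, and the three scoring functions are hypothetical names made up for illustration, and the real API may look quite different.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a reward function maps a model completion to a score.
RewardFn = Callable[[str], float]


def weighted_sum(parts: Sequence[Tuple[RewardFn, float]]) -> RewardFn:
    """Combine several reward functions by weighting and summing their scores."""
    def combined(completion: str) -> float:
        return sum(weight * fn(completion) for fn, weight in parts)
    return combined


def gate(condition: Callable[[str], bool], inner: RewardFn) -> RewardFn:
    """Return the inner reward only when the condition holds, otherwise 0.0."""
    def gated(completion: str) -> float:
        return inner(completion) if condition(completion) else 0.0
    return gated


# Illustrative scoring pieces (assumptions, not part of OpenEnv).
def has_answer_tag(completion: str) -> bool:
    return "<answer>" in completion and "</answer>" in completion


def correct(completion: str) -> float:
    return 1.0 if "<answer>4</answer>" in completion else 0.0


def brevity(completion: str) -> float:
    return 1.0 if len(completion) <= 40 else 0.0


# Score correctness and brevity only if the output is well-formed.
reward = gate(has_answer_tag, weighted_sum([(correct, 0.75), (brevity, 0.25)]))
```

Calling `reward(completion)` returns a single float, so a composed rubric of this shape can be passed anywhere a plain reward function is expected. The gate keeps malformed outputs from collecting partial credit for brevity alone.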