OpenEnv documentation
RL Framework Integration
This page is still being filled in. TRL integration is covered below; torchforge and SkyRL integrations are planned.
Use OpenEnv with popular RL frameworks like TRL, torchforge, and SkyRL.
Overview
OpenEnv environments expose a standard reset(), step(), and state() API, so the same environment can be driven from an RL framework's training loop or from a hand-written one.
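To make the reset/step/state contract concrete, here is a minimal sketch that uses a toy stand-in environment. GuessNumberEnv, StepResult, and run_episode are hypothetical names written for this illustration and are not part of OpenEnv; the result fields (observation, reward, terminated) simply mirror the ones used in the generic loop on this page.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical result container (not the OpenEnv class)."""
    observation: str
    reward: float
    terminated: bool


class GuessNumberEnv:
    """Toy stand-in environment with reset(), step(), and state() methods."""

    def __init__(self, target: int = 3, max_steps: int = 5):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> StepResult:
        # Start a fresh episode and return the first observation.
        self.steps = 0
        return StepResult(observation="guess a number", reward=0.0, terminated=False)

    def step(self, action: int) -> StepResult:
        # Apply one action; the episode ends on a correct guess or at the step limit.
        self.steps += 1
        correct = action == self.target
        hint = "correct" if correct else ("higher" if action < self.target else "lower")
        done = correct or self.steps >= self.max_steps
        return StepResult(observation=hint, reward=1.0 if correct else 0.0, terminated=done)

    def state(self) -> dict:
        # Episode metadata, separate from what the policy observes.
        return {"steps": self.steps}


def run_episode(env: GuessNumberEnv) -> float:
    """Run one episode with a trivial counting policy and return the total reward."""
    result = env.reset()
    total, guess = 0.0, 0
    while not result.terminated:
        guess += 1
        result = env.step(guess)
        total += result.reward
    return total
```

A training loop only needs these three calls: reset() to start an episode, step() to advance it, and state() to inspect episode metadata.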
TRL Integration
TRL (Transformer Reinforcement Learning) is the recommended framework for training language models with RL.
from trl import GRPOTrainer
from openenv import AutoEnv, AutoAction

env = AutoEnv.from_env("textarena")
TextAction = AutoAction.from_env("textarena")

# Use with TRL's GRPO trainer
trainer = GRPOTrainer(
    model=model,  # model name or a loaded model
    reward_funcs=reward_funcs,  # GRPOTrainer takes its reward function(s) via reward_funcs
    # ... TRL config
)

See the Wordle with GRPO tutorial for a complete example.
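As a rough illustration of how an environment's score can feed GRPO, the sketch below wraps a scoring step in a reward function of the shape TRL's GRPOTrainer accepts: a callable that receives the batch of completions and returns one float per completion. The score_guess helper and the secret word are hypothetical stand-ins for a real environment step, and the sketch assumes plain-string (non-conversational) completions.

```python
def score_guess(guess: str, secret: str = "crane") -> float:
    """Hypothetical stand-in for an environment step.

    Returns the fraction of letters that match the secret in the same position.
    """
    guess = guess.strip().lower()
    if len(guess) != len(secret):
        return 0.0
    return sum(g == s for g, s in zip(guess, secret)) / len(secret)


def env_reward(completions, **kwargs):
    """Reward function for GRPO: one float per completion in the batch."""
    return [score_guess(text) for text in completions]
```

A function like this would be passed to the trainer as reward_funcs=env_reward. In a real setup, the body of score_guess would instead step the environment and read back its reward.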
Generic Training Loop
For custom training setups:
from openenv import AutoEnv, AutoAction

env = AutoEnv.from_env("my-env")
Action = AutoAction.from_env("my-env")

# policy and num_episodes are supplied by your own training code
with env.sync() as client:
    for episode in range(num_episodes):
        result = client.reset()
        while not result.terminated:
            # Get action from your policy
            action = policy(result.observation)
            # Take step
            result = client.step(action)
            # Update policy with reward
            policy.update(result.reward)

Next Steps
- Reward Design - Design effective reward functions
- Wordle with GRPO - Complete TRL example