OpenEnv documentation
BrowserGym Harness Rollouts
This tutorial shows how to drive BrowserGym through the OpenEnv harness runtime when a trainer needs to keep token sampling, logprobs, and reward assignment inside the training loop.
Use this pattern for tool-driven BrowserGym rollouts. For a standard
reset()/step() GRPO flow, keep using the Wordle GRPO tutorial.
What You’ll Build
- A BrowserGym session factory that creates one environment client per rollout.
- A harness rollout function that TRL can call during training.
- A model-step wrapper that converts generated BrowserGym action text into structured tool calls.
Install Dependencies
Install OpenEnv, TRL, and the BrowserGym environment package:
pip install -U "trl[vllm]" peft trackio kernels
pip install -U git+https://github.com/huggingface/OpenEnv.git
pip install -U "openenv-browsergym @ git+https://huggingface.co/spaces/openenv/browsergym_env"

Build The Session Factory
BrowserGymSessionFactory adapts a BrowserGym client into the harness
ResourceSession interface. If your training setup already has an
environment_factory, pass that factory as client_factory so every rollout
gets a fresh environment instance.

from browsergym_env import BrowserGymEnv
from browsergym_env.harness import BrowserGymSessionFactory

space_url = "https://openenv-browsergym-env.hf.space"

def environment_factory():
    return BrowserGymEnv(base_url=space_url)

session_factory = BrowserGymSessionFactory(
    client_factory=environment_factory,
    default_task="click-test",
)

The session exposes BrowserGym actions such as click, fill, send_keys,
scroll, and noop as MCP-style tools while still executing the corresponding
BrowserGym action strings under the hood.
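
To make the relationship between an action string and a tool call concrete, here is a minimal sketch that splits a string such as click('13') into a tool name and its arguments. The helper name split_action is hypothetical and is not part of browsergym_env; the real tool schema is whatever the session reports, so treat this only as an illustration of the mapping.

```python
import ast


def split_action(action: str) -> tuple[str, list, dict]:
    """Split a BrowserGym-style action string into (name, args, kwargs).

    Hypothetical helper for illustration; not part of browsergym_env.
    """
    node = ast.parse(action.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call expression: {action!r}")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("**kwargs unpacking is not supported")
    # literal_eval only accepts literals, so nothing in the string is executed.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, args, kwargs


name, args, kwargs = split_action("fill('7', 'hello')")
print(name, args, kwargs)
```

Parsing with ast rather than eval keeps model-generated text from running as code; anything that is not a plain call with literal arguments raises an error.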
Wrap TRL Generation
The harness calls a model_step function for each turn. The model step should
use the trainer-owned generation path, then return a ModelStepResult with the
completion text, token ids, logprobs, and exactly one BrowserGym tool call.

from browsergym_env.harness import build_browsergym_action_tool_call
from openenv.core.harness import ModelStepResult
from openenv.core.llm_client import LLMResponse
from trl.experimental.openenv import generate_rollout_completions

def build_trl_browsergym_model_step(trainer, tokenizer):
    def model_step(messages, tools, sampling):
        # This step ignores the harness-supplied tools and sampling settings;
        # generation goes through the trainer instead.
        del tools, sampling
        prompt_text = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False,
        )
        # Generate through the trainer so token ids and logprobs stay
        # inside the training loop.
        rollout_output = generate_rollout_completions(trainer, [prompt_text])[0]
        # Prefer the returned text; otherwise decode the completion ids.
        completion_text = rollout_output.get("text") or tokenizer.decode(
            rollout_output["completion_ids"],
            skip_special_tokens=True,
        )
        # Convert the generated action text into a single structured tool call.
        tool_call = build_browsergym_action_tool_call(completion_text)
        return ModelStepResult(
            response=LLMResponse(content=completion_text, tool_calls=[tool_call]),
            prompt_ids=list(rollout_output["prompt_ids"]),
            completion_ids=list(rollout_output["completion_ids"]),
            logprobs=list(rollout_output["logprobs"]),
        )

    return model_step

In practice, you should add a small parser around the completion text so common
outputs like Action: click('13') are normalized before calling
build_browsergym_action_tool_call.
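
A minimal sketch of such a parser follows. The function name normalize_action_text, the KNOWN_ACTIONS tuple, and the noop() fallback are assumptions made for illustration; the action names are limited to those listed earlier in this tutorial, and the helper shipped with the full example may behave differently.

```python
import re

# Action names taken from the tools listed in this tutorial (assumed list;
# extend it if your task uses more actions).
KNOWN_ACTIONS = ("click", "fill", "send_keys", "scroll", "noop")

# Match the first "name(args)" call whose name is a known action.
# Arguments that themselves contain parentheses are not supported.
_ACTION_RE = re.compile(r"\b(" + "|".join(KNOWN_ACTIONS) + r")\s*\(([^()]*)\)")


def normalize_action_text(completion_text: str, fallback: str = "noop()") -> str:
    """Return the first recognizable action call in the text, or the fallback."""
    match = _ACTION_RE.search(completion_text)
    if match is None:
        return fallback
    name, raw_args = match.groups()
    return f"{name}({raw_args.strip()})"


print(normalize_action_text("Action: click('13')"))
```

Inside model_step, the normalized string would be passed to build_browsergym_action_tool_call in place of the raw completion text. Falling back to noop() keeps the episode running when the model produces no usable action, though a trainer may prefer to end the episode or penalize such turns instead.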
Create The Rollout Function
Pass the session factory, white-box harness adapter, and model-step builder to
build_harness_rollout_func:

from openenv.core.harness import (
    HarnessRunLimits,
    MCPHarnessAdapter,
    build_harness_rollout_func,
)

# `tokenizer` must already be defined in the enclosing scope.
rollout_func = build_harness_rollout_func(
    session_factory=session_factory,
    harness_adapter=MCPHarnessAdapter(),
    model_step_builder=lambda trainer, session: build_trl_browsergym_model_step(
        trainer,
        tokenizer,
    ),
    limits=HarnessRunLimits(max_turns=10),
)

The returned function accepts TRL prompts and a trainer, runs one harness-backed
BrowserGym episode per prompt, and returns prompt_ids, completion_ids,
logprobs, env_reward, and verify_metrics.
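
One way to turn the env_reward field into a training signal is a reward function that reads it from its keyword arguments. The sketch below assumes that the trainer forwards extra rollout fields such as env_reward to reward functions as keyword arguments; this tutorial does not state that, so check it against your TRL version. The name reward_from_env is illustrative.

```python
def reward_from_env(completions, **kwargs):
    """Use each episode's environment reward as its training reward.

    Assumes the rollout's env_reward list arrives as a keyword argument,
    with one entry per completion.
    """
    env_rewards = kwargs.get("env_reward")
    if env_rewards is None:
        # No environment reward was forwarded; fall back to zeros.
        return [0.0] * len(completions)
    return [float(reward) for reward in env_rewards]


print(reward_from_env(["click('13')", "noop()"], env_reward=[1, 0]))
```

Returning zeros when the field is missing keeps training from crashing, but it also hides wiring mistakes, so raising an error may be the better choice while debugging.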
Full Example
See examples/browsergym_harness.py
for a complete TRL-oriented helper that includes action normalization and a
ready-to-use build_browsergym_rollout_func.