BrowserGym Harness Rollouts

This tutorial shows how to drive BrowserGym through the OpenEnv harness runtime when a trainer needs to keep token sampling, logprobs, and reward assignment inside the training loop.

Use this pattern for tool-driven BrowserGym rollouts. For a standard reset() / step() GRPO flow, keep using the Wordle GRPO tutorial.

What You’ll Build

A BrowserGym session factory that creates one environment client per rollout.
A harness rollout function that TRL can call during training.
A model-step wrapper that converts generated BrowserGym action text into structured tool calls.

Install Dependencies

Install OpenEnv, TRL, and the BrowserGym environment package:

pip install -U "trl[vllm]" peft trackio kernels
pip install -U git+https://github.com/huggingface/OpenEnv.git
pip install -U "openenv-browsergym @ git+https://huggingface.co/spaces/openenv/browsergym_env"

Build The Session Factory

BrowserGymSessionFactory adapts a BrowserGym client into the harness ResourceSession interface. If your training setup already has an environment_factory, pass that factory as client_factory so every rollout gets a fresh environment instance.

from browsergym_env import BrowserGymEnv
from browsergym_env.harness import BrowserGymSessionFactory

space_url = "https://openenv-browsergym-env.hf.space"


def environment_factory():
    return BrowserGymEnv(base_url=space_url)


session_factory = BrowserGymSessionFactory(
    client_factory=environment_factory,
    default_task="click-test",
)

The session exposes BrowserGym actions such as click, fill, send_keys, scroll, and noop as MCP-style tools while still executing the corresponding BrowserGym action strings under the hood.

Wrap TRL Generation

The harness calls a model_step function for each turn. The model step should use the trainer-owned generation path, then return a ModelStepResult with the completion text, token ids, logprobs, and exactly one BrowserGym tool call.

from browsergym_env.harness import build_browsergym_action_tool_call
from openenv.core.harness import ModelStepResult
from openenv.core.llm_client import LLMResponse
from trl.experimental.openenv import generate_rollout_completions


def build_trl_browsergym_model_step(trainer, tokenizer):
    def model_step(messages, tools, sampling):
        del tools, sampling
        prompt_text = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False,
        )
        rollout_output = generate_rollout_completions(trainer, [prompt_text])[0]
        completion_text = rollout_output.get("text") or tokenizer.decode(
            rollout_output["completion_ids"],
            skip_special_tokens=True,
        )
        tool_call = build_browsergym_action_tool_call(completion_text)
        return ModelStepResult(
            response=LLMResponse(content=completion_text, tool_calls=[tool_call]),
            prompt_ids=list(rollout_output["prompt_ids"]),
            completion_ids=list(rollout_output["completion_ids"]),
            logprobs=list(rollout_output["logprobs"]),
        )

    return model_step

In practice, you should add a small parser around the completion text so common outputs like Action: click('13') are normalized before calling build_browsergym_action_tool_call.

Create The Rollout Function

Pass the session factory, white-box harness adapter, and model-step builder to build_harness_rollout_func:

from openenv.core.harness import (
    HarnessRunLimits,
    MCPHarnessAdapter,
    build_harness_rollout_func,
)

rollout_func = build_harness_rollout_func(
    session_factory=session_factory,
    harness_adapter=MCPHarnessAdapter(),
    model_step_builder=lambda trainer, session: build_trl_browsergym_model_step(
        trainer,
        tokenizer,
    ),
    limits=HarnessRunLimits(max_turns=10),
)

The returned function accepts TRL prompts and a trainer, runs one harness-backed BrowserGym episode per prompt, and returns prompt_ids, completion_ids, logprobs, env_reward, and verify_metrics.

Full Example

See examples/browsergym_harness.py for a complete TRL-oriented helper that includes action normalization and a ready-to-use build_browsergym_rollout_func.

Update on GitHub