OpenEnv documentation

BrowserGym Harness Rollouts


This tutorial shows how to drive BrowserGym through the OpenEnv harness runtime when a trainer needs to keep token sampling, logprobs, and reward assignment inside the training loop.

Use this pattern for tool-driven BrowserGym rollouts. For a standard reset() / step() GRPO flow, keep using the Wordle GRPO tutorial.

What You’ll Build

  • A BrowserGym session factory that creates one environment client per rollout.
  • A harness rollout function that TRL can call during training.
  • A model-step wrapper that converts generated BrowserGym action text into structured tool calls.

Install Dependencies

Install OpenEnv, TRL, and the BrowserGym environment package:

pip install -U "trl[vllm]" peft trackio kernels
pip install -U git+https://github.com/huggingface/OpenEnv.git
pip install -U "openenv-browsergym @ git+https://huggingface.co/spaces/openenv/browsergym_env"
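
Before moving on, it can help to confirm that the packages are visible to the interpreter you plan to train with. The check below is a minimal sketch, not part of OpenEnv: `missing_modules` is a hypothetical helper, and the module names are taken from the imports used later in this tutorial.

```python
from importlib.util import find_spec

# Top-level module names taken from the imports used later in this tutorial.
REQUIRED_MODULES = ("trl", "openenv", "browsergym_env")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that the current interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


print("missing:", missing_modules() or "none")
```

An empty result only shows that the modules can be located; it does not verify versions or optional extras such as vLLM support.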

Build The Session Factory

BrowserGymSessionFactory adapts a BrowserGym client into the harness ResourceSession interface. If your training setup already has an environment_factory, pass that factory as client_factory so every rollout gets a fresh environment instance.

from browsergym_env import BrowserGymEnv
from browsergym_env.harness import BrowserGymSessionFactory

space_url = "https://openenv-browsergym-env.hf.space"


def environment_factory():
    return BrowserGymEnv(base_url=space_url)


session_factory = BrowserGymSessionFactory(
    client_factory=environment_factory,
    default_task="click-test",
)

The session exposes BrowserGym actions such as click, fill, send_keys, scroll, and noop as MCP-style tools while still executing the corresponding BrowserGym action strings under the hood.
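
To make the relationship between a tool call and an action string concrete, here is a rough illustration of the idea. It is not the package's implementation: `render_action` is a hypothetical helper, and the argument layouts shown for `fill` and `noop` are assumptions chosen for the example.

```python
def render_action(tool_name, *args):
    """Render a tool name and positional arguments as a call-style action string."""
    # repr() quotes string arguments, so "13" becomes '13' inside the call.
    rendered_args = ", ".join(repr(arg) for arg in args)
    return f"{tool_name}({rendered_args})"


print(render_action("click", "13"))
print(render_action("fill", "7", "hello"))  # argument layout is an assumption
print(render_action("noop"))
```

The model sees named tools with structured arguments, while the environment still receives a single string per step.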

Wrap TRL Generation

The harness calls a model_step function for each turn. The model step should use the trainer-owned generation path, then return a ModelStepResult with the completion text, token ids, logprobs, and exactly one BrowserGym tool call.

from browsergym_env.harness import build_browsergym_action_tool_call
from openenv.core.harness import ModelStepResult
from openenv.core.llm_client import LLMResponse
from trl.experimental.openenv import generate_rollout_completions


def build_trl_browsergym_model_step(trainer, tokenizer):
    def model_step(messages, tools, sampling):
        # Tools and sampling settings are unused: the trainer owns generation.
        del tools, sampling
        # Render the chat history into a single prompt string.
        prompt_text = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False,
        )
        # Generate through the trainer so token ids and logprobs stay aligned.
        rollout_output = generate_rollout_completions(trainer, [prompt_text])[0]
        # Fall back to decoding the ids when no text is returned.
        completion_text = rollout_output.get("text") or tokenizer.decode(
            rollout_output["completion_ids"],
            skip_special_tokens=True,
        )
        # Convert the generated action text into one structured tool call.
        tool_call = build_browsergym_action_tool_call(completion_text)
        return ModelStepResult(
            response=LLMResponse(content=completion_text, tool_calls=[tool_call]),
            prompt_ids=list(rollout_output["prompt_ids"]),
            completion_ids=list(rollout_output["completion_ids"]),
            logprobs=list(rollout_output["logprobs"]),
        )

    return model_step

In practice, add a small parser around the completion text so that common outputs such as Action: click('13') are normalized to a bare action string before they reach build_browsergym_action_tool_call.
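
One possible shape for such a parser is sketched below. It is an assumption-laden example rather than the helper shipped with the examples: `normalize_action_text` and `KNOWN_ACTIONS` are hypothetical names, the action list is the one mentioned earlier in this tutorial, and falling back to `noop()` is a design choice you may not want.

```python
import re

# Action names listed earlier in this tutorial; extend as needed.
KNOWN_ACTIONS = ("click", "fill", "send_keys", "scroll", "noop")

# Match the first "<action>(...)" call anywhere in the completion.
_ACTION_RE = re.compile(
    r"\b(" + "|".join(KNOWN_ACTIONS) + r")\s*\((.*?)\)",
    re.DOTALL,
)


def normalize_action_text(completion_text, fallback="noop()"):
    """Extract a bare action string such as click('13') from free-form model output."""
    match = _ACTION_RE.search(completion_text)
    if match is None:
        # Assumption: doing nothing is the safest default for unparseable output.
        return fallback
    name = match.group(1)
    raw_args = match.group(2).strip()
    return f"{name}({raw_args})"


print(normalize_action_text("Action: click('13')"))
```

The regular expression stops at the first closing parenthesis, so an argument that itself contains `)` would be truncated. A stricter parser would be needed for free-text `fill` values.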

Create The Rollout Function

Pass the session factory, white-box harness adapter, and model-step builder to build_harness_rollout_func:

from openenv.core.harness import (
    HarnessRunLimits,
    MCPHarnessAdapter,
    build_harness_rollout_func,
)

# `tokenizer` must already be in scope, for example the tokenizer loaded with the policy model.
rollout_func = build_harness_rollout_func(
    session_factory=session_factory,
    harness_adapter=MCPHarnessAdapter(),
    model_step_builder=lambda trainer, session: build_trl_browsergym_model_step(
        trainer,
        tokenizer,
    ),
    limits=HarnessRunLimits(max_turns=10),
)

The returned function accepts TRL prompts and a trainer, runs one harness-backed BrowserGym episode per prompt, and returns prompt_ids, completion_ids, logprobs, env_reward, and verify_metrics.
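
A reward function can then read the environment reward directly. The sketch below assumes the trainer forwards extra rollout fields such as env_reward to reward functions as keyword arguments. `reward_from_env` is an illustrative name, and the zero fallback is an assumption.

```python
def reward_from_env(completions, **kwargs):
    """Use the environment reward recorded during the rollout as the training reward."""
    env_rewards = kwargs.get("env_reward")
    if not env_rewards:
        # Assumption: score every completion as 0.0 when no reward was forwarded.
        return [0.0] * len(completions)
    return [float(reward) for reward in env_rewards]


print(reward_from_env(["click('13')"], env_reward=[1]))
```

Keeping the reward function this thin leaves scoring to the BrowserGym task, so the trainer only relays what the environment reported.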

Full Example

See examples/browsergym_harness.py for a complete TRL-oriented helper that includes action normalization and a ready-to-use build_browsergym_rollout_func.
