WebDreamer: Model-Based Planning for Web Agents

WebDreamer is a planning framework that enables efficient and effective planning for real-world web agent tasks. Check our paper for more details. This work is a collaboration between OSUNLP and Orby AI.


Models

Results

Strong performance on VisualWebArena, Online-Mind2Web, and Mind2Web-live

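| Benchmark       | Method               | Success Rate (relative gain over Reactive) |
|-----------------|----------------------|--------------------------------------------|
| VisualWebArena  | GPT-4o + Reactive    | 17.6%                                      |
| VisualWebArena  | GPT-4o + Tree Search | 26.2%                                      |
| VisualWebArena  | GPT-4o + WebDreamer  | 23.6% (↑34.1%)                             |
| Online-Mind2Web | GPT-4o + Reactive    | 26.0%                                      |
| Online-Mind2Web | GPT-4o + WebDreamer  | 37.0% (↑42.3%)                             |
| Mind2Web-live   | GPT-4o + Reactive    | 20.2%                                      |
| Mind2Web-live   | GPT-4o + WebDreamer  | 25.0% (↑23.8%)                             |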

Compared to the reactive baselines, WebDreamer improves success rates by 34.1%, 42.3%, and 23.8% in relative terms on VisualWebArena, Online-Mind2Web, and Mind2Web-live, respectively.
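The percentages in parentheses are relative improvements over the reactive baseline, i.e. (WebDreamer − Reactive) / Reactive, not absolute percentage-point gains. For example, on VisualWebArena the absolute gain is 6.0 points, which is about 34.1% of the 17.6% baseline. The short Python check below reproduces the reported figures from the table values; `relative_gain` is just an illustrative helper name.

```python
# Success rates (%) copied from the results table: (Reactive, WebDreamer).
results = {
    "VisualWebArena": (17.6, 23.6),
    "Online-Mind2Web": (26.0, 37.0),
    "Mind2Web-live": (20.2, 25.0),
}


def relative_gain(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


for benchmark, (reactive, dreamer) in results.items():
    print(f"{benchmark}: {relative_gain(reactive, dreamer):.1f}%")
```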

Better efficiency than tree search with true interactions


WebDreamer explores the search space through simulated interactions, which substantially reduces its reliance on real-world interactions while maintaining robust performance.
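As a rough illustration of planning through simulation, the sketch below assumes a simple one-step lookahead: each candidate action is passed to a world model that predicts the resulting page, an evaluator scores that prediction, and only the best-scoring action would then be executed in the real environment. This is not the authors' implementation; `simulate`, `score`, and the toy scoring rule are hypothetical stand-ins (in practice `simulate` might call a model such as Dreamer-7B and `score` an LLM-based evaluator), and the actual procedure may differ in detail, for example by simulating more than one step ahead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    action: str           # candidate action, e.g. "click 'Submit'"
    simulated_state: str  # predicted description of the page after the action
    score: float          # estimated progress toward the task goal


def plan_step(
    candidate_actions: list[str],
    simulate: Callable[[str], str],
    score: Callable[[str, str], float],
) -> Candidate:
    """Simulate every candidate, score each imagined outcome, and return the best.

    No real browser interaction happens here; `simulate` and `score` are
    hypothetical callables supplied by the caller.
    """
    candidates = []
    for action in candidate_actions:
        predicted = simulate(action)
        candidates.append(Candidate(action, predicted, score(action, predicted)))
    # Only this action would be executed for real; ties go to the first candidate.
    return max(candidates, key=lambda c: c.score)


# Toy stand-ins (hypothetical) to show the control flow.
def toy_simulate(action: str) -> str:
    return f"page after {action}"


def toy_score(action: str, predicted: str) -> float:
    return 1.0 if "search" in action else 0.0


best = plan_step(["click 'Home'", "type 'shoes' into search bar"], toy_simulate, toy_score)
print(best.action)
```

Compared with tree search over real interactions, the expensive or irreversible step (acting on the live website) happens once per decision, while exploration of alternatives is delegated to the simulator.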

Inference

vLLM server

vllm serve osunlp/Dreamer-7B --api-key token-abc123 --dtype float16

or

python -m vllm.entrypoints.openai.api_server --served-model-name osunlp/Dreamer-7B --model osunlp/Dreamer-7B --dtype float16 

More instructions on training and inference can be found in Qwen2-VL's official repository.

Prompt

Our model is fairly robust to the wording of the text prompt, so feel free to try other prompts; we did not explore prompt variations extensively.

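from openai import AsyncOpenAI


def format_openai_template(action_description: str, base64_image: str) -> list[dict]:
    # Build a single user message: the current screenshot (base64-encoded JPEG)
    # followed by a text prompt naming the action whose outcome should be described.
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "text",
                    "text": f"""
  Below is current screenshot. Please describe what you would see after a {action_description}""",
                },
            ],
        },
    ]


# Client for the vLLM server started above. vLLM listens on port 8000 by default,
# and the API key must match the one passed to `vllm serve` (if any).
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")


async def predict_next_state(action_description: str, base64_image: str,
                             model: str = "osunlp/Dreamer-7B") -> str:
    # Hypothetical helper name: returns the model's description of the page
    # it expects after the action is taken on the given screenshot.
    messages = format_openai_template(action_description, base64_image)
    completion = await client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=1.0,
    )
    return completion.choices[0].message.content


# Usage, inside an async context:
# text = await predict_next_state("click on the 'Submit' button", base64_image)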
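The template expects `base64_image` to be the base64 text of a screenshot; because the data URL is labeled `image/jpeg`, the sketch below assumes the screenshot is already saved as a JPEG file. `encode_screenshot` is a hypothetical helper built only on the standard library.

```python
import base64
from pathlib import Path
from typing import Union


def encode_screenshot(path: Union[str, Path]) -> str:
    """Read an image file and return its bytes as a base64 string for a data URL."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

For example, `base64_image = encode_screenshot("screenshot.jpg")` (hypothetical path) produces a string that can be passed directly to `format_openai_template`.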

Citation Information

If you find this work useful, please consider citing our paper:

@article{Gu2024WebDreamer,
  author    = {Yu Gu and Kai Zhang and Yuting Ning and Boyuan Zheng and Boyu Gou and Tianci Xue and Cheng Chang and Sanjari Srivastava and Yanan Xie and Peng Qi and Huan Sun and Yu Su},
  title     = {Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents},
  journal   = {CoRR},
  volume    = {abs/2411.06559},
  year      = {2024},
  url       = {https://arxiv.org/abs/2411.06559},
  eprinttype= {arXiv},
  eprint    = {2411.06559},
}
Model size: 8.29B params (Safetensors, tensor type BF16)
Model tree for osunlp/Dreamer-7B-Reddit: fine-tuned from the base model Qwen/Qwen2-VL-7B.