TinyBrowserPlanner-Reason

A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple:

Can a 1B model decide the next browser action given a task and an observation?

Actions include:

search
open_page
extract
back
finish
refine_search
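
For downstream code it can help to pin this action vocabulary down as a closed set. The following is a minimal sketch, assuming the six actions listed above are the full vocabulary; the names `BrowserAction` and `parse_action` are illustrative and not part of this repository.

```python
from enum import Enum


class BrowserAction(str, Enum):
    """The six actions listed above (assumed to be the complete set)."""

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(raw: str) -> BrowserAction:
    """Normalize a model output string and map it to a known action.

    Raises ValueError when the text is not one of the six actions.
    """
    return BrowserAction(raw.strip().lower())
```

Rejecting unknown strings early makes it easier to tell a malformed generation apart from a wrong but valid action.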

Key Findings

1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement.

However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

2. Adding actions creates both capability and confusion

Introducing the back action allowed the model to recover from wrong pages and paywalls.

However, the model quickly learned to overuse back as a universal solution.
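
One way to spot this kind of overuse is to compare how often an action is predicted with how often it appears in the reference labels. This is a hedged sketch, not the evaluation used for this project; `action_shares` and `overused` are hypothetical helpers that take plain lists of action strings.

```python
from collections import Counter


def action_shares(predicted_actions):
    """Return each action's share of all predictions, as a fraction in [0, 1]."""
    counts = Counter(predicted_actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: n / total for action, n in counts.items()}


def overused(predicted_actions, gold_actions, action="back"):
    """True when `action` is predicted more often than it occurs in the gold labels."""
    return Counter(predicted_actions)[action] > Counter(gold_actions)[action]
```

A share of back that is far above its share in the gold labels suggests the model is treating it as a default rather than as a recovery step.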

3. Reason-First training dramatically improves planning

Action-only planning:

4/12

Reason-First planning:

10/12

The Reason-First result required only 40 reasoning examples and less than 10 seconds of additional training.
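
Reading the two scores as correct decisions out of 12 evaluation cases (an assumption; only the fractions are given), the rates work out as in the short calculation below. `success_rate` is an illustrative helper.

```python
def success_rate(correct: int, total: int) -> float:
    """Fraction of cases in which the chosen action was correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


action_only = success_rate(4, 12)
reason_first = success_rate(10, 12)

print(f"action-only:   {action_only:.1%}")                  # 33.3%
print(f"reason-first:  {reason_first:.1%}")                 # 83.3%
print(f"absolute gain: {reason_first - action_only:.1%}")   # 50.0%
```

With 12 cases, each single case shifts the rate by about 8.3 percentage points.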

The most important result:

The model already understood the state of the environment.

It failed because it learned shortcut action heuristics.

Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
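
When the reason is generated before the action, the caller has to split the two apart again. The sketch below assumes the output contains a "Reason:" label followed by an "Action:" label, as in the Example section; the exact generation format is not documented here, and `split_reason_and_action` is a hypothetical helper.

```python
import re

# Matches "Reason: <text> Action: <word>", tolerating line breaks between parts.
_REASON_ACTION = re.compile(
    r"Reason:\s*(?P<reason>.+?)\s*Action:\s*(?P<action>\w+)",
    re.DOTALL,
)


def split_reason_and_action(generated: str):
    """Return (reason, action) parsed from a reason-first generation.

    Raises ValueError if the expected labels are missing.
    """
    match = _REASON_ACTION.search(generated)
    if match is None:
        raise ValueError("expected 'Reason: ... Action: ...' in model output")
    return match.group("reason"), match.group("action").lower()
```

Only the parsed action needs to reach the browser executor; the reason can be kept for logging or error analysis.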

Example

Task:

Find Apple stock price

Observation:

Price displayed prominently on page.

Reason:

The requested information is already available.

Action:

extract

Task:

Find CEO of OpenAI

Observation:

Page discusses Microsoft CEO.

Reason:

The page is irrelevant to the requested information.

Action:

back
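
The two examples above can be serialized into prompt and target strings for either training style. This is a minimal sketch under assumptions: the single-line "Label: value" template and the names `PlanningExample`, `build_prompt`, and `build_target` are illustrative, and the actual training template may differ.

```python
from dataclasses import dataclass


@dataclass
class PlanningExample:
    task: str
    observation: str
    reason: str
    action: str


def build_prompt(example: PlanningExample) -> str:
    """Text shown to the model: the task and the current observation."""
    return f"Task: {example.task}\nObservation: {example.observation}\n"


def build_target(example: PlanningExample, reason_first: bool = True) -> str:
    """Text the model should produce.

    Reason-first targets put the reason before the action;
    action-only targets omit the reason entirely.
    """
    if reason_first:
        return f"Reason: {example.reason}\nAction: {example.action}"
    return f"Action: {example.action}"


# The two examples from this section.
stock = PlanningExample(
    task="Find Apple stock price",
    observation="Price displayed prominently on page.",
    reason="The requested information is already available.",
    action="extract",
)
ceo = PlanningExample(
    task="Find CEO of OpenAI",
    observation="Page discusses Microsoft CEO.",
    reason="The page is irrelevant to the requested information.",
    action="back",
)
```

Calling `build_target(ceo, reason_first=False)` yields the action-only variant of the same example, which makes the two training styles easy to compare on identical data.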

Training

Base model:

openbmb/MiniCPM5-1B

Method:

LoRA fine-tuning

Framework:

Unsloth + PEFT

Limitations

The model performs well on simple browser planning and replanning scenarios.

However, it still struggles with:

multi-step recovery chains
long-horizon planning
complex search strategy generation
comparison tasks requiring multiple sources

Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter.

The base model must be downloaded separately.
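
Loading the adapter on top of the base model could look like the sketch below. It assumes the adapter repository id is Georgefifth/tiny-browser-planner-reason, that `transformers` and `peft` are installed, and that the base model needs `trust_remote_code=True`; none of these are confirmed here, and `load_planner` is a hypothetical helper.

```python
BASE_MODEL_ID = "openbmb/MiniCPM5-1B"  # base model named in the Training section
ADAPTER_ID = "Georgefifth/tiny-browser-planner-reason"  # assumed adapter repo id


def load_planner(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Download the base model, then attach the LoRA adapter to it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is an assumption about how the base model is packaged.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

    # Wrap the base model with the adapter weights and switch to inference mode.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model.eval()
```

Calling `load_planner()` triggers both downloads on first use; after that, the files are served from the local Hugging Face cache.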
