How to use Georgefifth/tiny-browser-planner-reason with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Local path to the separately downloaded base model
    base_model = AutoModelForCausalLM.from_pretrained("./model/MiniCPM5-1B")
    # Attach the LoRA adapter from this repository
    model = PeftModel.from_pretrained(base_model, "Georgefifth/tiny-browser-planner-reason")
TinyBrowserPlanner-Reason
A 1B model can explain the correct browser action before it can reliably choose it.
This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.
The original goal was simple:
Can a 1B model decide the next browser action given a task and an observation?
Actions include:
- search
- open_page
- extract
- back
- finish
- refine_search
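The action set above is small and closed, so raw model output can be validated against it before anything is executed in a browser. The following is a minimal sketch of that check; the names `BrowserAction` and `parse_action` are illustrative and do not come from this repository.

```python
from enum import Enum
from typing import Optional


class BrowserAction(str, Enum):
    """The six actions listed above (class name is hypothetical)."""
    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> Optional[BrowserAction]:
    """Return the matching action, or None if the text is not a known action."""
    cleaned = text.strip().lower()
    try:
        return BrowserAction(cleaned)
    except ValueError:
        return None
```

Returning `None` for unknown strings lets the caller decide how to handle an invalid prediction, for example by counting it as an error or by re-prompting.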
Key Findings
1. Data quality beats data quantity
Adding large amounts of similar trajectory data produced almost no improvement.
However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.
2. Adding actions creates both capability and confusion
Introducing the back action allowed the model to recover from wrong pages and paywalls.
However, the model quickly learned to overuse back as a universal solution.
3. Reason-First training dramatically improves planning
Action-only planning:
4/12
Reason-First planning:
10/12
This improvement required only 40 reasoning examples and less than 10 seconds of additional training.
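For scale, the two scores can be expressed as percentages. The sketch below assumes the scores are counts of exactly matching actions on a 12-case evaluation set, which the card does not state explicitly; the function names are illustrative.

```python
from typing import List, Tuple


def exact_match_score(predicted: List[str], expected: List[str]) -> Tuple[int, int]:
    """Count predictions that exactly match the expected action."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct, len(expected)


def as_percent(correct: int, total: int) -> float:
    """Convert a correct/total count into a percentage with one decimal."""
    return round(100 * correct / total, 1)


print(as_percent(4, 12))   # 33.3 (action-only)
print(as_percent(10, 12))  # 83.3 (reason-first)
```

With only 12 cases, each case is worth about 8 percentage points, so the comparison is best read as a strong signal rather than a precise estimate.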
The most important result:
The model already understood the state of the environment.
It failed because it learned shortcut action heuristics.
Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
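The difference between the two training setups can be sketched as a difference in the target text the model learns to produce. The layout below mirrors the Task / Observation / Reason / Action labels used in this card, but the exact prompt template used for training is not documented here, so the function names and string format are assumptions.

```python
def build_prompt(task: str, observation: str) -> str:
    """Format the model input (assumed layout, not the confirmed template)."""
    return f"Task:\n{task}\nObservation:\n{observation}\n"


def action_only_target(action: str) -> str:
    """Target for action-only training: the model emits the action directly."""
    return f"Action:\n{action}"


def reason_first_target(reason: str, action: str) -> str:
    """Target for reason-first training: the reason is generated before the action."""
    return f"Reason:\n{reason}\nAction:\n{action}"


prompt = build_prompt("Find CEO of OpenAI", "Page discusses Microsoft CEO.")
target = reason_first_target(
    "The page is irrelevant to the requested information.", "back"
)
print(prompt + target)
```

Because generation is left to right, placing the reason first means the action tokens are conditioned on the stated reason rather than on the observation alone.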
Example
Task:
Find Apple stock price
Observation:
Price displayed prominently on page.
Reason:
The requested information is already available.
Action:
extract
Task:
Find CEO of OpenAI
Observation:
Page discusses Microsoft CEO.
Reason:
The page is irrelevant to the requested information.
Action:
back
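At inference time, output in this shape has to be split back into a reason and an action. The following is a minimal parsing sketch, assuming the model emits a `Reason:` label followed by an `Action:` label as in the examples above; the helper name `parse_reason_first` is hypothetical.

```python
import re
from typing import Optional, Tuple

VALID_ACTIONS = {"search", "open_page", "extract", "back", "finish", "refine_search"}

# Capture the text after "Reason:" and the single word after "Action:".
_PATTERN = re.compile(
    r"Reason:\s*(?P<reason>.*?)\s*Action:\s*(?P<action>\w+)", re.DOTALL
)


def parse_reason_first(output: str) -> Optional[Tuple[str, str]]:
    """Return (reason, action), or None if the output is malformed."""
    match = _PATTERN.search(output)
    if match is None:
        return None
    action = match.group("action").lower()
    if action not in VALID_ACTIONS:
        return None
    return match.group("reason"), action


sample = (
    "Reason:\nThe page is irrelevant to the requested information.\n"
    "Action:\nback"
)
print(parse_reason_first(sample))
```

Rejecting unknown actions at this stage keeps malformed generations from reaching the browser controller.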
Training
Base model:
openbmb/MiniCPM5-1B
Method:
LoRA fine-tuning
Framework:
Unsloth + PEFT
Limitations
The model performs well on simple browser planning and replanning scenarios.
However, it still struggles with:
- multi-step recovery chains
- long-horizon planning
- complex search strategy generation
- comparison tasks requiring multiple sources
Conclusion
This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.
A 1B model can often explain the correct action before it can reliably choose it.
This repository contains only the LoRA adapter.
The base model must be downloaded separately.