OFIR SHIN

ofirst
AI & ML interests

None yet

Recent Activity

Organizations

None yet

ofirst's activity

New activity in microsoft/OmniParser 11 days ago
Reacted to merve's post with 🔥 11 days ago
Microsoft released a groundbreaking model that can be used for web automation, under the MIT license 🔥 microsoft/OmniParser

An interesting highlight for me was the model's performance on Mind2Web (a benchmark for web navigation), which unlocks agentic behavior for RPA agents.

No need for hefty web automation pipelines that break when the website/app design changes! Amazing work.

Lastly, the authors also fine-tune this model on open-set detection for interactable regions to see whether it can be used as a plug-in for VLMs, and it actually outperforms off-the-shelf open-set detectors like GroundingDINO.


OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT-4V in parsing.