JoshOohAhh
0 followers · 9 following
AI & ML interests
None yet
Recent Activity
updated
a collection
3 days ago
Ai
liked
a Space
3 days ago
mukaist/Midjourney
Reacted to singhsidhukuldeep's post
27 days ago
Good folks from @Microsoft have released an exciting breakthrough in GUI automation! OmniParser: a game-changing approach for pure vision-based GUI agents that works across multiple platforms and applications.

Key technical innovations:
- Custom-trained interactable icon detection model using 67k screenshots from popular websites
- Specialized BLIP-v2 model fine-tuned on 7k icon-description pairs for extracting functional semantics
- Novel combination of icon detection, OCR, and semantic understanding to create structured UI representations

The results are impressive:
- Outperforms GPT-4V baseline by significant margins on the ScreenSpot benchmark
- Achieves 73% accuracy on Mind2Web without requiring HTML data
- Demonstrates a 57.7% success rate on AITW mobile tasks

What makes OmniParser special is its ability to work across platforms (mobile, desktop, web) using only screenshot data, with no HTML or view hierarchy needed. This opens up exciting possibilities for building truly universal GUI automation tools.

The team has open-sourced both the interactable region detection dataset and icon description dataset to accelerate research in this space.

Kudos to the Microsoft Research team for pushing the boundaries of what's possible with pure vision-based GUI understanding!

What are your thoughts on vision-based GUI automation?
Organizations
None yet
JoshOohAhh's activity
liked
a Space
3 days ago
Running on Zero
1.72k
MIDJOURNEY
MidJour | A RealVisXL_Turbo | IRL HI-Res Images Gen
liked
a model
27 days ago
monster-labs/control_v1p_sd15_qrcode_monster
Updated Jul 21, 2023 • 107k • 1.36k
liked
4 Spaces
27 days ago
Running on A10G
4.54k
MusicGen
Running on CPU Upgrade
5.43k
Kolors Virtual Try-On
Configuration error
5.52k
DALL·E mini
Running on A100
1.04k
LoRA the Explorer
Explore fun LoRAs and generate wi
liked
a Space
30 days ago
Running
11
Deepfake