Cuiunbo
posted an update Jun 11
Introducing GUICourse! 🎉
By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUIAgent.
📄 Paper: GUICourse: From General Vision Language Models to Versatile GUI Agents (2406.11317)
🌐 GitHub Repo: https://github.com/yiye3/GUICourse
📖 Dataset: yiye2023/GUIAct / yiye2023/GUIChat / yiye2023/GUIEnv
🎯 Model: RhapsodyAI/minicpm-guidance / RhapsodyAI/qwen_vl_guidance

Any idea when the models might be released?
Also, the GUIEnv-local dataset download and viewer both have errors.
Any plans to release the GUIEnv-global dataset?
Thanks


We'll be releasing our model this week!
GUIEnv-Global is too large to release, though, so we will only open-source our rendering code.