Post
2024
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻
These models are currently integrated through the Gradio Spaces API, so inference runs remotely. Local inference is planned soon!
Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
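Since the model is picked by passing its hub id as a string, a typo only shows up once the call is made. A minimal sketch of a pre-check, assuming you want to validate names up front: `SUPPORTED_MODELS` and `resolve_model` are hypothetical helpers written for this post, not part of askui.

```python
# Hypothetical helper, not part of askui: validate a model id against the
# list of currently supported models before handing it to the agent.
SUPPORTED_MODELS = (
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
)


def resolve_model(name: str) -> str:
    """Return the canonical model id, matching case-insensitively.

    Raises ValueError if the name is not in SUPPORTED_MODELS.
    """
    wanted = name.strip().lower()
    for model in SUPPORTED_MODELS:
        if model.lower() == wanted:
            return model
    raise ValueError(
        f"Unsupported model {name!r}; choose one of: {', '.join(SUPPORTED_MODELS)}"
    )
```

The returned string can then be passed as `model_name`, for example `agent.click("text 'Images'", model_name=resolve_model("askui/pta-1"))`.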
import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in the browser and give the page a moment to load
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each click can use a different model, selected via model_name
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")
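If you want to swap models per step without editing the script each time, the same flow can be written as data. This is a rough sketch, not something askui ships: `Step`, `run_steps`, and `RecordingAgent` are made-up names, and the runner only assumes the `click`, `type`, and `keyboard` calls used in the example above. `RecordingAgent` is a stand-in for dry runs, so the block runs without askui installed.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    action: str                       # agent method name: "click", "type" or "keyboard"
    argument: str                     # element description, text to type, or key name
    model_name: Optional[str] = None  # hub model id; only forwarded when set
    pause: float = 0.0                # seconds to wait after the step


def run_steps(agent, steps: List[Step], sleep: Callable[[float], None] = time.sleep) -> None:
    """Call each step's method on the agent, forwarding model_name when given."""
    for step in steps:
        method = getattr(agent, step.action)
        if step.model_name is not None:
            method(step.argument, model_name=step.model_name)
        else:
            method(step.argument)
        if step.pause > 0:
            sleep(step.pause)


class RecordingAgent:
    """Hypothetical stand-in that records calls instead of driving the machine."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Optional[str]]] = []

    def click(self, description: str, model_name: Optional[str] = None) -> None:
        self.calls.append(("click", description, model_name))

    def type(self, text: str) -> None:
        self.calls.append(("type", text, None))

    def keyboard(self, key: str) -> None:
        self.calls.append(("keyboard", key, None))


# The search flow from the post, expressed as data.
STEPS = [
    Step("click", "search field in the center of the screen", "Qwen/Qwen2-VL-7B-Instruct"),
    Step("type", "cats"),
    Step("keyboard", "enter", pause=0.5),
    Step("click", "text 'Images'", "AskUI/PTA-1", pause=0.5),
    Step("click", "second cat image", "OS-Copilot/OS-Atlas-Base-7B"),
]
```

For a dry run, call `run_steps(RecordingAgent(), STEPS)` and inspect `.calls`. Inside a `with VisionAgent() as agent:` block, the same `run_steps(agent, STEPS)` call should drive the real agent, assuming its methods accept the arguments shown in the example.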