See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao1*, Wenhao Chai2*❤, Xuan Wang1*, Li Boyi1, Shengyu Hao1, Shidong Cao1, Tian Ye3, Jenq-Neng Hwang2, Gaoang Wang1✉
1 Zhejiang University 2 University of Washington 3 Hong Kong University of Science and Technology (GZ)
*Equal contribution ❤Project lead ✉Corresponding author
STEVE, named after the protagonist of the game Minecraft, is our proposed framework for building an embodied agent, based on a vision model and LLMs, that operates in an open world.