iFlyBot committed
Commit 2ea0cce · 1 Parent(s): 62bb017

update README.md

Files changed (1): README.md +4 -4
README.md CHANGED
@@ -11,13 +11,13 @@ We introduce IflyBotVLM, a general-purpose Vision-Language-Model (VLM) specifica
  The architecture of IflyBotVLM is designed to realize four critical functional capabilities in the embodied domain:

- Spatial Understanding and Metric: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.
+ **Spatial Understanding and Metric**: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.

- Interactive Target Grounding: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.
+ **Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.

- Action Abstraction and Control Parameter Generation: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.
+ **Action Abstraction and Control Parameter Generation**: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.

- Task Planning: Leveraging the current scene comprehension, this module performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.
+ **Task Planning**: Leveraging the current scene comprehension, this module performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.

  We anticipate that IflyBotVLM will serve as an efficient and scalable foundation model, driving the advancement of embodied AI from single-task capabilities toward generalist intelligent agents.
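
For orientation, a minimal inference sketch for the README above. It assumes the checkpoint is published in the standard Hugging Face vision-language layout under a repo id like `iFlyBot/IflyBotVLM` and loads via `AutoModelForVision2Seq`; the repo id, the `trust_remote_code` path, and the prompt wording are all assumptions not confirmed by this commit, so treat this as a hypothetical sketch rather than the repository's documented API.

```python
# Hypothetical usage sketch -- the repo id, trust_remote_code path, and prompt
# format are assumptions; check the repository files for the actual interface.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "iFlyBot/IflyBotVLM"  # assumed repo id, not confirmed by this commit

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("scene.jpg")  # an RGB image of the robot workspace
# An interactive-grounding-style query; the real prompt schema may differ.
prompt = "Locate the red mug and describe its position relative to the laptop."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```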