The architecture of IflyBotVLM is designed to realize four critical functional capabilities in the embodied domain:
- **Spatial Understanding and Metric**: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.
- **Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.
- **Action Abstraction and Control Parameter Generation**: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.
- **Task Planning**: Leverages current scene comprehension to perform multi-step prediction, decomposing complex tasks into a sequence of atomic skills and supporting the robust execution of long-horizon tasks.
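To make the task-planning capability concrete, here is a minimal sketch of what a decomposed long-horizon task could look like as a data structure. The `AtomicSkill`/`TaskPlan` names, the skill vocabulary, and the example parameters are illustrative assumptions, not part of the IflyBotVLM release:

```python
# Hypothetical sketch: a long-horizon task decomposed into atomic skills.
# All names and values here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AtomicSkill:
    name: str                # e.g. "grasp", "place" (assumed skill vocabulary)
    target: str              # language-referred object
    params: dict = field(default_factory=dict)  # e.g. a grasp pose

@dataclass
class TaskPlan:
    instruction: str                 # the original natural-language task
    steps: list[AtomicSkill] = field(default_factory=list)

# Example decomposition of one complex instruction:
plan = TaskPlan(
    instruction="put the apple on the plate",
    steps=[
        AtomicSkill("locate", "apple"),
        AtomicSkill("grasp", "apple", {"pose": [0.42, -0.10, 0.05]}),
        AtomicSkill("move_to", "plate"),
        AtomicSkill("place", "apple", {"target": "plate"}),
    ],
)
print([s.name for s in plan.steps])  # -> ['locate', 'grasp', 'move_to', 'place']
```

Executing the plan then reduces to dispatching each atomic skill, with its control parameters, to the robot's low-level controllers in order.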
We anticipate that IflyBotVLM will serve as an efficient and scalable foundation model, driving the advancement of embodied AI from single-task capabilities toward generalist intelligent agents.