blankyang233 committed
Commit b6fda71 · verified · 1 Parent(s): f218f36

Update README.md

Files changed (1)
  1. README.md +7 -11
README.md CHANGED
@@ -1,4 +1,4 @@
-# BLM<sub>0</sub>: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
+# BLM<sub>1</sub>: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
 
 
 
@@ -9,18 +9,14 @@
 
 
 ## 🔥 Overview
-Multimodal large language models (MLLMs) have demonstrated strong vision-language reasoning and increasingly underpin embodied agents. However, unified models that simultaneously support tasks in digital and physical spaces and generalize across embodiments remain scarce. To address this gap, we propose <b>Boundless Large Model (BLM<sub>0</sub>)</b>, a multimodal spatial foundation model that preserves native instruction-following and reasoning while injecting embodied knowledge and enabling robust cross-embodiment control. BLM<sub>0</sub> unifies three core capabilities: cross-space transfer, cross-task learning, and cross-embodiment generalization, which are realized through a two-stage training recipe. Stage I uses curated digital corpora to impart embodied knowledge to the MLLM while preserving language abilities. Stage II trains a policy module via an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, avoiding MLLM fine-tuning. It uses a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six increasingly challenging tasks. We evaluate BLM<sub>0</sub> as a single model on both digital and physical benchmarks and compare it against four families: Multimodal Large Language Models, Embodied Large Language Models, Vision-Language-Action models, and General Multimodal Large Models. BLM<sub>0</sub> improves digital-space tasks by approximately <b>6%</b> and physical-space tasks by approximately <b>3%</b>.
+Multimodal large language models (MLLMs) have advanced vision–language reasoning and are increasingly deployed in embodied agents. However, significant limitations remain: MLLMs generalize poorly across digital–physical spaces and embodiments; vision–language–action models (VLAs) produce low-level actions yet lack robust high-level embodied reasoning; and most embodied large language models (ELLMs) are constrained to digital space, with poor generalization to the physical world. Thus, unified models that operate seamlessly across digital and physical spaces while generalizing across embodiments and tasks remain absent. We introduce the <b>Boundless Large Model (BLM<sub>1</sub>)</b>, a multimodal spatial foundation model that preserves instruction following and reasoning, incorporates embodied knowledge, and supports robust cross-embodiment control. BLM<sub>1</sub> integrates three key capabilities—<i>cross-space transfer, cross-task learning, and cross-embodiment generalization</i>—via a two-stage training paradigm. Stage I injects embodied knowledge into the MLLM through curated digital corpora while maintaining language competence. Stage II trains a policy module through an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, without fine-tuning the MLLM backbone. This process is supported by a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six progressively challenging tasks. Evaluations across digital and physical benchmarks show that a single BLM<sub>1</sub> instance outperforms four model families—MLLMs, ELLMs, VLAs, and GMLMs—achieving <b>&sim;6%</b> gains in digital tasks and <b>&sim;3%</b> in physical tasks.
 
 
 ## 🚀 Features
 - Achieve cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
 - Seamlessly migrate to cross-embodiment robot control while retaining native instruction-following capability.
 - A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
-- BLM-0 surpasses same-scale SOTA methods in comprehensive performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.
-
-
-## 🗞️ News
-- **`2025-09-25`**: 🤗 [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released in Huggingface.
+- BLM-1 surpasses same-scale SOTA methods in comprehensive performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.
 
 
 ## 🛠️ Setup
@@ -46,7 +42,7 @@ vllm serve ./model \
   --trust-remote-code \
   --dtype bfloat16 \
   --max-model-len 128000 \
-  --served-model-name BLM-0
+  --served-model-name BLM
 ```
 
 Run python script as example:
@@ -71,7 +67,7 @@ with open(image, "rb") as f:
 base64_img = f"data:image;base64,{encoded_image}"
 
 response = client.chat.completions.create(
-    model="BLM-0",
+    model="BLM",
     messages=[
         {
             "role": "system",
@@ -112,8 +108,8 @@ print(response.choices[0].message.content)
 If you find this project useful, please consider citing our paper.
 ```bib
 @article{
-BLM-0,
-title={BLM$_0$: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning},
+BLM-1,
+title={BLM$_1$: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning},
 author={WenTao Tan, Bowen Wang, Heng Zhi, Chenyu Liu, Zhe Li, Jian Liu, Zenrong Lin, Yukun Dai, Yipeng Chen, Wenjie Yang, Enci Xie, Hao Xue, Baixu Ji, Chen Xu, Zhibin Wang, Tianshi Wang, Lei Zhu, Hengtao Shen},
 journal={},
 year={2025}
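The client code above appears only in diff fragments. A minimal self-contained sketch of the same flow follows, assuming a local vLLM server started with the Setup command (`--served-model-name BLM`) at vLLM's default endpoint `http://localhost:8000/v1`; the helper names, the system-prompt text, and `api_key="EMPTY"` are illustrative assumptions, not values stated in the README:

```python
import base64


def build_image_message(image_path: str, prompt: str) -> list:
    """Build OpenAI-style chat messages with an inlined image.

    The image is embedded as a base64 data URL, matching the
    `data:image;base64,...` form shown in the diff above.
    """
    with open(image_path, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("utf-8")
    base64_img = f"data:image;base64,{encoded_image}"
    return [
        # Assumption: the README's system prompt text is not shown in the
        # diff, so a generic placeholder is used here.
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": base64_img}},
                {"type": "text", "text": prompt},
            ],
        },
    ]


def query_blm(image_path: str, prompt: str,
              base_url: str = "http://localhost:8000/v1") -> str:
    """Send one request to the served model.

    Assumes a vLLM OpenAI-compatible server launched as in the Setup
    section with `--served-model-name BLM`.
    """
    from openai import OpenAI  # requires the `openai` package

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model="BLM",
        messages=build_image_message(image_path, prompt),
    )
    return response.choices[0].message.content
```

`build_image_message` can be tested offline; only `query_blm` needs the running server.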