---
license: mit
---

# BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

<div align="center" style="line-height: 1;">
<a href="https://github.com/HorizonRobotics/BIP3D" target="_blank" style="margin: 2px;">
<img alt="Code" src="https://img.shields.io/badge/Code-Github-blue" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://linxuewu.github.io/BIP3D-page/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/Homepage-BIP3D-green" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/xuewulin/BIP3D" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/Models-Hugging%20Face-yellow" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://arxiv.org/abs/2411.14869" target="_blank" style="margin: 2px;">
<img alt="Paper" src="https://img.shields.io/badge/Paper-Arxiv-red" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

<div align="center">
<img src="https://github.com/HorizonRobotics/BIP3D/raw/main/resources/bip3d_structure.png" width="90%" alt="BIP3D" />
<p style="font-size:0.8em; color:#555;">Architecture diagram of BIP3D. Red stars mark the components modified or added relative to the base model, GroundingDINO; dashed lines indicate optional elements.</p>
</div>

## Results on the EmbodiedScan Benchmark
Relative to the original paper, we made several changes that yield better 3D perception results. The two main changes are:

1. **New Fusion Operation**: We enhanced the decoder by replacing deformable aggregation (DAG) with 3D deformable attention (DAT). Specifically, feature sampling now uses trilinear rather than bilinear interpolation, which leverages the depth distribution for more accurate feature extraction.
2. **Mixed Data Training**: To improve the grounding model's performance, we adopted a mixed-data training strategy that combines detection data with grounding data during grounding fine-tuning.
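To make the difference between bilinear and depth-aware trilinear sampling concrete, here is a minimal pure-Python sketch. It assumes, for illustration only, that the sampled 3D volume is the per-pixel product of a 2D feature map and a discrete depth distribution (similar to the lifting step in Lift-Splat-Shoot). The learned sampling offsets and attention weights of deformable attention, as well as multi-view and multi-scale features, are omitted, and `bilinear` and `depth_aware_trilinear` are hypothetical helper names, not BIP3D's code.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # indexed as grid[y][x]


def bilinear(grid: Grid, u: float, v: float) -> float:
    """Bilinearly sample grid at continuous pixel coords (x=u, y=v), clamped to the border."""
    h, w = len(grid), len(grid[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0  # fractional offsets inside the cell
    top = (1 - ax) * grid[y0][x0] + ax * grid[y0][x1]
    bottom = (1 - ax) * grid[y1][x0] + ax * grid[y1][x1]
    return (1 - ay) * top + ay * bottom


def depth_aware_trilinear(feat: Grid, depth_prob: Sequence[Grid],
                          u: float, v: float, d: float) -> float:
    """Hypothetical: trilinearly sample V[k][y][x] = feat[y][x] * depth_prob[k][y][x]
    at (x=u, y=v, depth bin=d) without building V for every bin.

    Trilinear interpolation is separable, so it equals a linear blend along the
    depth axis of bilinear samples taken on the two neighbouring depth slices.
    """
    num_bins = len(depth_prob)
    d = min(max(d, 0.0), num_bins - 1.0)
    k0 = int(math.floor(d))
    k1 = min(k0 + 1, num_bins - 1)
    ad = d - k0  # fractional position between depth bins k0 and k1

    def slice_sample(k: int) -> float:
        # Feature map weighted by the probability of depth bin k at each pixel.
        weighted = [[f * p for f, p in zip(f_row, p_row)]
                    for f_row, p_row in zip(feat, depth_prob[k])]
        return bilinear(weighted, u, v)

    return (1 - ad) * slice_sample(k0) + ad * slice_sample(k1)


# Toy example: one feature channel on a 2x2 image and two depth bins whose
# per-pixel probabilities sum to 1 (top row mostly bin 0, bottom row mostly bin 1).
feat = [[1.0, 2.0], [3.0, 4.0]]
depth_prob = [
    [[0.9, 0.9], [0.1, 0.1]],  # bin 0
    [[0.1, 0.1], [0.9, 0.9]],  # bin 1
]
print(bilinear(feat, 0.5, 0.5))                                          # 2.5, depth ignored
print(round(depth_aware_trilinear(feat, depth_prob, 0.5, 0.5, 0.0), 2))  # 0.85
print(round(depth_aware_trilinear(feat, depth_prob, 0.5, 0.5, 1.0), 2))  # 1.65
```

Because each pixel's depth probabilities sum to 1, the per-bin samples add back up to the plain bilinear value (0.85 + 1.65 = 2.5). Bilinear sampling returns the same feature for every 3D point along a camera ray, whereas the depth-aware variant distributes it across depths according to the depth distribution.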
### 1. Results on Multi-view 3D Detection Validation Dataset

|Model | Inputs | Op | Overall | Head | Common | Tail | Small | Medium | Large | ScanNet | 3RScan | MP3D | ckpt | log |
| :----: | :---: | :---: | :---: |:---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: | :---: |
|BIP3D | RGB | DAG | 16.57|23.29|13.84|12.29|2.67|17.85|12.89|19.71|26.76|8.50 | - | - |
|BIP3D | RGB | DAT | 16.67|22.41|14.19|13.18|3.32|17.25|14.89|20.80|24.18|9.91 | - | - |
|BIP3D |RGB-D | DAG | 22.53|28.89|20.51|17.83|6.95|24.21|15.46|24.77|35.29|10.34 | - | - |
|BIP3D |RGB-D | DAT | 23.24|31.51|20.20|17.62|7.31|24.09|15.82|26.35|36.29|11.44 | - | - |

### 2. Results on Multi-view 3D Grounding Mini Dataset
|Model | Inputs | Op | Overall | Easy | Hard | View-dep | View-indep | ScanNet | 3RScan | MP3D | ckpt | log |
| :----: | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: |:---: | :----: |
|BIP3D | RGB | DAG | 44.00|44.39|39.56|46.05|42.92|48.62|42.47|36.40 | - | - |
|BIP3D | RGB | DAT | 44.43|44.74|41.02|45.17|44.04|49.70|41.81|37.28 | - | - |
|BIP3D | RGB-D | DAG | 45.79|46.22|40.91|45.93|45.71|48.94|46.61|37.36 | - | - |
|BIP3D | RGB-D | DAT | 58.47|59.02|52.23|60.20|57.56|66.63|54.79|46.72 | - | - |

### 3. Results on Multi-view 3D Grounding Validation Dataset
|Model | Inputs | Op | Mixed Data | Overall | Easy | Hard | View-dep | View-indep | ScanNet | 3RScan | MP3D | ckpt | log |
| :----: | :---: | :---: | :---: |:---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: |:---: | :----: |
|BIP3D | RGB | DAG |No| 45.81|46.21|41.34|47.07|45.09|50.40|47.53|32.97 | - | - |
|BIP3D | RGB | DAT |No| 47.29|47.82|41.42|48.58|46.56|52.74|47.85|34.60 | - | - |
|BIP3D | RGB-D | DAG |No| 53.75|53.87|52.43|55.21|52.93|60.05|54.92|38.20 | - | - |
|BIP3D | RGB-D | DAT |No|61.36|61.88|55.58|62.43|60.76|66.96|62.75|46.92 | - | - |
|BIP3D | RGB-D | DAT |Yes|66.58|66.99|62.07|67.95|65.81|72.43|68.26|51.14 | - | - |
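The Mixed Data column marks runs that add detection data during grounding fine-tuning; in the RGB-D DAT rows it raises Overall from 61.36 to 66.58. This README does not say how the two sources are combined, so the following is only one plausible scheme, sketched under stated assumptions: each batch slot is drawn from the detection set with probability `det_ratio`, and detection samples are turned into grounding-style samples by joining their category names into a text prompt, in the style of GLIP/GroundingDINO, which feed detection categories to the model as text. `mixed_batches`, `detection_to_grounding`, `det_ratio`, and the sample dictionary fields are all hypothetical.

```python
import random
from typing import Dict, Iterator, List, Sequence


def detection_to_grounding(sample: Dict) -> Dict:
    """Hypothetical conversion: build a grounding-style sample whose text prompt
    lists the detection sample's category names (GLIP/GroundingDINO style)."""
    categories = sorted(set(sample["labels"]))
    return {
        "scene": sample["scene"],
        "text": " . ".join(categories) + " .",
        "boxes": sample["boxes"],
        "source": "detection",
    }


def mixed_batches(grounding: Sequence[Dict], detection: Sequence[Dict],
                  batch_size: int, det_ratio: float, num_batches: int,
                  seed: int = 0) -> Iterator[List[Dict]]:
    """Hypothetical sampler: each batch slot comes from the detection set with
    probability det_ratio and from the grounding set otherwise."""
    rng = random.Random(seed)  # seeded so the mixing is reproducible
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            if rng.random() < det_ratio:
                batch.append(detection_to_grounding(rng.choice(detection)))
            else:
                # Copy the sample so the source dataset is not mutated.
                batch.append({**rng.choice(grounding), "source": "grounding"})
        yield batch


# Usage with dummy data ("boxes" entries are placeholders).
grounding = [{"scene": "scene_a", "text": "the chair next to the desk", "boxes": ["box_0"]}]
detection = [{"scene": "scene_b", "labels": ["table", "chair", "chair"],
              "boxes": ["box_0", "box_1", "box_2"]}]
for batch in mixed_batches(grounding, detection, batch_size=4, det_ratio=0.3, num_batches=2):
    print([(s["source"], s["text"]) for s in batch])
```

Tagging each sample with its source lets a training loop apply source-specific handling if needed; the actual ratio and conversion used for the reported results are not stated here.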
### 4. [Results on Multi-view 3D Grounding Test Dataset](https://huggingface.co/spaces/AGC2024/visual-grounding-2024)
|Model | Overall | Easy | Hard | View-dep | View-indep | ckpt | log |
| :----: | :---: | :---: | :---: | :---: | :---:| :---:|:---:|
|[EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan) | 39.67 | 40.52 | 30.24 | 39.05 | 39.94 | - | - |
|[SAG3D*](https://opendrivelab.github.io/Challenge%202024/multiview_Mi-Robot.pdf) | 46.92 | 47.72 | 38.03 | 46.31 | 47.18 | - | - |
|[DenseG*](https://opendrivelab.github.io/Challenge%202024/multiview_THU-LenovoAI.pdf) | 59.59 | 60.39 | 50.81 | 60.50 | 59.20 | - | - |
|BIP3D | 67.38 | 68.12 | 59.08 | 67.88 | 67.16 | - | - |
|BIP3D-Base | 70.53 | 71.22 | 62.91 | 70.69 | 70.47 | - | - |

## Citation
```bibtex
@article{lin2024bip3d,
  title={BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence},
  author={Lin, Xuewu and Lin, Tianwei and Huang, Lichao and Xie, Hongyu and Su, Zhizhong},
  journal={arXiv preprint arXiv:2411.14869},
  year={2024}
}
```