xurantju committed
Commit 3d62ff8
1 Parent(s): 79eb3e7

update model card

Files changed (1):
  1. README.md +9 -9
README.md CHANGED
@@ -8,11 +8,11 @@ pipeline_tag: image-text-to-text
 
 # Model description
 
-`BLIP3` is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. \
-These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. BLIP3 highlights a few features below,
+`XGen-MM` is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. \
+These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. XGen-MM highlights a few features below,
 
-* The **pretrained** foundation model, `blip3-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
-* The **instruct** fine-tuned model, `blip3-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
+* The **pretrained** foundation model, `xgen-mm-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
+* The **instruct** fine-tuned model, `xgen-mm-phi3-mini-instruct-r-v1`, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
 * `blip3-phi3-mini-instruct-r-v1` supports flexible high-resolution image encoding with efficient visual token sampling.
 
 More technical details will come with a technical report soon.
@@ -35,7 +35,7 @@ More technical details will come with a technical report soon.
 | MM1-3B | 0 | 73.5 | 55.6 | 63.3 | 26.1 | 29.4 | 15.6 | 46.2 |
 | | 4 | 112.3 | 99.7 | 84.1 | 48.6 | 45.3 | 38.0 | 57.9 |
 | | 8 | 114.6 | 104.7 | 88.8 | 48.4 | 44.6 | 46.4 | 63.6 |
-| **blip3-phi3-mini-base-r-v1 (Ours)** | 0 | **81.7** | **80.2** | 60.7 | **26.5** | **36.0** | **21.2** | **48.1** |
+| **xgen-mm-phi3-mini-base-r-v1 (Ours)** | 0 | **81.7** | **80.2** | 60.7 | **26.5** | **36.0** | **21.2** | **48.1** |
 | | 4 | 110.5 | **101.7** | **84.6** | **49.2** | **46.1** | **38.4** | **63.9** |
 | | 8 | 112.1 | 104.4 | 87.7 | **49.1** | **46.4** | 44.3 | **63.8** |
@@ -46,7 +46,7 @@ More technical details will come with a technical report soon.
 | openbmb/MiniCPM-V-2 | 67.1 | 69.6 | 1808 | - | - | - | 38.2 | - | 38.7 | - | - | - | |
 | VILA1.5-3B | 67.9 | 63.4 | - | 1442 | - | - | 33.3 | 35.4 | - | 69.0 | 85.9 | - | |
 | xtuner/llava-phi-3-mini-hf | 70.0 | 69.2 | 1790 | 1477 | 313 | 43.7 | **41.4** | - | - | 73.7 | 87.3 | 69.3 | |
-| **blip3-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | 74.1 | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
+| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | 74.1 | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
 
 
 # How to use
@@ -130,9 +130,9 @@ Our code and weights are released under the Creative Commons Attribution Non Com
 
 # Citation
 ```
-@misc{blip3_phi3_mini,
-      title={BLIP3-phi3-mini-instruct Model Card},
-      url={https://huggingface.co/Salesforce/blip3-phi3-mini-instruct-r-v1},
+@misc{xgen_mm_phi3_mini,
+      title={xgen-mm-phi3-mini-instruct Model Card},
+      url={https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-r-v1},
       author={Salesforce AI Research},
       month={May},
       year={2024}
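For users with code or configs still pinned to the old `Salesforce/blip3-*` repo ids, the rename above can be applied mechanically. The helper below is a hypothetical sketch (not part of this commit); it encodes only the mapping visible in this diff, i.e. `Salesforce/blip3-<suffix>` → `Salesforce/xgen-mm-<suffix>`.

```python
def migrate_repo_id(repo_id: str) -> str:
    """Rewrite an old BLIP3 Hub repo id to its XGen-MM name.

    Only repo ids under the renamed Salesforce/blip3- prefix are
    rewritten; anything else is returned unchanged.
    """
    old_prefix = "Salesforce/blip3-"
    new_prefix = "Salesforce/xgen-mm-"
    if repo_id.startswith(old_prefix):
        return new_prefix + repo_id[len(old_prefix):]
    return repo_id


print(migrate_repo_id("Salesforce/blip3-phi3-mini-instruct-r-v1"))
# -> Salesforce/xgen-mm-phi3-mini-instruct-r-v1
```

Ids that already use the new naming pass through untouched, so the helper is safe to run over an existing list of pinned checkpoints.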