xurantju committed
Commit: d2fc333
Parent: 40af279

update latest MM1 results, model path fix

Files changed (1): README.md (+4, -4)
README.md CHANGED
```diff
@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text
 # Model description
 We are excited to announce the continuation and rebranding of our **BLIP series** into **XGen-MM**, aligning with Salesforce's unified XGen initiative for large foundation models! This rebranding marks a significant step in our ongoing development of cutting-edge multimodal technologies.
 
-'XGen-mm' is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. \
+`XGen-MM` is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the `BLIP` series, incorporating fundamental enhancements that ensure a more robust and superior foundation. \
 These models have been trained at scale on high-quality image caption datasets and interleaved image-text data. XGen-MM highlights a few features below,
 
 * The **pretrained** foundation model, `xgen-mm-phi3-mini-base-r-v1`, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
@@ -43,11 +43,11 @@ More technical details will come with a technical report soon.
 ### Instruct (after instruction tuning)
 | Model | SEED-IMG | MMBench(dev) | MME-total | MME-P | MME-C | MMStar | MMMU (val) | MMVet | MathVista (mini) | ScienceQA (test) | POPE | AI2D | |
 |----------------------------|----------|--------------|-----------|----------|---------|----------|------------|----------|------------------|------------------|----------|----------|---|
-| MM1-3B-Chat | 68.8 | **75.9** | 1761 | **1482** | 279 | - | 33.9 | 43.7 | - | - | **87.4** | - | |
+| MM1-3B-Chat | 68.8 | 67.8 | 1761 | **1482** | 279 | - | 33.9 | 43.7 | - | - | **87.4** | - | |
 | openbmb/MiniCPM-V-2 | 67.1 | 69.6 | 1808 | - | - | - | 38.2 | - | 38.7 | - | - | - | |
 | VILA1.5-3B | 67.9 | 63.4 | - | 1442 | - | - | 33.3 | 35.4 | - | 69.0 | 85.9 | - | |
 | xtuner/llava-phi-3-mini-hf | 70.0 | 69.2 | 1790 | 1477 | 313 | 43.7 | **41.4** | - | - | 73.7 | 87.3 | 69.3 | |
-| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | 74.1 | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
+| **xgen-mm-phi3-mini-instruct-r-v1 (Ours)** | **72.1** | **74.1** | **1827** | 1467 | **360** | **44.6** | 39.8 | **45.1** | **39.3** | **74.2** | 87.2 | **75.8** | |
 
 
 # How to use
@@ -77,7 +77,7 @@ class EosListStoppingCriteria(StoppingCriteria):
         return self.eos_sequence in last_ids
 
 # load models
-model_name_or_path = "Salesforce/blip3-phi3-mini-instruct-r-v1"
+model_name_or_path = "Salesforce/xgen-mm-phi3-mini-instruct-r-v1"
 model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, use_fast=False, legacy=False)
 image_processor = AutoImageProcessor.from_pretrained(model_name_or_path, trust_remote_code=True)
```
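The diff context above shows only fragments of the README's usage example. For orientation, here is a minimal self-contained sketch of how those pieces fit together, completing the `EosListStoppingCriteria` fragment visible in the last hunk. The `[32007]` default (Phi-3's `<|end|>` token id), the image URL, the prompt format, and the `pixel_values` keyword passed to `generate` are all assumptions: the model's actual input handling is defined by its `trust_remote_code` implementation, so consult the full model card for the authoritative version.

```python
import requests
import torch
from PIL import Image
from transformers import (
    AutoImageProcessor,
    AutoModelForVision2Seq,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

class EosListStoppingCriteria(StoppingCriteria):
    """Stop generation once a given token sequence appears at the end of the output.

    Completes the fragment shown in the diff context; the default [32007]
    is an assumption (Phi-3's "<|end|>" token id).
    """

    def __init__(self, eos_sequence=[32007]):
        self.eos_sequence = eos_sequence

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Compare the trailing tokens of each sequence in the batch
        # against the stop sequence.
        last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
        return self.eos_sequence in last_ids

# Load model, tokenizer, and image processor as in the updated README.
model_name_or_path = "Salesforce/xgen-mm-phi3-mini-instruct-r-v1"
model = AutoModelForVision2Seq.from_pretrained(model_name_or_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, use_fast=False, legacy=False)
image_processor = AutoImageProcessor.from_pretrained(model_name_or_path, trust_remote_code=True)

# Hypothetical single-image query; the real model card wraps the question
# in a chat-style prompt template, so treat this prompt as a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw).convert("RGB")
pixel_values = image_processor([image], return_tensors="pt")["pixel_values"]
inputs = tokenizer("<image>\nWhat is in this picture?", return_tensors="pt")

generated = model.generate(
    **inputs,
    pixel_values=pixel_values,  # assumed keyword; defined by the remote code
    max_new_tokens=256,
    stopping_criteria=StoppingCriteriaList([EosListStoppingCriteria()]),
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

A custom stopping criterion like this is useful when the chat format's end-of-turn marker differs from the tokenizer's default EOS token, which is why the README defines one instead of relying on `eos_token_id` alone.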