# Adapting Multimodal Large Language Models to Domains via Post-Training

This repo provides an implementation preview of our paper, **On Domain-Specific Post-Training for Multimodal Large Language Models**.

We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation. Our resulting model, **AdaMLLM**, consistently outperforms general MLLMs across various tasks in two domains: biomedicine and food.

<p align='center'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/iklQIKW_6TyCT13BMq5-d.png" width="600">
</p>

******* **Updates** *********
- [2024/11/28] Released our paper.

## About
**AdaMLLM** represents our third effort to enhance **task generalization** of trained models by scaling synthetic supervised tasks from unsupervised contexts.

- **1st Work: [AdaptLLM](https://huggingface.co/papers/2309.09530) (ICLR 2024)**
  We employ rule-based methods to extract tasks from domain-specific corpora, reformatting them into reading comprehension tasks for continued pre-training. Our 7B model outperforms domain-specific models of much larger scales, such as BloombergGPT.

- **2nd Work: [Instruction Pretraining](https://huggingface.co/instruction-pretrain) (EMNLP 2024)**
  We develop a general-purpose instruction synthesizer that significantly increases task diversity. Instruction Pretraining outperforms Vanilla Pretraining in both general pretraining from scratch and domain-adaptive continual pretraining.

- **3rd Work: AdaMLLM (This Work)**
  We extend supervised task synthesis to multimodality, introducing a unified **visual instruction synthesizer** to extract task pairs from image-caption pairs. Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing domain-specific performance for MLLMs. A minimal sketch of this data flow is shown after this list.
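
As a rough, hypothetical illustration of that data flow (not the synthesizer shipped with this repo): an unsupervised image-caption pair goes in, and a synthetic instruction-response task pair grounded in the image comes out. The `ImageCaptionPair`, `SyntheticTask`, and `synthesize_task` names below are illustrative placeholders; in the paper this step is performed by a trained synthesizer model, not a hand-written rule.

```python
# Illustrative sketch only: shows the shape of the transformation from
# unsupervised image-caption pairs to synthetic visual instruction tasks.
from dataclasses import dataclass


@dataclass
class ImageCaptionPair:
    image_path: str  # domain image, e.g. a pathology slide or a food photo
    caption: str     # its unsupervised caption from the domain corpus


@dataclass
class SyntheticTask:
    image_path: str
    instruction: str  # synthesized question about the image
    response: str     # synthesized answer, consistent with the caption


def synthesize_task(pair: ImageCaptionPair) -> SyntheticTask:
    """Placeholder for the synthesizer: emits a trivial caption-grounded QA pair."""
    return SyntheticTask(
        image_path=pair.image_path,
        instruction="What does this image show?",
        response=pair.caption,
    )


if __name__ == "__main__":
    pair = ImageCaptionPair("images/sample.jpg", "A histology slide of benign tissue.")
    task = synthesize_task(pair)
    print(f"{task.instruction} -> {task.response}")
```
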
Looking ahead, we envision broadening the scope of supervised task synthesis, enhancing the general capabilities of trained models.

<p align='center'>
<img src="https://cdn-uploads.huggingface.co/production/uploads/650801ced5578ef7e20b33d4/-5qzvcSj_PCYKmTS_ZMOS.png" width="1000">
</p>