---
library_name: paddlenlp
license: apache-2.0
tags:
- summarization
language:
- zh
---

[![paddlenlp-banner](https://user-images.githubusercontent.com/1371212/175816733-8ec25eb0-9af3-4380-9218-27c154518258.png)](https://github.com/PaddlePaddle/PaddleNLP)

# PaddlePaddle/unimo-text-1.0

## Introduction

Existing pre-training methods focus on either single-modal or multi-modal tasks, and one type cannot effectively adapt to the other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a unified-modal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free text corpora and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space over a corpus of image-text pairs. Since non-paired single-modal data is very rich, our model can utilize a much larger scale of data to learn more generalizable representations. Moreover, textual knowledge and visual knowledge can enhance each other in the unified semantic space. The experimental results show that UNIMO significantly improves the performance of several single-modal and multi-modal downstream tasks.

More details: https://arxiv.org/abs/2012.15409

## Available Models

- **unimo-text-1.0**, *12 layers, 12 heads, 768 hidden size, pretrained model*
- **unimo-text-1.0-large**, *24 layers, 16 heads, 1024 hidden size, pretrained model*
- **unimo-text-1.0-lcsts-new**, *12 layers, 12 heads, 768 hidden size, finetuned on the lcsts-new Chinese summarization dataset*
- **unimo-text-1.0-summary**, *12 layers, 12 heads, 768 hidden size, finetuned on several in-house Chinese summarization datasets*

## How to Use?

Click on the *Use in paddlenlp* button on the top right!

## Citation Info

```text
@article{li2020unimo,
  title   = {UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning},
  author  = {Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
  journal = {arXiv preprint arXiv:2012.15409},
  year    = {2020}
}
```