# SEED-Story
[![arXiv](https://img.shields.io/badge/arXiv-2404.14396-b31b1b.svg)](https://arxiv.org/)
[![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/TencentARC/SEED-Story)
[![Static Badge](https://img.shields.io/badge/Dataset-Huggingface-yellow)](https://huggingface.co/datasets/TencentARC/StoryStream)
[![Static Badge](https://img.shields.io/badge/GitHub-black?logo=github)](https://github.com/TencentARC/SEED-Story)
long stories consisting of rich and coherent narrative texts, along with images that are consistent in characters and
style. We also release the StoryStream Dataset used to build this model.

## Introduction
SEED-Story, powered by a Multimodal Large Language Model (MLLM), is capable of generating multimodal long stories from user-provided images and texts that serve as the beginning of the story. The generated story consists of rich and coherent narrative texts, along with images that are consistent in characters and style. The story can span up to 25 multimodal sequences, even though we use a maximum of only 10 sequences during training.
<img src="assets/teaser.jpg" width="800" alt="Teaser image">
Overview of the SEED-Story training pipeline. In Stage 1, we pre-train an SD-XL-based de-tokenizer to reconstruct images from the features of a pre-trained ViT. In Stage 2, we sample an interleaved image-text sequence of random length and train the MLLM by performing next-word prediction and image feature regression between the output hidden states of the learnable queries and the ViT features of the target image. In Stage 3, the regressed image features from the MLLM are fed into the de-tokenizer to tune SD-XL, enhancing the consistency of the characters and styles in the generated images.
<img src="assets/pipeline.jpg" width="800" alt="Pipeline image">
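
To make the Stage-2 objective above concrete, here is a minimal sketch (not the authors' code; the tensor shapes, the MSE choice for feature regression, and the weight `lam` are illustrative assumptions) of a combined next-word-prediction and image-feature-regression loss:

```python
import numpy as np

def stage2_loss(token_logits, target_ids, pred_feats, vit_feats, lam=1.0):
    """Illustrative Stage-2 objective: next-word prediction (cross-entropy
    over text tokens) plus image feature regression (here, MSE between the
    MLLM's regressed query features and the target image's ViT features)."""
    # Next-word prediction: cross-entropy over the vocabulary.
    # token_logits: (seq_len, vocab_size); target_ids: (seq_len,)
    shifted = token_logits - token_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    ntp = -log_probs[np.arange(len(target_ids)), target_ids].mean()

    # Feature regression: pred_feats, vit_feats: (n_queries, feat_dim)
    reg = ((pred_feats - vit_feats) ** 2).mean()

    # Weighted sum of the two terms; lam is an assumed hyperparameter.
    return ntp + lam * reg
```

When the regressed features exactly match the ViT features, the regression term vanishes and only the language-modeling loss remains.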
## Model Weights
We release the pretrained Tokenizer, the pretrained De-Tokenizer, the pre-trained foundation model **SEED-X-pretrained**,
the StoryStream instruction-tuned MLLM **SEED-Story-George**, and the StoryStream-tuned De-Tokenizer **Detokenizer-George**.
Please download the checkpoints and save them under the folder `./pretrained`.
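
One way to fetch the checkpoints is with the `huggingface_hub` package (`pip install huggingface_hub`); the repo id `TencentARC/SEED-Story` comes from the model badge above, and `./pretrained` is the folder expected by the scripts. This is a sketch, not the project's official download path — check the Hub page for the exact file layout:

```python
# Sketch: download the SEED-Story checkpoints into ./pretrained.
from huggingface_hub import snapshot_download

def fetch_checkpoints(local_dir="./pretrained"):
    # Downloads every file in the TencentARC/SEED-Story model repo
    # into local_dir and returns the local path.
    return snapshot_download(repo_id="TencentARC/SEED-Story", local_dir=local_dir)
```

Call `fetch_checkpoints()` once before running training or inference.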