Commit 8b6d2ad (parent: 1392233), committed by Andyson
Files changed (1)
  1. README.md +1 -11
README.md CHANGED
@@ -1,6 +1,5 @@
 # SEED-Story
 [![arXiv](https://img.shields.io/badge/arXiv-2404.14396-b31b1b.svg)](https://arxiv.org/)
-[![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/TencentARC/SEED-Story)
 [![Static Badge](https://img.shields.io/badge/Dataset-Huggingface-yellow)](https://huggingface.co/datasets/TencentARC/StoryStream)
 [![Static Badge](https://img.shields.io/badge/GitHub-black?logo=github)](https://github.com/TencentARC/SEED-Story)
 
@@ -8,18 +7,9 @@
 long stories consists of rich and coherent narrative texts, along with images that are consistent in characters and
 style. We also release the StoryStream Dataset for build this model.
 
-## Introduction
-The introduced SEED-Story, powered by MLLM, is capable of generating multimodal long stories from user-provided images and texts as the beginning of the story. The generated story consists of rich and coherent narrative texts, along with images that are consistent in characters and style. The story can span up to 25 multimodal sequences, even though we only use a maximum of 10 sequences during training.
-<img src="assets/teaser.jpg" width="800" alt="Teaser image">
-
-
-Overview of the SEED-Story. Training Pipeline: In Stage 1, we pre-trains an SD-XL-based de-tokenizer to reconstruct images by taking the features of a pre-trained ViT as inputs. In Stage 2, we sample an interleaved image-text sequence of a random length and train the MLLM by performing next-word prediction and image feature regression between the output hidden states of the learnable queries and ViT features of the target image. In Stage 3, the regressed image features from the MLLM are fed into the de-tokenizer for tuning SD-XL, enhancing the consistency of the characters and styles in the generated images.
-<img src="assets/pipeline.jpg" width="800" alt="Pipeline image">
-
-
 ## Model Weights
 We release the pretrained Tokenizer, the pretrained De-Tokenizer, the pre-trained foundation model **SEED-X-pretrained**,
-the StoryStream instruction-tuned MLLM **SEED-Story-George**, and the StoryStream tuned De-Tokenizer in **Detokenizer-George** [SEED-Story](https://huggingface.co/TencentARC/SEED-Story).
+the StoryStream instruction-tuned MLLM **SEED-Story-George**, and the StoryStream tuned De-Tokenizer in **Detokenizer-George**
 
 Please download the checkpoints and save them under the folder `./pretrained`.
 
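The README above asks users to download the checkpoints and place them under `./pretrained`. A minimal sketch of one way to do that with the `huggingface_hub` client; the repo id `TencentARC/SEED-Story` comes from the links above, while the helper name and the lazy import are our own choices, not part of the release:

```python
def fetch_checkpoints(local_dir: str = "./pretrained") -> str:
    """Download all files of the TencentARC/SEED-Story release into local_dir."""
    # Lazy import so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo and returns the
    # local path of the downloaded snapshot.
    return snapshot_download(
        repo_id="TencentARC/SEED-Story",
        local_dir=local_dir,
    )
```

Pass `revision=...` to `snapshot_download` to pin a specific commit of the model repo instead of the latest one.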