---
license: llama2
---

# SEED Multimodal

[Project Homepage](https://ailab-cvc.github.io/seed/)

**Powered by [CV Center, Tencent AI Lab](https://ailab-cvc.github.io) and [ARC Lab, Tencent PCG](https://github.com/TencentARC).**

## Usage

### Dependencies

- Python >= 3.8 (we recommend [Anaconda](https://www.anaconda.com/download/#linux))
- [PyTorch >= 1.11.0](https://pytorch.org/)
- NVIDIA GPU + [CUDA](https://developer.nvidia.com/cuda-downloads)
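
As a starting point, here is a minimal environment sketch based on the list above; the environment name `seed` is an assumption, not part of the official setup:

```bash
# Hypothetical setup: the environment name "seed" is an assumption;
# Python 3.8 matches the minimum version listed above.
conda create -n seed python=3.8 -y
conda activate seed
```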

### Installation

1. Clone the repo

```bash
git clone https://github.com/AILab-CVC/SEED.git
cd SEED
```

2. Install dependent packages

```bash
pip install -r requirements.txt
```

### Model Weights

We provide the pretrained SEED Tokenizer and De-Tokenizer, as well as the instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B.
Please download the checkpoints and save them under the folder `./pretrained`.
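
For example, if the checkpoints are hosted on the Hugging Face Hub, they could be fetched as sketched below; the repository IDs are placeholders, not real repository names, and `pretrained/seed_llama` is an assumed layout:

```bash
# Sketch only: <tokenizer_repo_id> and <seed_llama_repo_id> are placeholders
# for the actual checkpoint repositories.
mkdir -p pretrained
huggingface-cli download <tokenizer_repo_id> --local-dir pretrained/seed_tokenizer
huggingface-cli download <seed_llama_repo_id> --local-dir pretrained/seed_llama
```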

To reconstruct images from SEED visual codes using the unCLIP SD-UNet, please download the pretrained [unCLIP SD](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip).
Rename the checkpoint directory to **"diffusion_model"** and create a soft link to it under the `pretrained/seed_tokenizer` directory.
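
A minimal sketch of these steps, assuming git-lfs is installed and the checkpoint is cloned into the current directory:

```bash
# Download the unCLIP SD checkpoint, naming its directory "diffusion_model"
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip diffusion_model
# Link it under the SEED tokenizer checkpoint directory
ln -s "$(pwd)/diffusion_model" pretrained/seed_tokenizer/diffusion_model
```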

### Inference for Visual Tokenization and De-Tokenization

To discretize an image into 1D visual codes with causal dependency, and to reconstruct the image from those codes using the off-the-shelf unCLIP SD-UNet, run:

```bash
python scripts/seed_tokenizer_inference.py
```

### Launching the SEED-LLaMA Demo Locally

```bash
sh start_backend.sh
sh start_frontend.sh
```

## Citation

If you find this work helpful, please consider citing:

```bibtex
@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a SEED of Vision in Large Language Model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}
```

The project is still in progress. Stay tuned for more updates!

## License

`SEED` is released under the [Apache License, Version 2.0](License.txt).

`SEED-LLaMA` is released under the original [license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) of [LLaMA2](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).

## Acknowledgement

We thank the authors of [unCLIP SD](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip) and [BLIP2](https://github.com/salesforce/LAVIS) for their great work.