# Vlogger [![arXiv](https://img.shields.io/badge/arXiv-2310.20700-b31b1b.svg)](https://arxiv.org/abs/2310.20700) [![Project Page](https://img.shields.io/badge/Vlogger-Website-green)](https://zhuangshaobin.github.io/Vlogger.github.io/) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FSEINE&count_bg=%23F59352&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false)](https://hits.seeyoufarm.com)

This repository is the official implementation of [Vlogger](https://arxiv.org/abs/2310.20700):

**[Vlogger: Make Your Dream A Vlog](https://arxiv.org/abs/2310.20700)**

Demo generated by our Vlogger: [Teddy Travel](https://youtu.be/ZRD1-jHbEGk). Below is the compressed version of [Teddy Travel](https://youtu.be/ZRD1-jHbEGk):

https://github.com/zhuangshaobin/Vlogger/assets/94739615/1e8dd246-d3b9-49e9-8eee-d40b6d8523b9

## Setup

### Prepare Environment

```
conda create -n vlogger python==3.10.11
conda activate vlogger
pip install -r requirements.txt
```

### Download our model and T2I base model

Our model is based on Stable Diffusion v1.4. Download [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and [OpenCLIP-ViT-H-14](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) into the ```pretrained``` directory.

Then download our model (ShowMaker) checkpoint (from [Google Drive](https://drive.google.com/file/d/1pAH73kz2QRfD2Dxk4lL3SrHvLAlWcPI3/view?usp=drive_link) or [Hugging Face](https://huggingface.co/GrayShine/Vlogger/tree/main)) and save it to the ```pretrained``` directory as well.

Under ```./pretrained```, you should now see the following:

```
├── pretrained
│   ├── ShowMaker.pt
│   ├── stable-diffusion-v1-4
│   │   ├── ...
│   └── OpenCLIP-ViT-H-14
│       ├── ...
```
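The two Hugging Face downloads above can also be scripted. The sketch below assumes the ```huggingface_hub``` package (```pip install huggingface_hub```); the repo ids come from the links above, and the local folder names mirror the tree shown above. The ShowMaker checkpoint itself is fetched separately from the Google Drive / Hugging Face links.

```python
# Sketch: fetch the two T2I base models into ./pretrained.
# Assumes `huggingface_hub` is installed; folder names follow the tree above.
MODELS = {
    "CompVis/stable-diffusion-v1-4": "pretrained/stable-diffusion-v1-4",
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K": "pretrained/OpenCLIP-ViT-H-14",
}

def download_all() -> None:
    """Download every base model into its folder under ./pretrained."""
    # Imported lazily so the mapping can be inspected without the package.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in MODELS.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling ```download_all()``` pulls several gigabytes of weights, so run it once and reuse the ```pretrained``` directory afterwards.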
## Usage

### Inference for (T+I)2V

Run the following command to get the (T+I)2V results:
```shell
python sample_scripts/with_mask_sample.py
```
The generated video will be saved in ```results/mask_no_ref```.

### Inference for (T+I+ref)2V

Run the following command to get the (T+I+ref)2V results:
```shell
python sample_scripts/with_mask_ref_sample.py
```
The generated video will be saved in ```results/mask_ref```.

### Inference for LLM planning and reference-image generation

Run the following command to generate the script, actors, and protagonist:
```shell
python sample_scripts/vlog_write_script.py
```
The generated script will be saved in ```results/vlog/$your_story_dir/script```.
The generated reference images will be saved in ```results/vlog/$your_story_dir/img```.

**Important:** enter your OpenAI key on line 7 of ```vlogger/planning_utils/gpt4_utils.py```.

### Inference for vlog generation

Run the following command to generate the vlog:
```shell
python sample_scripts/vlog_read_script_sample.py
```
The generated vlog will be saved in ```results/vlog/$your_story_dir/video```.

#### More Details

You may modify ```configs/with_mask_sample.yaml``` to change the (T+I)2V conditions, and ```configs/with_mask_ref_sample.yaml``` to change the (T+I+ref)2V conditions. For example:

- ```ckpt``` specifies the model checkpoint.
- ```text_prompt``` describes the content of the video.
- ```input_path``` is the path to the input image.
- ```ref_path``` is the path to the reference image.
- ```save_path``` is the path where the generated video is saved.

## Results

### (T+I)2V Results
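Before the examples, here is a sketch of how the configuration keys described under "More Details" above fit together. Every value below is an illustrative placeholder; the shipped ```configs/with_mask_sample.yaml``` remains the authoritative reference for the full option set.

```yaml
# Hypothetical excerpt of configs/with_mask_sample.yaml.
# Key names come from the list above; all values are placeholders.
ckpt: "pretrained/ShowMaker.pt"                # model checkpoint to load
text_prompt: "A teddy bear swims in the ocean" # content of the video
input_path: "input/first_frame.png"            # path to the input image
save_path: "results/mask_no_ref"               # where the generated video goes
```

For (T+I+ref)2V runs, ```configs/with_mask_ref_sample.yaml``` additionally takes ```ref_path```, the path to the reference image.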
Each example pairs an input image with an output video (the embedded media are not reproduced here); the text prompts were:

- Underwater environment cosmetic bottles.
- A big drop of water falls on a rose petal.
- A fish swims past an oriental woman.
- Cinematic photograph. View of piloting aaero.
- Planet hits earth.

### T2V Results

Output videos only (embedded media not reproduced here); the text prompts were:

- A deer looks at the sunset behind him.
- A duck is teaching math to another duck.
- Bezos explores tropical rainforest.
- A deer looks at the sunset behind him.