---
tags:
- Text-to-Video
---
![model example](https://i.imgur.com/fosRCN2.png)
# zeroscope_v2 30x448x256
ModelScope without the watermark, in an aspect ratio close to 16:9, with smoother output.
Trained at 30 frames and 448x256 resolution.
Trained on 9,923 clips and 29,769 tagged frames.
This low-res ModelScope model is intended to be upscaled with [potat1](https://huggingface.co/camenduru/potat1) using vid2vid in the 1111 text2video extension by [kabachuha](https://github.com/kabachuha).
[Example output](https://i.imgur.com/lj90FYP.mp4), upscaled to 1152x640 with potat1.
### 1111 text2video extension usage
1. Rename `zeroscope_v2_30x448x256.pth` to `text2video_pytorch_model.pth`
2. Rename `zeroscope_v2_30x448x256_text.bin` to `open_clip_pytorch_model.bin`
3. Replace the existing files in `stable-diffusion-webui\models\ModelScope\t2v`
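If you'd rather script steps 1-3, a minimal sketch is below; the `downloads` and `webui` paths are placeholders, so point them at your actual download folder and webui install.

```python
from pathlib import Path
import shutil

# Hypothetical locations; adjust to your setup.
downloads = Path(".")
webui = Path("stable-diffusion-webui")

target = webui / "models" / "ModelScope" / "t2v"
target.mkdir(parents=True, exist_ok=True)

# Copy the downloaded weights under the filenames the text2video extension expects.
shutil.copy(downloads / "zeroscope_v2_30x448x256.pth",
            target / "text2video_pytorch_model.pth")
shutil.copy(downloads / "zeroscope_v2_30x448x256_text.bin",
            target / "open_clip_pytorch_model.bin")
```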
### Upscaling
I recommend upscaling clips from this model to 1152x640 using vid2vid in the 1111 extension, with a denoise strength between 0.66 and 0.85. Use the same prompt and settings that produced the original clip.
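The recommended workflow above runs through the 1111 extension UI. If you prefer scripting, a rough diffusers equivalent of the vid2vid upscale might look like the sketch below. It assumes potat1's repo ships diffusers-format weights; the placeholder frames and prompt are stand-ins, and `.frames` handling can differ across diffusers versions.

```python
import torch
from diffusers import VideoToVideoSDPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
from PIL import Image

# Load potat1 as the upscaler (assumes diffusers-format weights in the repo).
pipe = VideoToVideoSDPipeline.from_pretrained(
    "camenduru/potat1", torch_dtype=torch.float16
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Placeholder 30-frame clip; replace with the frames of your 448x256 output.
low_res = [Image.new("RGB", (448, 256)) for _ in range(30)]
video = [frame.resize((1152, 640)) for frame in low_res]

# Reuse the prompt from the original clip; strength corresponds to the
# 0.66-0.85 denoise range recommended above.
frames = pipe(
    "your original prompt here",  # hypothetical prompt
    video=video,
    strength=0.75,
).frames
export_to_video(frames, "upscaled_1152x640.mp4")
```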
### Known issues
- Using a lower resolution or fewer frames will result in worse output.
- Many clips come out with cuts. This will be fixed soon in 2.1 with a much cleaner dataset.
- Some clips come out too slow and may need prompt engineering to achieve a faster pace.
Thanks to [camenduru](https://github.com/camenduru), [kabachuha](https://github.com/kabachuha), [ExponentialML](https://github.com/ExponentialML), [polyware](https://twitter.com/polyware_ai), and [tin2tin](https://github.com/tin2tin).