---
tags:
- Text-to-Video
---
![model example](https://i.imgur.com/fosRCN2.png)
# zeroscope_v2 30x448x256
ModelScope without the watermark, in an aspect ratio close to 16:9, with smoother output.
Trained at 30 frames and 448x256 resolution.
Trained on 9,923 clips and 29,769 tagged frames.
This low-res ModelScope model is intended to be upscaled with [potat1](https://huggingface.co/camenduru/potat1) using vid2vid in the 1111 text2video extension by [kabachuha](https://github.com/kabachuha).
[Example output](https://i.imgur.com/lj90FYP.mp4), upscaled to 1152x640 with potat1.
### 1111 text2video extension usage
1. Rename `zeroscope_v2_30x448x256.pth` to `text2video_pytorch_model.pth`
2. Rename `zeroscope_v2_30x448x256_text.bin` to `open_clip_pytorch_model.bin`
3. Replace the existing files in `stable-diffusion-webui\models\ModelScope\t2v`
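If you'd rather script steps 1-3, a minimal sketch is below; the `downloads` and `webui` paths are placeholders, so point them at your actual download folder and webui install.

```python
from pathlib import Path
import shutil

# Hypothetical locations; adjust to your setup.
downloads = Path(".")
webui = Path("stable-diffusion-webui")

target = webui / "models" / "ModelScope" / "t2v"
target.mkdir(parents=True, exist_ok=True)

# Copy the downloaded weights under the filenames the text2video extension expects.
shutil.copy(downloads / "zeroscope_v2_30x448x256.pth",
            target / "text2video_pytorch_model.pth")
shutil.copy(downloads / "zeroscope_v2_30x448x256_text.bin",
            target / "open_clip_pytorch_model.bin")
```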
### Upscaling
I recommend upscaling clips from this model to 1152x640 using vid2vid in the 1111 extension, with a denoise strength between 0.66 and 0.85. Use the same prompt and settings that produced the original clip.
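The recommended workflow above runs through the 1111 extension UI. If you prefer scripting, a rough diffusers equivalent of the vid2vid upscale might look like the sketch below. It assumes potat1's repo ships diffusers-format weights; the placeholder frames and prompt are stand-ins, and `.frames` handling can differ across diffusers versions.

```python
import torch
from diffusers import VideoToVideoSDPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
from PIL import Image

# Load potat1 as the upscaler (assumes diffusers-format weights in the repo).
pipe = VideoToVideoSDPipeline.from_pretrained(
    "camenduru/potat1", torch_dtype=torch.float16
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Placeholder 30-frame clip; replace with the frames of your 448x256 output.
low_res = [Image.new("RGB", (448, 256)) for _ in range(30)]
video = [frame.resize((1152, 640)) for frame in low_res]

# Reuse the prompt from the original clip; strength corresponds to the
# 0.66-0.85 denoise range recommended above.
frames = pipe(
    "your original prompt here",  # hypothetical prompt
    video=video,
    strength=0.75,
).frames
export_to_video(frames, "upscaled_1152x640.mp4")
```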
### Known issues
- Using a lower resolution or fewer frames will result in worse output.
- Many clips come out with cuts. This will be fixed soon in 2.1 with a much cleaner dataset.
- Some clips come out too slow and may need prompt engineering to achieve a faster pace.
Thanks to [camenduru](https://github.com/camenduru), [kabachuha](https://github.com/kabachuha), [ExponentialML](https://github.com/ExponentialML), [polyware](https://twitter.com/polyware_ai), and [tin2tin](https://github.com/tin2tin).