Revanthraja committed
Commit 4d525a1 · 1 Parent(s): 6866e0b

Update README.md

Files changed (1)
  1. README.md +11 -26
README.md CHANGED
@@ -1,35 +1,20 @@
  ---
  tags:
  - Text-to-Video
- license: cc-by-nc-4.0
  ---

- ![model example](https://i.imgur.com/fosRCN2.png)

- # zeroscope_v2 30x448x256

- A watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and a smooth video output. This model was trained from the [original weights](https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis) using 9,923 clips and 29,769 tagged frames at 30 frames, 448x256 resolution.<br />

- zeroscope_v2 30x448x256 is specifically designed for upscaling with [Potat1](https://huggingface.co/camenduru/potat1) using vid2vid in the [1111 text2video](https://github.com/kabachuha/sd-webui-text2video) extension by [kabachuha](https://github.com/kabachuha). Leveraging this model as a preliminary step allows for superior overall compositions at higher resolutions in Potat1, permitting faster exploration in 448x256 before transitioning to a high-resolution render. See an [example output](https://i.imgur.com/lj90FYP.mp4) that has been upscaled to 1152 x 640 using Potat1.<br />

- ### Using it with the 1111 text2video extension
-
- 1. Rename the file 'zeroscope_v2_30x448x256.pth' to 'text2video_pytorch_model.pth'.
- 2. Rename the file 'zeroscope_v2_30x448x256_text.bin' to 'open_clip_pytorch_model.bin'.
- 3. Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
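The three renames above can also be scripted. The following is a minimal Python sketch, assuming the downloaded model files sit in the current directory and the webui lives at `stable-diffusion-webui/` (both paths are assumptions; adjust them for your install):

```python
# Minimal sketch (not part of the original instructions): copy the two
# zeroscope_v2 files into the webui's ModelScope directory under the names
# the text2video extension expects. Both paths are assumptions.
import shutil
from pathlib import Path

WEBUI_DIR = Path("stable-diffusion-webui")                 # assumed webui location
T2V_DIR = WEBUI_DIR / "models" / "ModelScope" / "t2v"
T2V_DIR.mkdir(parents=True, exist_ok=True)

# (downloaded file, name the extension expects)
renames = [
    ("zeroscope_v2_30x448x256.pth", "text2video_pytorch_model.pth"),
    ("zeroscope_v2_30x448x256_text.bin", "open_clip_pytorch_model.bin"),
]

for src, dst in renames:
    shutil.copy(src, T2V_DIR / dst)                        # replaces the existing file
    print(f"installed {src} -> {T2V_DIR / dst}")
```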
-
-
- ### Upscaling recommendations
-
- For upscaling, it's recommended to use Potat1 via vid2vid in the 1111 extension. Aim for a resolution of 1152x640 and a denoise strength between 0.66 and 0.85. Remember to use the same prompt and settings that were used to generate the original clip.
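For reference, comparable vid2vid settings can be reproduced outside the 1111 extension. The sketch below uses the diffusers `VideoToVideoSDPipeline` and assumes the Potat1 checkpoint is usable in diffusers format at `camenduru/potat1`; the prompt and frame loading are placeholders, not part of the original instructions.

```python
# Hedged sketch: upscale a 448x256 clip to the recommended 1152x640 with a
# denoise strength in the 0.66-0.85 range, using diffusers vid2vid instead
# of the 1111 extension. Model id, prompt and frames are placeholders.
import torch
from PIL import Image
from diffusers import VideoToVideoSDPipeline
from diffusers.utils import export_to_video

pipe = VideoToVideoSDPipeline.from_pretrained(
    "camenduru/potat1",                 # assumption: Potat1 in diffusers format
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder frames -- load the actual 448x256 clip generated earlier instead.
low_res_frames = [Image.new("RGB", (448, 256)) for _ in range(30)]

# Reuse the exact prompt and settings from the low-resolution render.
prompt = "aerial drone shot of a coastline at sunset"

frames = [f.resize((1152, 640)) for f in low_res_frames]
video_frames = pipe(prompt, video=frames, strength=0.75).frames[0]
export_to_video(video_frames, "upscaled_1152x640.mp4")
```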
-
-
- ### Known issues
-
- Lower resolutions or fewer frames could lead to suboptimal output. <br />
- Certain clips might appear with cuts. This will be fixed in the upcoming 2.1 version, which will incorporate a cleaner dataset.
- Some clips may play back too slowly, requiring prompt engineering for an increased pace.
-
-
-
- Thanks to [camenduru](https://github.com/camenduru), [kabachuha](https://github.com/kabachuha), [ExponentialML](https://github.com/ExponentialML), [polyware](https://twitter.com/polyware_ai), [tin2tin](https://github.com/tin2tin)<br />
 
  ---
  tags:
  - Text-to-Video
+ license: cc
+ pipeline_tag: text-to-video
  ---
+ # Text-to-Video Model with Hugging Face Transformers

+ This repository contains a text-to-video generation model fine-tuned using the Hugging Face Transformers library. The model has been trained on various datasets over approximately 1000 steps to generate video content from textual input.

+ ## Overview

+ The text-to-video model developed here is based on Hugging Face's Transformers, specializing in translating textual descriptions into corresponding video sequences. It has been fine-tuned on diverse datasets, enabling it to understand and visualize a wide range of textual prompts, generating relevant video content.

+ ## Features

+ - Transforms text input into corresponding video sequences
+ - Fine-tuned using Hugging Face Transformers with datasets spanning various domains
+ - Capable of generating diverse video content based on textual descriptions
+ - Handles nuanced textual prompts to generate meaningful video representations
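The updated card does not include usage code. Below is a minimal inference sketch, assuming the checkpoint is stored in diffusers format and can be loaded with the text-to-video pipeline; the repo id is a placeholder, not something stated in the README.

```python
# Hedged sketch: generate a clip from a text prompt with the diffusers
# text-to-video pipeline. The repo id is a placeholder for this repository.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "your-username/your-text-to-video-model",  # placeholder repo id
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a panda playing a guitar in times square"
video_frames = pipe(prompt, num_inference_steps=25, num_frames=24).frames[0]
export_to_video(video_frames, "sample.mp4")
```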