Revanthraja committed
Commit · 4d525a1
Parent(s): 6866e0b
Update README.md

README.md CHANGED
@@ -1,35 +1,20 @@
---
tags:
- Text-to-Video
-license: cc
---

-3. Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.

-### Upscaling recommendations

-For upscaling, it's recommended to use Potat1 via vid2vid in the 1111 extension. Aim for a resolution of 1152x640 and a denoise strength between 0.66 and 0.85. Remember to use the same prompt and settings that were used to generate the original clip.

-### Known issues

-Lower resolutions or fewer frames could lead to suboptimal output. <br />
-Certain clips might appear with cuts. This will be fixed in the upcoming 2.1 version, which will incorporate a cleaner dataset.
-Some clips may play back too slowly, requiring prompt engineering for an increased pace.

-Thanks to [camenduru](https://github.com/camenduru), [kabachuha](https://github.com/kabachuha), [ExponentialML](https://github.com/ExponentialML), [polyware](https://twitter.com/polyware_ai), [tin2tin](https://github.com/tin2tin)<br />
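
Step 3 in the removed card presupposes checkpoint files that have already been downloaded from the Hub (the earlier steps did not survive this page). For reference, a minimal sketch of that step in Python, where the repo id is a placeholder and the filenames assume the usual ModelScope-style t2v checkpoint layout; neither is confirmed by this diff:

```python
# Minimal sketch: fetch ModelScope-style t2v checkpoint files and copy them
# into the WebUI model directory. Repo id and filenames are assumptions.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download

repo_id = "<user>/<t2v-model>"  # placeholder: this model's Hub repository id
target = Path("stable-diffusion-webui/models/ModelScope/t2v")
target.mkdir(parents=True, exist_ok=True)

# Assumed checkpoint layout, typical for ModelScope-derived text-to-video models:
for name in (
    "text2video_pytorch_model.pth",
    "VQGAN_autoencoder.pth",
    "open_clip_pytorch_model.bin",
    "configuration.json",
):
    local_path = hf_hub_download(repo_id=repo_id, filename=name)
    shutil.copy(local_path, target / name)
```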

---
tags:
- Text-to-Video
+license: cc
+pipeline_tag: text-to-video
---
+# Text-to-Video Model with Hugging Face Transformers

+This repository contains a text-to-video generation model fine-tuned with the Hugging Face Transformers library. The model was trained for approximately 1,000 steps on datasets from several domains to generate video content from textual input.

+## Overview

+Built on Hugging Face's Transformers, the model specializes in translating textual descriptions into corresponding video sequences. Fine-tuning on diverse datasets enables it to interpret a wide range of textual prompts and generate relevant video content.

+## Features

+- Transforms text input into corresponding video sequences
+- Fine-tuned with Hugging Face Transformers on datasets spanning various domains
+- Generates diverse video content from textual descriptions
+- Handles nuanced textual prompts to produce meaningful video representations
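
The new card does not specify an inference entry point. If the checkpoint is published in diffusers format, as is common for ModelScope-derived text-to-video models, loading it might look like the sketch below; the repo id is a placeholder, and the prompt and frame count are illustrative:

```python
# Minimal sketch: text-to-video inference with diffusers, assuming a
# diffusers-format checkpoint. The repo id is a placeholder.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "<user>/<t2v-model>",  # placeholder: this repository's Hub id
    torch_dtype=torch.float16,
).to("cuda")

result = pipe("a panda eating bamboo in a sunlit forest", num_frames=24)
# Recent diffusers versions return batched frames; take the first clip.
video_path = export_to_video(result.frames[0])
print(video_path)
```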