gsterkin committed on
Commit
77fcf79
1 Parent(s): 4bab411

Update README.md

Files changed (1)
  1. README.md +31 -17
README.md CHANGED
@@ -12,7 +12,7 @@ language:
12
  This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
13
 
14
  LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
15
-
16
 
17
  <img src="./media/trailer.gif" alt="trailer" width="512">
18
 
@@ -33,14 +33,32 @@ LTX-Video is the first DiT-based video generation model capable of generating hi
33
 
34
  ## Usage
35
 
36
- ### Setup
37
- The codebase was tested with Python 3.10.5, CUDA version 12.2, and supports PyTorch >= 2.1.2.
38
 
39
  #### Installation
40
 
 
 
41
  ```bash
42
  git clone https://github.com/LightricksResearch/LTX-Video.git
43
- cd ltx_video-core
44
 
45
  # create env
46
  python -m venv env
@@ -57,28 +75,24 @@ model_path = 'PATH' # The local directory to save downloaded checkpoint
57
  snapshot_download("Lightricks/LTX-Video", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
58
  ```
59
 
60
- ### Inference
61
-
62
- #### Inference Code
63
 
64
- To use our model, please follow the inference code in [inference.py](./inference.py):
65
-
66
- #### General tips:
67
- * The model works on resolutions that are divisible by 32 and number of frames that are divisible by 8 + 1 (e.g. 257). In case the resolution or number of frames are not divisible by 32 or 8 + 1, the input will be padded with -1 and then cropped to the desired resolution and number of frames.
68
- * The model works best on resolutions under 720 x 1280 and number of frames below 257.
69
- * Prompts should be in English. The more elaborate the better. Good prompt looks like `The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim.`
70
 
71
- #### For text-to-video generation:
72
 
73
  ```bash
74
  python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
75
  ```
76
 
77
- #### For image-to-video generation:
78
 
79
  ```bash
80
  python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
81
  ```
82
 
83
- ### ComfyUI Integration
84
- To use our model with ComfyUI, please follow the instructions at [https://github.com/Lightricks/ComfyUI-LTXVideo/]().
 
 
 
 
12
  This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
13
 
14
  LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
15
+ We provide a model for both text-to-video and image+text-to-video use cases.
16
 
17
  <img src="./media/trailer.gif" alt="trailer" width="512">
18
 
 
33
 
34
  ## Usage
35
 
36
+ ### Direct use
37
+ You can use the model for purposes permitted under the [license](https://github.com/Lightricks/LTX-Video/blob/main/LICENSE).
38
+
39
+ ### General tips:
40
+ * The model works on resolutions that are divisible by 32 and on frame counts of the form 8n + 1 (e.g. 257). If the resolution or number of frames does not meet these constraints, the input is padded with -1 and then cropped to the desired resolution and number of frames.
41
+ * The model works best at resolutions under 720 x 1280 and with fewer than 257 frames.
42
+ * Prompts should be in English. The more elaborate, the better. A good prompt looks like `The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim.`
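
The size constraints in the tips above can be checked before running inference. The following is a minimal sketch, not code from the LTX-Video repository: the helper names `round_up_to_multiple` and `padded_shape` are hypothetical, and rounding up to the next valid size is an assumption about how the padding target is chosen.

```python
from typing import Tuple


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round a non-negative integer up to the nearest multiple (hypothetical helper)."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_shape(height: int, width: int, num_frames: int) -> Tuple[int, int, int]:
    """Return the smallest (height, width, num_frames) that satisfies the stated
    constraints: height and width divisible by 32, frame count of the form 8n + 1.

    Assumption: padding rounds each dimension up to the next valid value.
    """
    padded_height = round_up_to_multiple(height, 32)
    padded_width = round_up_to_multiple(width, 32)
    # Frame counts look like 8n + 1, so round (num_frames - 1) up to a multiple of 8.
    padded_frames = round_up_to_multiple(num_frames - 1, 8) + 1
    return padded_height, padded_width, padded_frames


if __name__ == "__main__":
    # Example values are illustrative only.
    print(padded_shape(500, 700, 100))
```

Requesting a size that already satisfies the constraints, such as 512 x 768 with 257 frames, leaves the values unchanged, so no padding would be needed in that case.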
43
+
44
+ ### Online demo
45
+ The model is accessible right away via the following links:
46
+ - [HF Playground](https://huggingface.co/spaces/Lightricks/LTX-Video-Playground)
47
+ - [Fal.ai text-to-video](https://fal.ai/models/fal-ai/ltx-video)
48
+ - [Fal.ai image-to-video](https://fal.ai/models/fal-ai/ltx-video/image-to-video)
49
+
50
+ ### ComfyUI
51
+ To use our model with ComfyUI, please follow the instructions in the dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).
52
+
53
+ ### Run locally
54
 
55
  #### Installation
56
 
57
+ The codebase was tested with Python 3.10.5, CUDA version 12.2, and supports PyTorch >= 2.1.2.
58
+
59
  ```bash
60
  git clone https://github.com/LightricksResearch/LTX-Video.git
61
+ cd LTX-Video
62
 
63
  # create env
64
  python -m venv env
 
75
  snapshot_download("Lightricks/LTX-Video", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
76
  ```
77
 
78
+ #### Inference
 
 
79
 
80
+ To use our model, please follow the inference code in [inference.py](https://github.com/Lightricks/LTX-Video/blob/main/inference.py):
 
 
 
 
 
81
 
82
+ ##### For text-to-video generation:
83
 
84
  ```bash
85
  python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
86
  ```
87
 
88
+ ##### For image-to-video generation:
89
 
90
  ```bash
91
  python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
92
  ```
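
When generating many clips, it can help to assemble the command line programmatically. The sketch below only builds the argument list using the flags shown in the two commands above; `build_inference_command` is a hypothetical helper, and the example values (prompt, sizes, seed, image path) are placeholders rather than recommended settings.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    ckpt_dir: str,
    prompt: str,
    height: int,
    width: int,
    num_frames: int,
    seed: int,
    input_image_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for inference.py (hypothetical helper).

    Only flags that appear in the README commands are used. Passing
    input_image_path switches from text-to-video to image-to-video.
    """
    command = [
        "python", "inference.py",
        "--ckpt_dir", ckpt_dir,
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
        "--num_frames", str(num_frames),
        "--seed", str(seed),
    ]
    if input_image_path is not None:
        command += ["--input_image_path", input_image_path]
    return command


if __name__ == "__main__":
    # Placeholder values; 121 frames follows the 8n + 1 rule from the tips.
    command = build_inference_command("PATH", "PROMPT", 512, 768, 121, 42)
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids shell-quoting problems with long prompts.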
93
 
94
+ ## Limitations
95
+ - This model is not intended or able to provide factual information.
96
+ - As a statistical model, this checkpoint might amplify existing societal biases.
97
+ - The model may fail to generate videos that match the prompts perfectly.
98
+ - Prompt following is heavily influenced by the prompting style.