caT text-to-video

caT is a conditionally augmented text-to-video model. It builds on pre-trained weights from the ModelScope text-to-video model, augmented with temporal conditioning transformers that extend generated clips and create smooth transitions between them. It also supports prompt interpolation, allowing the scene to change while a clip is being extended.
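The sketch below illustrates these two ideas, not the model's actual API: each new clip is conditioned on the tail frames of the previous one, and the prompt embedding is interpolated between two targets to change the scene. The function names, embedding shapes, and the use of plain linear interpolation are illustrative assumptions; the real inference entry point is run.py.

import torch

def generate_clip(prompt_embedding, conditioning_frames=None, num_frames=16):
    # Hypothetical stand-in for the model's inference call:
    # returns frames shaped (num_frames, channels, height, width).
    return torch.rand(num_frames, 3, 256, 256)

def interpolate_embeddings(a, b, t):
    # Linear blend between two prompt embeddings (0 = start, 1 = end).
    return (1.0 - t) * a + t * b

emb_start = torch.randn(77, 1024)  # e.g. "a cat walking on grass"
emb_end = torch.randn(77, 1024)    # e.g. "a cat sleeping by a fire"

clips, cond = [], None
for t in torch.linspace(0, 1, steps=4):
    emb = interpolate_embeddings(emb_start, emb_end, t.item())
    clip = generate_clip(emb, conditioning_frames=cond)
    clips.append(clip)
    cond = clip[-8:]  # temporal conditioning: reuse the last frames of this clip

video = torch.cat(clips, dim=0)  # one longer video with smooth transitions
print(video.shape)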

This model was trained at home as a hobby, so do not expect high-quality samples.

Installation

Clone the Repository

git clone https://github.com/motexture/caT-text-to-video.git
cd caT-text-to-video

Create and Activate a Virtual Environment

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install Dependencies and Run

pip install -r requirements.txt
python3 run.py

Visit the provided URL in your browser to interact with the interface and start generating videos.

Note: Ensure that you are on the latest commit, as the positional encodings have been updated compared to the initial models.
