Fine-tuning the model on a custom dataset of text-video pairs

#59
by Snarky36 - opened

Hello everyone. I am trying to find out whether it is possible to fine-tune a T2V model on a custom dataset containing many pairs of input text and output video. Is this possible, and if so, is there an example of fine-tuning text-to-video-ms-1.7b?
I saw that the Tune-A-Video repo fine-tunes a model using only a single video, and I was wondering whether I could make it work with multiple prompts and videos.
My dataset has approximately 1300 sign-language text-video pairs, and I would like the model to translate natural language into a video of a man communicating in sign language.
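
To make the question concrete, here is a minimal sketch of how such text-video pairs could be indexed before being handed to any fine-tuning script. The manifest layout, the column names `video_path` and `prompt`, the sample file paths, and the helper `load_pairs` are all hypothetical, chosen only for illustration; they are not taken from text-to-video-ms-1.7b or Tune-A-Video.

```python
import csv
import io


def load_pairs(csv_file):
    """Read a manifest and return a list of (prompt, video_path) pairs.

    Assumes a hypothetical CSV with 'video_path' and 'prompt' columns.
    Rows missing either field are skipped.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        video = (row.get("video_path") or "").strip()
        prompt = (row.get("prompt") or "").strip()
        if video and prompt:
            pairs.append((prompt, video))
    return pairs


# Hypothetical manifest contents; a real dataset would have ~1300 rows.
SAMPLE_MANIFEST = """video_path,prompt
clips/hello.mp4,Hello
clips/thanks.mp4,Thank you
clips/broken.mp4,
"""

pairs = load_pairs(io.StringIO(SAMPLE_MANIFEST))
for prompt, video in pairs:
    print(f"{prompt!r} -> {video}")
```

A flat list of pairs like this keeps the data independent of the training code, so the same manifest could be reused with whichever fine-tuning approach turns out to support multiple videos.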
