This looks amazing
Could you share any information on the training regime? Dataset, hyperparameters, etc.
Hi! Sure, I used 70 videos of 49 frames each. I captioned them using Qwen2-VL, but it made many mistakes, so I had to review and correct them one by one. As for the parameters, since this was my first LoRA with CogVideoX, I basically used the default settings that come with the cogvideox-factory repo. The whole training took around 13 hours on an L40S and used around 32 GB of VRAM, but the cogvideox-factory repo suggests optimizations that make it possible to train on 24 GB of VRAM.
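If it helps, here is a minimal sketch of what the LoRA setup looks like if you reproduce it with plain diffusers + peft instead of the repo's scripts. Note that the rank/alpha values and target modules below are my illustrative choices, not necessarily the exact cogvideox-factory defaults, so check the repo's training scripts for the real settings:

```python
import torch
from diffusers import CogVideoXTransformer3DModel
from peft import LoraConfig

# Load only the transformer, which is the part the LoRA is trained on.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Illustrative LoRA settings (assumptions, not the repo's verified defaults).
lora_config = LoraConfig(
    r=128,            # LoRA rank (assumption)
    lora_alpha=128,   # LoRA scaling factor (assumption)
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
transformer.add_adapter(lora_config)

# Only the LoRA weights are trainable, which is why the run fits in ~32 GB
# of VRAM (or ~24 GB with the repo's suggested optimizations).
trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
total = sum(p.numel() for p in transformer.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```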
@Cseti could you share your data-prep scripts (starting from a folder of videos) for splitting and captioning, as well as your fine-tuning scripts? It would be amazing to try making some LoRAs using them. It would be great if you could create a GitHub repo and push your current scripts.
I followed the instructions in cogvideox-factory step by step. They also discuss the required folder structure, here. For running the Qwen2-VL model I used ComfyUI nodes, but it made many mistakes. However, the CogVideoX team released their own captioning method here. I couldn't test it yet, but if that's really what they used to caption the model's training data, it could be the best method for making LoRA training captions too.
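In case someone wants to script the captioning instead of going through ComfyUI, here is a rough sketch that batch-captions a folder of pre-cut clips with Qwen2-VL via transformers and writes a `prompts.txt` / `videos.txt` pair. The model ID, prompt wording, and file layout are my assumptions (verify the expected dataset structure against the cogvideox-factory docs):

```python
import os
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # assumption: any Qwen2-VL checkpoint should work
CLIP_DIR = "dataset/videos"             # assumption: folder of pre-cut 49-frame clips

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def caption(path: str) -> str:
    # Standard Qwen2-VL chat format with a video input plus a text instruction.
    messages = [{
        "role": "user",
        "content": [
            {"type": "video", "video": path},
            {"type": "text", "text": "Describe this video clip in one detailed paragraph."},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    _, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], videos=video_inputs, padding=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens so only the generated caption remains.
    return processor.batch_decode(
        out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0].strip()

# One caption per line in prompts.txt, matching relative paths in videos.txt.
with open("dataset/prompts.txt", "w") as prompts, open("dataset/videos.txt", "w") as videos:
    for name in sorted(os.listdir(CLIP_DIR)):
        if name.endswith(".mp4"):
            prompts.write(caption(os.path.join(CLIP_DIR, name)).replace("\n", " ") + "\n")
            videos.write(f"videos/{name}\n")
```

Whatever you use, the captions still need a manual review pass, as Cseti mentioned above; the model makes enough mistakes that you can't trust it blindly.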