uncritical but with anime styll with esy temporary fix

#5
by TheBigBlockPC - opened

The model has a small issue where the model performs wurse with more steps compared to less steps:
50 steps:

25 steps:

prompt:
a huge fox with fluffy dark orange fur and nine tails walking in a forest. anime style

this issue only triggers in anime style and setting the steps to 25 fixes the issue.
i tested in on the int8 quantisation.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

First of all, your prompt is relatively short, which directly poses a risk. In our README, it is mentioned that you should use longer prompts.

Additionally, the situation where 25 steps perform better than 50 steps is quite rare. In the 5B model, we use DPM instead of DDIM, which should theoretically allow generating a video of the same quality with fewer steps. Therefore, in theory, 30-40 steps should also be able to generate a video. However, it is not common for 25 steps to produce better quality than 50 steps. This is regardless of whether you used INT8 quantization or not.

for the prompt length issue. should i just use a LLM like GPT-4 to enhance the prompt

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

sure and the full code including converting is in our github here
https://github.com/THUDM/CogVideo/blob/main/inference/convert_demo.py
with fewshot prompt

the code in the repo for the prompt enhancing is a bit inefficient for token usage. using a fine tune would be cheaper

Sign up or log in to comment