Discrepancy between HF output and provided examples

by modeLIMINAL - opened

I haven't tried running this locally yet, so this may just be an HF issue. Most of the videos I've generated so far with the demo wouldn't lead to a coherent model. I also tried using the first frames of the examples provided on your GitHub page, and none of them result in anything close to the fidelity your examples achieve. Can you provide any insight into this?

