Thank you!

opened by tintwotin

I just want to say thank you for this model. I've made it the default model in my Blender text2video add-on: https://github.com/tin2tin/Generative_AI

It is surprisingly good at making paintings come alive:
https://twitter.com/tintwotin/status/1652943522734002177

Rolling waves and dresses blowing in the wind:
https://www.youtube.com/watch?v=BFhkXJVfLAU

This one was made with your 512x512 model (but I can only generate 13-14 frames on 6 GB of VRAM):
https://twitter.com/tintwotin/status/1655068481769938944
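
For reference, loading a ModelScope-style text-to-video checkpoint through Diffusers with offloading enabled looks roughly like this on a low-VRAM card (a sketch only; the model ID, options, and frame count are assumptions, not the add-on's exact code):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Sketch: load the fine-tuned ModelScope-style model in fp16 and turn on
# memory savings so a short clip fits in roughly 6 GB of VRAM.
pipe = DiffusionPipeline.from_pretrained(
    "strangeman3107/animov-512x", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keep only the active sub-model on the GPU
pipe.enable_vae_slicing()        # decode the frames in slices to save memory

prompt = "an oil painting of rolling waves, cinematic"
video_frames = pipe(prompt, num_frames=14).frames  # fewer frames -> less VRAM

# Depending on the Diffusers version, .frames may be nested one level deeper.
export_to_video(video_frames, "waves.mp4")
```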

Do you know if anyone is working on a model to replace the default ModelScope model, but without the watermark?

@tintwotin
I notice you've used, adapted, and promoted my model, for which I'm truly grateful.
The linked videos are beautiful, with a painting-like quality.
Honestly, I made this model for otaku hobbies, so your usage was a bit surprising, lol.

> Do you know if anyone is working on a model to replace the default ModelScope model, but without the watermark?

No, I've only seen it on the issue page of the fine-tuning repository.
It's surprising to see so few people attempting video fine-tuning.
While I've never considered myself an expert in additional training (fine-tuning), I believe in its immense potential and think it deserves more recognition.

> It's surprising to see so few people attempting video fine-tuning.

@strangeman3107 Would it be possible for you to add a wiki to your repo and document the steps you went through? I've tried a few times and only end up with garbage output.

As I understand it, a lot of VRAM is needed for fine-tuning, and I only have 6 GB. I would love to, for example, try to do a film noir set using old films from archive.org.

@pjonesdotca
I'm not sure how severe the "garbage output" is, but there are several issues with the Text-To-Video-Finetuning repository.
Please check the issues section, including the closed ones, for any existing problems.
If I manage to create a how-to guide myself, I will reply here (no promises).

@pjonesdotca
Sorry it is messy, but I have described the training workflow.
If you are still interested, please take a look.
https://huggingface.co/strangeman3107/animov-512x/blob/main/workflow.md

Regarding the tagging stage, did you use specific keyword(s) for anime? I tried fine-tuning on images (about 10) of a known actress and it seemed to have no effect, so I'm curious about the size of the dataset (you mentioned thousands of videos) as well as the specific keywords the model was trained on. I've created a couple of LoRAs and went in expecting the process to be similar.

@pjonesdotca
The 'anime' tag was put on all of them. Emotions and a few actions were also tagged intentionally; most of the other tags were left to the automatic taggers.
I guess I misled you with 'a few thousand videos and images'. I prepared that much data because I wanted to achieve an anime style across a wide range of content.
If you only want to teach it a specific person, I think even a high double-digit number of images should be enough for a good result.
As a test, I trained on a Japanese person with 10 images tagged with his name (search for "zun game developer").
I trained for 1000 steps at the default learning rate, and it did not work with just his name. After increasing the weight of his name in the prompt and adding about four tags that the tagger had assigned to his images, I was able to get a guy who looked like him eating spaghetti.
https://imgur.com/a/VE6zyCi
If you want more flexible results, 10 images may not be enough. Even if you increase the number of steps, with too little data the noise may gradually increase and ruin the training.
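
For illustration, the captions for a small test like that could be laid out something like this (just a sketch; the folder layout, file naming, and tag choices here are assumptions, not necessarily the exact format the fine-tuning repo expects):

```python
from pathlib import Path

# Sketch: pair each training image with a caption that contains the subject's
# name tag plus a few tags suggested by an automatic tagger.
dataset = Path("train_data")
captions = {
    "zun_01.png": "zun game developer, glasses, sitting, indoors",
    "zun_02.png": "zun game developer, smiling, black shirt",
    # ... roughly 10 images in total
}
for image_name, caption in captions.items():
    (dataset / image_name).with_suffix(".txt").write_text(caption)
```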

Did you use brackets to increase the weight of his name?
Ex: (((zun game developer)))

Or some other technique?
Apologies for all the questions.

@pjonesdotca
On Diffusers, a token's weight can be raised with "+" or lowered with "-", and the number of symbols adjusts how strong the change is.
Ex: "apple++", "pen----"
