bruefire committed
Commit f66431c
1 Parent(s): 77e93e3

added a draft for workflow.md.

Files changed (1)
  1. workflow.md +69 -0
workflow.md ADDED

# Workflow for fine-tuning ModelScope in anime style
Here is a brief description of my process for fine-tuning ModelScope in an anime style.
Most of it may be basic, but I hope it is useful.
There is no guarantee that what is written here is correct or that it will lead to good results!

## Selection of training data
The goal of my training was to shift the model toward an overall anime style.
Only the art style needed to override the ModelScope content, so I did not need a huge dataset.
The total number of videos and images was only a few thousand.
Most of the videos were taken from Tenor.
Many of them are posted as gifs and mp4s of a single short scene.
It seems possible to automate collection using the Tenor API (a rough sketch follows below).
https://tenor.com/
I also used some videos with smooth, stable motion and videos of 3D models with toon shading.
Short videos are sufficient, as we cannot train on very long clips at this time.

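As an illustration of automating collection, here is a minimal sketch using Tenor's public v2 search API. It is not a script from this repo; the endpoint, the `media_formats` response field, and the `TENOR_API_KEY` environment variable are assumptions, and you need your own API key.

```python
# Minimal sketch: fetch short clips from the Tenor v2 search API.
# Assumes the public v2 endpoint and an API key in TENOR_API_KEY.
import os
import requests

SEARCH_URL = "https://tenor.googleapis.com/v2/search"
API_KEY = os.environ["TENOR_API_KEY"]  # your own Tenor API key

def download_clips(query: str, limit: int = 30, out_dir: str = "tenor_clips") -> None:
    os.makedirs(out_dir, exist_ok=True)
    params = {"q": query, "key": API_KEY, "limit": limit, "media_filter": "mp4"}
    results = requests.get(SEARCH_URL, params=params, timeout=30).json().get("results", [])
    for item in results:
        url = item["media_formats"]["mp4"]["url"]  # direct mp4 for this result
        with open(os.path.join(out_dir, f"{item['id']}.mp4"), "wb") as f:
            f.write(requests.get(url, timeout=60).content)

if __name__ == "__main__":
    download_clips("anime waving")
```
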
### Notes on data collection
Blurring and noise are also learned. This is especially noticeable when training at high resolution.
Frame rate also has an effect. If you want to train smooth motion, you need such data.
Scene switching also has an effect. If not addressed, the character may suddenly transform mid-clip (one way to handle this is sketched below).
For anime training it is difficult to capture details from video sources alone, so images are also used for training.
Images can be created with Stable Diffusion.
The fewer the differences between frames, the less likely the training results are to be corrupted.
I avoided animations with too much dynamic motion.
It may be better to avoid scenes with multiple contexts and choose scenes with simple actions.
I collected data while checking that common emotions and actions were included.

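One possible way to deal with scene switching, not necessarily what was done here, is to split clips at detected cuts so that each training sample contains a single scene. A sketch with PySceneDetect (assuming version 0.6+ and ffmpeg on PATH):

```python
# Sketch: split a clip at detected scene cuts so each sample is one scene.
# Requires the `scenedetect` package and ffmpeg available on PATH.
from scenedetect import detect, ContentDetector
from scenedetect.video_splitter import split_video_ffmpeg

def split_at_cuts(video_path: str, threshold: float = 27.0):
    # ContentDetector flags frames where the content changes abruptly (a cut).
    scene_list = detect(video_path, ContentDetector(threshold=threshold))
    if len(scene_list) > 1:
        split_video_ffmpeg(video_path, scene_list)  # one output file per scene
    return scene_list

if __name__ == "__main__":
    print(split_at_cuts("clip.mp4"))
```
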
## Correcting data before training

### Fixing resolution, blown-out highlights, and noise
It is safe to use source material at a resolution at least equal to the training resolution.
The aspect ratio should also match the training settings.
Cropping to the right ratio can be done with ffmpeg (a sketch follows below).
Incidentally, I tried padding to the target ratio with a single color instead of cropping, but it seemed to slow down training.

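For example, a centre crop to a square plus a rescale to the training resolution could look like the sketch below. The 512x512 size and the file paths are placeholders; only standard ffmpeg filters are used.

```python
# Sketch: centre-crop a clip to a square and scale it to the training
# resolution with ffmpeg (must be on PATH). 512 is just a placeholder.
import subprocess

def crop_and_scale(src: str, dst: str, size: int = 512) -> None:
    # crop=w:h defaults to a centred crop; scale resizes to the target size.
    vf = f"crop='min(iw,ih)':'min(iw,ih)',scale={size}:{size}"
    subprocess.run(["ffmpeg", "-y", "-i", src, "-vf", vf, "-an", dst], check=True)

if __name__ == "__main__":
    crop_and_scale("raw/clip.mp4", "prepped/clip_512.mp4")
```
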
### Converting small videos to larger sizes
I used this tool: https://github.com/k4yt3x/video2x
The recommended driver is Waifu2XCaffe. It is suitable for animation, as it gives clear, sharp results and also reduces noise a little.
If upscaling does not improve the actual image quality along with the resolution, it may be better not to force a higher resolution.

### Number of frames
Since many animations have a low frame count, the training results are likely to collapse.
Besides body collapse, the character's appearance will no longer be consistent. Less variation between frames seems to improve consistency.
The following tool may be useful for frame interpolation:
https://github.com/google-research/frame-interpolation
If the variation between frames is too large, you will not get a clean result (a rough check is sketched below).

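To get a rough sense of how much a clip varies between frames, a simple check like the one below can help flag overly dynamic clips before training. It is only an illustration; the OpenCV-based metric and the threshold are assumptions, not part of the original workflow.

```python
# Rough sketch: mean absolute difference between consecutive frames, used to
# flag clips that are probably too dynamic. The 25.0 threshold is arbitrary.
import cv2
import numpy as np

def mean_frame_difference(video_path: str) -> float:
    cap = cv2.VideoCapture(video_path)
    prev, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(float(np.mean(cv2.absdiff(gray, prev))))
        prev = gray
    cap.release()
    return float(np.mean(diffs)) if diffs else 0.0

if __name__ == "__main__":
    score = mean_frame_difference("clip.mp4")
    print(score, "- probably too dynamic" if score > 25.0 else "- ok")
```
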
## Tagging
For anime, WaifuTagger can extract content with good accuracy, so I created a slightly modified script for video and used it for animov512x.
https://github.com/bruefire/WaifuTaggerForVideo
That said, Blip2-Preprocessor can also extract general scene content well enough. It may be a good idea to use them together.
https://github.com/ExponentialML/Video-BLIP2-Preprocessor

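The linked repositories contain the actual scripts. Purely as an illustration of the captioning step, here is a minimal sketch that captions one middle frame of a clip with BLIP-2 via HuggingFace transformers; the model choice, GPU use, and frame selection are assumptions.

```python
# Illustration only: caption the middle frame of a clip with BLIP-2.
# Requires transformers, torch, opencv-python, Pillow, and a GPU.
import cv2
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

MODEL_ID = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(MODEL_ID)
model = Blip2ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to("cuda")

def caption_clip(video_path: str) -> str:
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, int(cap.get(cv2.CAP_PROP_FRAME_COUNT) // 2))
    ok, frame = cap.read()  # grab a single representative frame
    cap.release()
    if not ok:
        return ""
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
    ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()

if __name__ == "__main__":
    print(caption_clip("clip.mp4"))
```
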
## Configuration settings
todo

## Evaluating training results
If any of the sample videos generated during training look poor, I search the dataset JSON for the prompts of that sample. With a training dataset of a few thousand clips, you can usually find the source videos behind the problem, which may help you see where it lies (a search sketch follows below).
I deliberately trained all videos with the 'anime' tag.
After training, comparing generations that put 'anime' in the positive prompt with ones that put it in the negative prompt (i.e., comparing the fine-tuned behaviour with output closer to the original ModelScope) may help improve training.

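The exact JSON layout depends on the preprocessor you used, but the search itself can be as simple as the sketch below. The `data`, `video_path`, and `caption` field names are hypothetical; adjust them to your file.

```python
# Sketch: find training clips whose caption contains a phrase, to trace a bad
# sample back to its source videos. Field names here are hypothetical.
import json

def find_sources(json_path: str, query: str) -> list[str]:
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f).get("data", [])
    q = query.lower()
    return [e["video_path"] for e in entries if q in e.get("caption", "").lower()]

if __name__ == "__main__":
    for path in find_sources("train_data.json", "girl waving"):
        print(path)
```
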
It is difficult to add extra training for specific things afterwards, even if they are tagged, so I avoided relying on that.
Note that anime has a low frame count to begin with, so overfitting tends to freeze the characters.

Perhaps because ModelScope itself was not trained at such a large resolution, training seems to be easier at lower resolutions.
In fact, when training Animov-0.1, I did not need to pay much attention to what is written here to get good results.
If you are fine-tuning ModelScope at larger resolutions, you may need to train incrementally with more data to avoid collapsing the results.

That's all.