bruefire committed
Commit
99b4771
1 Parent(s): f66431c

fixed workflow.md a bit.

Files changed (2)
  1. config.yaml +73 -0
  2. workflow.md +12 -13
config.yaml ADDED
@@ -0,0 +1,73 @@
+ pretrained_model_path: ./outputs/train_2023-05-02T00-50-05/checkpoint-15000/
+ output_dir: ./outputs/
+ train_data:
+   width: 512
+   height: 512
+   use_bucketing: true
+   sample_start_idx: 1
+   fps: 24
+   frame_step: 5
+   n_sample_frames: 45
+   single_video_path: ''
+   single_video_prompt: ''
+   fallback_prompt: ''
+   path: E:/userdata/Pictures/ai_trainning/t2v-v2/gif/vid/old/
+   json_path: ./json/anime-v2.json
+   image_dir: E:/userdata/Pictures/ai_trainning/t2v-v2/img/
+   single_img_prompt: ''
+ validation_data:
+   prompt: ''
+   sample_preview: true
+   num_frames: 16
+   width: 512
+   height: 512
+   num_inference_steps: 25
+   guidance_scale: 9
+ dataset_types:
+ - json
+ - image
+ validation_steps: 100
+ extra_unet_params: null
+ extra_text_encoder_params: null
+ train_batch_size: 1
+ max_train_steps: 10000
+ learning_rate: 5.0e-06
+ scale_lr: false
+ lr_scheduler: constant
+ lr_warmup_steps: 0
+ adam_beta1: 0.9
+ adam_beta2: 0.999
+ adam_weight_decay: 0.01
+ adam_epsilon: 1.0e-08
+ max_grad_norm: 1.0
+ gradient_accumulation_steps: 1
+ checkpointing_steps: 2500
+ resume_from_checkpoint: null
+ mixed_precision: fp16
+ use_8bit_adam: false
+ enable_xformers_memory_efficient_attention: false
+ enable_torch_2_attn: true
+ seed: 64
+ extend_dataset: false
+ cached_latent_dir: null
+ use_unet_lora: true
+ unet_lora_modules:
+ - ResnetBlock2D
+ text_encoder_lora_modules:
+ - CLIPEncoderLayer
+ lora_rank: 25
+ lora_path: ''
+ kwargs: {}
+ cache_latents: true
+ gradient_checkpointing: true
+ offset_noise_strength: 0.1
+ text_encoder_gradient_checkpointing: false
+ train_text_encoder: false
+ trainable_modules:
+ - attn1
+ - attn2
+ - temp_conv
+ trainable_text_modules:
+ - all
+ use_offset_noise: false
+ use_text_lora: true
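
The file above is the config.yaml that the updated workflow.md links for animov512x. A file like this is typically passed to the fine-tuning script (e.g. `python train.py --config config.yaml` in Text-To-Video-Finetuning), but it can also be loaded directly as a quick sanity check. The snippet below is a minimal sketch using PyYAML; the local file name `config.yaml` is an assumption, and the training repo itself may use a different loader.

```python
# Minimal sketch: load the training config shown above and print a few key settings.
# Assumes the YAML has been saved locally as config.yaml; PyYAML is the only dependency.
import yaml

with open("config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print("base model:       ", cfg["pretrained_model_path"])
print("train resolution: ", cfg["train_data"]["width"], "x", cfg["train_data"]["height"])
print("frames per sample:", cfg["train_data"]["n_sample_frames"])
print("LoRA (unet/text): ", cfg["use_unet_lora"], "/", cfg["use_text_lora"])
print("max steps / lr:   ", cfg["max_train_steps"], "/", cfg["learning_rate"])
```
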
workflow.md CHANGED
@@ -1,5 +1,5 @@
  # Workflow for fine-tuning ModelScope in anime style
- Here is a brief description of my process for fine-tuning ModelScope in an animated style.
+ Here is a brief description of my process for fine-tuning ModelScope in an anime style with [Text-To-Video-Finetuning](https://github.com/ExponentialML/Text-To-Video-Finetuning).
  Most of it may be basic, but I hope it will be useful.
  There is no guarantee that what is written here is correct and will lead to good results!
 
@@ -7,12 +7,12 @@ There is no guarantee that what is written here is correct and will lead to good
  The goal of my training was to change the model to an overall anime style.
  Only the art style was to override the ModelScope content, so I did not need a huge data set.
  The total number of videos and images was only a few thousand.
- Most of the video was taken from Tenor.
+ Most of the video was taken from [Tenor](https://tenor.com/).
  Many of the videos were posted as gifs and mp4s of one short scene.
  It seems to be possible to automate the process using the API.
- https://tenor.com/
+
  I also used some smooth and stable motions and videos of 3d models with toon shading.
- Short videos are sufficient, as we are not able to study very long data at this time.
+ Short videos of a few seconds are sufficient, as we are not able to train long data yet.

  ### Notes on data collection
  Blurring and noise are also trained. This is especially noticeable in the case of high-resolution training.
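
For the "automate the process using the API" note in the hunk above, the sketch below shows roughly what collecting clip URLs from Tenor could look like. It is only an illustration: the v2 search endpoint, the `TENOR_API_KEY` environment variable, and the `media_formats`/`mp4` response fields are assumptions to verify against the current Tenor API documentation, and this is not the script actually used for animov512x.

```python
# Rough sketch: query Tenor's search API for short clips and print their MP4 URLs.
# Assumptions: the v2 endpoint below, an API key in TENOR_API_KEY, and the
# media_formats/mp4/url response layout -- check the current Tenor docs.
import os
import requests

API_KEY = os.environ["TENOR_API_KEY"]

def search_clip_urls(query: str, limit: int = 20) -> list[str]:
    resp = requests.get(
        "https://tenor.googleapis.com/v2/search",
        params={"q": query, "key": API_KEY, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    urls = []
    for result in resp.json().get("results", []):
        mp4 = result.get("media_formats", {}).get("mp4", {})
        if "url" in mp4:
            urls.append(mp4["url"])
    return urls

if __name__ == "__main__":
    for url in search_clip_urls("anime waving"):
        print(url)
```
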
@@ -27,7 +27,7 @@ I collected data while checking if common emotions and actions were included.

  ## Correcting data before training

- ### Fixing resolution, burnout, and noise
+ ### Fixing resolution, blurring, and noise
  It is safe to use a resolution at least equal to or higher than the training resolution.
  The ratio should also match the training settings.
  Trimming is possible with ffmpeg.
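
For the trimming step mentioned above, a call along these lines cuts a clip down to a few seconds and brings it to the 512x512 training ratio. The input/output names, the start time, the 3-second duration, and the 24 fps value are placeholders (24 fps matches the `fps` value in the config above); this is a generic sketch, not the exact command used for animov512x.

```python
# Sketch: trim a source clip and resize/center-crop it to 512x512 with ffmpeg.
# in.mp4 / out.mp4, the start time, the 3 s duration and 24 fps are placeholders.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-ss", "0",      # start of the segment to keep
        "-t", "3",       # keep roughly 3 seconds
        "-i", "in.mp4",
        "-vf", "scale=512:512:force_original_aspect_ratio=increase,crop=512:512",
        "-r", "24",      # match the fps used in train_data
        "-an",           # drop audio; it is not used for training
        "out.mp4",
    ],
    check=True,
)
```
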
@@ -42,22 +42,21 @@ If you cannot improve the image quality as well as the resolution, it may be bet
  Since many animations have a small number of frames, the results of the training are likely to be collapsed.
  In addition to body collapse, the appearance of the character will no longer be consistent. Less variation between frames seems to improve consistency.
  The following tool may be useful for frame interpolation
- https://github.com/google-research/frame-interpolation
+ https://github.com/google-research/frame-interpolation.
  If the variation between frames is too large, you will not get a clean result.

  ## Tagging
- For anime, WaifuTagger can extract content with good accuracy, so I created a slightly modified script for video and used it for animov512x.
- https://github.com/bruefire/WaifuTaggerForVideo
- Nevertheless, Blip2-Preprocessor can also extract enough general scene content. It may be a better idea to use them together.
- https://github.com/ExponentialML/Video-BLIP2-Preprocessor
+ For anime, WaifuTagger can extract content with good accuracy, so I created [a slightly modified script](https://github.com/bruefire/WaifuTaggerForVideo) for video and used it for animov512x.
+ Nevertheless, [BLIP2-Preprocessor](https://github.com/ExponentialML/Video-BLIP2-Preprocessor) can also extract enough general scene content. It may be a better idea to use them together.

- ## Configuration settings
- todo
+ ## config.yaml settings
+ I'm still not quite sure what is appropriate for this.
+ [config.yaml for animov512x](https://huggingface.co/strangeman3107/animov-512x/blob/main/config.yaml)

  ## Evaluate training results
  If there are any poorly done results in the sample videos being trained, we will search the json with the prompts for that sample. With a training dataset of a few thousand or so, you can usually find the training source videos, which may be helpful to see where the problem lies.
  I dared to train all videos with 'anime' tags.
- Comparing videos with the positive prompts and negative ones with anime tag after training (comparing a fine-tuned model with those that are similar to the original ModelScope) may help improve training.
+ Comparing videos generated with the anime tag in the positive prompt and ones with it in the negative prompt after training (comparing the fine-tuned result with one close to the original ModelScope) may help improve training.

  It is difficult to add additional training to specific things afterwards, even if they are tagged, so I avoided that.
  Note that the number of frames in anime is small to begin with, so over-learning tends to freeze the characters.
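
For the "search the json with the prompts" step in the evaluation section, a plain keyword search over the caption file (`./json/anime-v2.json` in the config above) is usually enough to locate the source clips behind a bad sample. The sketch below deliberately does not assume any particular layout of the caption JSON; it just walks every string in the file and reports the ones containing the keyword.

```python
# Sketch: find caption strings in the training JSON that contain a keyword,
# without assuming a specific schema (every string in the file is checked).
import json

def find_captions(json_path: str, keyword: str) -> list[str]:
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    hits: list[str] = []

    def walk(node) -> None:
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str) and keyword.lower() in node.lower():
            hits.append(node)

    walk(data)
    return hits

if __name__ == "__main__":
    for caption in find_captions("./json/anime-v2.json", "running"):
        print(caption)
```
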