bruefire
committed on
Commit 99b4771
Parent(s): f66431c
fixed workflow.md a bit.
- config.yaml +73 -0
- workflow.md +12 -13
config.yaml
ADDED
@@ -0,0 +1,73 @@
+pretrained_model_path: ./outputs/train_2023-05-02T00-50-05/checkpoint-15000/
+output_dir: ./outputs/
+train_data:
+  width: 512
+  height: 512
+  use_bucketing: true
+  sample_start_idx: 1
+  fps: 24
+  frame_step: 5
+  n_sample_frames: 45
+  single_video_path: ''
+  single_video_prompt: ''
+  fallback_prompt: ''
+  path: E:/userdata/Pictures/ai_trainning/t2v-v2/gif/vid/old/
+  json_path: ./json/anime-v2.json
+  image_dir: E:/userdata/Pictures/ai_trainning/t2v-v2/img/
+  single_img_prompt: ''
+validation_data:
+  prompt: ''
+  sample_preview: true
+  num_frames: 16
+  width: 512
+  height: 512
+  num_inference_steps: 25
+  guidance_scale: 9
+dataset_types:
+- json
+- image
+validation_steps: 100
+extra_unet_params: null
+extra_text_encoder_params: null
+train_batch_size: 1
+max_train_steps: 10000
+learning_rate: 5.0e-06
+scale_lr: false
+lr_scheduler: constant
+lr_warmup_steps: 0
+adam_beta1: 0.9
+adam_beta2: 0.999
+adam_weight_decay: 0.01
+adam_epsilon: 1.0e-08
+max_grad_norm: 1.0
+gradient_accumulation_steps: 1
+checkpointing_steps: 2500
+resume_from_checkpoint: null
+mixed_precision: fp16
+use_8bit_adam: false
+enable_xformers_memory_efficient_attention: false
+enable_torch_2_attn: true
+seed: 64
+extend_dataset: false
+cached_latent_dir: null
+use_unet_lora: true
+unet_lora_modules:
+- ResnetBlock2D
+text_encoder_lora_modules:
+- CLIPEncoderLayer
+lora_rank: 25
+lora_path: ''
+kwargs: {}
+cache_latents: true
+gradient_checkpointing: true
+offset_noise_strength: 0.1
+text_encoder_gradient_checkpointing: false
+train_text_encoder: false
+trainable_modules:
+- attn1
+- attn2
+- temp_conv
+trainable_text_modules:
+- all
+use_offset_noise: false
+use_text_lora: true
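
For reference, here is a minimal sketch of how a config like the one above can be loaded and sanity-checked before launching training. It assumes OmegaConf, which Text-To-Video-Finetuning-style training scripts commonly use for YAML configs; the repo's own loader and launch command may differ, so treat this as illustrative only.

```python
# Minimal sketch: load the config above and sanity-check it before training.
# Assumes OmegaConf; the actual loader used by the training script may differ.
from pathlib import Path

from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")

# Nested keys map directly onto the YAML structure above.
print(cfg.pretrained_model_path)                     # ./outputs/train_2023-05-02T00-50-05/checkpoint-15000/
print(cfg.train_data.width, cfg.train_data.height)   # 512 512
print(cfg.learning_rate, cfg.lora_rank)              # 5e-06 25

# Quick check that the dataset paths referenced in train_data actually exist.
for key in ("path", "json_path", "image_dir"):
    p = Path(cfg.train_data[key])
    if not p.exists():
        print(f"warning: train_data.{key} does not exist: {p}")
```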
workflow.md
CHANGED
@@ -1,5 +1,5 @@
 # Workflow for fine-tuning ModelScope in anime style
-Here is a brief description of my process for fine-tuning ModelScope in an
+Here is a brief description of my process for fine-tuning ModelScope in an anime style with [Text-To-Video-Finetuning](https://github.com/ExponentialML/Text-To-Video-Finetuning).
 Most of it may be basic, but I hope it will be useful.
 There is no guarantee that what is written here is correct and will lead to good results!
 
@@ -7,12 +7,12 @@ There is no guarantee that what is written here is correct and will lead to good
 The goal of my training was to change the model to an overall anime style.
 Only the art style needed to override the ModelScope content, so I did not need a huge data set.
 The total number of videos and images was only a few thousand.
-Most of the video was taken from Tenor.
+Most of the videos were taken from [Tenor](https://tenor.com/).
 Many of the videos were posted as gifs and mp4s of one short scene.
 It seems to be possible to automate the process using the API.
-
+
 I also used some smooth and stable motions and videos of 3d models with toon shading.
-Short videos are sufficient, as we are not able to
+Short videos of a few seconds are sufficient, as we are not yet able to train on long data.
 
 ### Notes on data collection
 Blurring and noise are also learned. This is especially noticeable in the case of high-resolution training.
@@ -27,7 +27,7 @@ I collected data while checking if common emotions and actions were included.
 
 ## Correcting data before training
 
-### Fixing resolution,
+### Fixing resolution, blurring, and noise
 It is safe to use a resolution at least equal to or higher than the training resolution.
 The ratio should also match the training settings.
 Trimming is possible with ffmpeg.
@@ -42,22 +42,21 @@ If you cannot improve the image quality as well as the resolution, it may be bet
 Since many animations have a small number of frames, the training results are likely to collapse.
 In addition to body collapse, the appearance of the character will no longer be consistent. Less variation between frames seems to improve consistency.
 The following tool may be useful for frame interpolation:
-https://github.com/google-research/frame-interpolation
+https://github.com/google-research/frame-interpolation.
 If the variation between frames is too large, you will not get a clean result.
 
 ## Tagging
-For anime, WaifuTagger can extract content with good accuracy, so I created a slightly modified script for video and used it for animov512x.
-https://github.com/
-Nevertheless, Blip2-Preprocessor can also extract enough general scene content. It may be a better idea to use them together.
-https://github.com/ExponentialML/Video-BLIP2-Preprocessor
+For anime, WaifuTagger can extract content with good accuracy, so I created [a slightly modified script](https://github.com/bruefire/WaifuTaggerForVideo) for video and used it for animov512x.
+Nevertheless, [BLIP2-Preprocessor](https://github.com/ExponentialML/Video-BLIP2-Preprocessor) can also extract enough general scene content. It may be a better idea to use them together.
 
-##
-
+## config.yaml settings
+I'm still not quite sure what is appropriate for this.
+[config.yaml for animov512x](https://huggingface.co/strangeman3107/animov-512x/blob/main/config.yaml)
 
 ## Evaluate training results
 If there are any poorly done results in the sample videos during training, I search the json for the prompts of that sample. With a training dataset of a few thousand or so, you can usually find the training source videos, which may be helpful to see where the problem lies.
 I dared to train all videos with 'anime' tags.
-Comparing videos with the positive prompts and negative ones with anime tag after training (comparing a fine-tuned
+Comparing videos generated with the anime tag in the positive prompt and videos with it in the negative prompt after training (i.e. comparing a fine-tuned result with one close to the original ModelScope) may help improve training.
 
 It is difficult to add additional training to specific things afterwards, even if they are tagged, so I avoided that.
 Note that the number of frames in anime is small to begin with, so overfitting tends to freeze the characters.
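
The data-collection step above notes that pulling clips from Tenor "seems to be possible to automate using the API". Below is an illustrative Python sketch of that idea; the v2 search endpoint, query parameters, and response fields (`results`, `media_formats`, `gif`, `url`) are assumptions to verify against Tenor's current API documentation, and `TENOR_API_KEY` is a placeholder for your own key.

```python
# Illustrative sketch only: bulk-download short GIFs from Tenor for a training set.
# The v2 endpoint and JSON field names are assumptions -- check Tenor's API docs.
import os
import pathlib

import requests

API_KEY = os.environ["TENOR_API_KEY"]  # placeholder; obtain your own key
SEARCH_URL = "https://tenor.googleapis.com/v2/search"  # assumed v2 search endpoint


def download_gifs(query: str, out_dir: str, limit: int = 50) -> None:
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    resp = requests.get(
        SEARCH_URL,
        params={"q": query, "key": API_KEY, "limit": limit, "media_filter": "gif"},
        timeout=30,
    )
    resp.raise_for_status()
    for i, item in enumerate(resp.json().get("results", [])):
        gif_url = item["media_formats"]["gif"]["url"]  # assumed response layout
        data = requests.get(gif_url, timeout=60).content
        (out / f"{query.replace(' ', '_')}_{i:04d}.gif").write_bytes(data)


if __name__ == "__main__":
    download_gifs("anime running", "./gifs/running")
```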
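
The "Fixing resolution" step says clips should at least match the training resolution and aspect ratio, and that trimming is possible with ffmpeg. Here is one way to do that step in bulk by calling ffmpeg from Python; the scale-then-center-crop filter chain targeting the 512x512 from config.yaml is a suggestion, not necessarily the exact command the author used.

```python
# Sketch of the resolution-fixing step: scale each clip so its short side is 512,
# then center-crop to 512x512 to match train_data.width/height in config.yaml.
# Suggested ffmpeg invocation, not the author's exact command.
import subprocess
from pathlib import Path


def fit_to_training_res(src: Path, dst: Path, size: int = 512) -> None:
    vf = (
        f"scale={size}:{size}:force_original_aspect_ratio=increase,"
        f"crop={size}:{size}"
    )
    subprocess.run(["ffmpeg", "-y", "-i", str(src), "-vf", vf, str(dst)], check=True)


if __name__ == "__main__":
    out_dir = Path("./vid_512")
    out_dir.mkdir(exist_ok=True)
    for clip in Path("./vid_raw").glob("*.mp4"):  # hypothetical input folder
        fit_to_training_res(clip, out_dir / clip.name)
```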
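
The evaluation step searches the dataset json for the prompts behind a poorly generated sample. Since the exact json schema depends on the tagging tool, here is a schema-agnostic sketch that walks whatever structure is in ./json/anime-v2.json (the json_path from config.yaml) and prints every caption string containing a query; the query string is just an example.

```python
# Sketch for "Evaluate training results": find which training captions match a
# sample prompt by walking the dataset JSON, without assuming a fixed schema.
import json
from typing import Any, Iterator, Tuple


def find_prompts(node: Any, query: str, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, text) pairs for every string value containing `query`."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from find_prompts(v, query, f"{path}.{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_prompts(v, query, f"{path}[{i}]")
    elif isinstance(node, str) and query.lower() in node.lower():
        yield path, node


if __name__ == "__main__":
    with open("./json/anime-v2.json", encoding="utf-8") as f:
        data = json.load(f)
    for where, text in find_prompts(data, "girl running"):  # example query
        print(where, "->", text)
```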