Multimodal Training with Axolotl

by ritabratamaiti - opened

According to the example config YAML, it seems that Axolotl supports multimodal fine-tuning:

# multimodal pretrain
multimodal: true
mm_vision_tower: openai/clip-vit-large-patch14  # CLIP model used as the vision encoder
tune_mm_mlp_adapter: true  # train the multimodal projector (MLP adapter)
mm_freeze_backbone: true  # keep the backbone weights frozen
mm_vision_select_layer: -2  # take image features from the second-to-last vision layer
mm_projector_type: mlp2x_gelu  # two-layer MLP projector with GELU activation
mm_image_folder: ./llava/  # directory containing the training images
mm_use_im_patch_token: false  # don't insert image patch placeholder tokens

According to the Axolotl GitHub repository, however, this feature is still a work in progress. Is it possible to update Axolotl so that multimodal fine-tuning can be used?
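
For context, if this does get enabled, I'd expect to save the snippet above as a config file (say multimodal.yml, name just for illustration) and launch it with Axolotl's usual training entrypoint, roughly:

accelerate launch -m axolotl.cli.train multimodal.yml

though I'm not sure whether the multimodal options are actually wired into that training path yet.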
