
LLaVA_X_KoLlama2-7B-0.1v


KoT-platypus2 X LLaVA

This model is a large multimodal model (LMM) that combines the LLM (KoT-platypus2-7B) with the visual encoder of CLIP (ViT-L/14), trained on a Korean visual-instruction dataset using QLoRA.

Model Details

  • Model Developers: Nagase_Kotono
  • Base Model: kyujinpy/KoT-platypus2-7B
  • Model Architecture: LLaVA_X_KoLlama2-7B is an open-source chatbot trained by fine-tuning Llama2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the Llama2 transformer architecture.
  • Training Dataset: KoLLaVA-CC3M-Pretrain-595K, KoLLaVA-Instruct-150k

Hardware & Software

Pretrain

Finetune

ValueError

LLaVA_X_KoLlama2-7B uses kyujinpy/KoT-platypus2-7B as its base model, which in turn is based on beomi/llama-2-ko-7b.
Since Llama-2-Ko uses the FastTokenizer provided by the HF tokenizers library, not the sentencepiece package, the use_fast=True option is required when initializing the tokenizer.
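A minimal loading sketch for the point above; the repo id beomi/llama-2-ko-7b comes from this card, and use_fast=True is the key option:

```python
from transformers import AutoTokenizer

# Llama-2-Ko ships a HF tokenizers (fast) tokenizer rather than a
# sentencepiece model, so use_fast=True is required to avoid the
# ValueError described above.
tokenizer = AutoTokenizer.from_pretrained("beomi/llama-2-ko-7b", use_fast=True)
```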
