
Description

Trained on top of llava-13b-v1 (https://huggingface.co/wza/llava-13b-v1), using the LLaVA codebase (GitHub: https://github.com/haotian-liu/LLaVA).
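
For quick sanity checks, checkpoints trained with this codebase are usually loaded through the LLaVA repository itself rather than plain transformers. The snippet below is only a sketch: the image path is made up, and the exact flag names (e.g. --model-path vs --model-name) differ between LLaVA releases, so adjust to your checkout.

# Hypothetical example; flag names vary across LLaVA versions.
python -m llava.serve.cli \
    --model-path ./checkpoints/llava-13b-instruction \
    --image-file ./examples/kline_chart.png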

Dataset

A custom dataset of stock K-line (candlestick) charts was constructed, with separate splits for pre-training and instruction tuning.
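
Both splits use LLaVA's JSON conversation format: a data.json file plus an image folder, referenced by the --data_path and --image_folder arguments below. The record here is purely illustrative; the ids, file names, and prompts are assumptions, not samples from the actual dataset.

[
  {
    "id": "000001",
    "image": "000001.png",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe the trend shown in this K-line chart."},
      {"from": "gpt", "value": "The chart shows a gradual uptrend with two short pullbacks."}
    ]
  }
]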

Training scripts

Pre-train:

torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    LLaVA/llava/train/train_mem.py \
    --model_name_or_path llava-13b-v1 \
    --data_path JsonFormatDataset/PretrainData/data.json \
    --image_folder JsonFormatDataset/PretrainData/images \
    --vision_tower openai/clip-vit-large-patch14 \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end \
    --bf16 True \
    --output_dir ./checkpoints/llava-13b-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2400 \
    --save_total_limit 1 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb

Instruction tuning:

torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    LLaVA/llava/train/train_mem.py \
    --model_name_or_path ./checkpoints/llava-13b-pretrain \
    --data_path JsonFormatDataset/InstructionTuneData/data.json \
    --image_folder JsonFormatDataset/InstructionTuneData/images/ \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end \
    --bf16 True \
    --output_dir ./checkpoints/llava-13b-instruction \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb

Training settings

Hardware: 8× A100-80G (SXM4) GPUs
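
Given the commands above, the effective global batch sizes work out to:

Pre-train: 8 GPUs × 8 per-device batch × 2 gradient-accumulation steps = 128
Instruction tuning: 8 GPUs × 4 per-device batch × 1 gradient-accumulation step = 32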

Pre-train: https://wandb.ai/wzaa/huggingface/runs/cd5ou876/overview?workspace=user-wangziao1993

Fine-tune: https://wandb.ai/wzaa/huggingface/runs/y5bsz8dw/overview?workspace=user-wangziao1993
