# llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector after LLaVA stage 1; you can use it to instruction-tune your own multimodal models. Please follow my reproduced implementation, LLaVA-Llama-3, for more details on fine-tuning a LLaVA model with Llama-3 as the foundation LLM.

## Training dataset

- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

## Architecture

- LLM: llama-3-8b (frozen)
- Vision-language adapter: MLP
- Vision encoder: CLIP-ViT-L-336px (frozen)
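As a minimal sketch of how such a stage-1 connector can be rebuilt and applied, the snippet below constructs a LLaVA-1.5-style two-layer MLP projector and maps CLIP patch features into the LLM embedding space. The dimensions (1024 for CLIP-ViT-L, 4096 for llama-3-8b) follow the architectures named above; the checkpoint file name `mm_projector.bin` is an assumption based on the usual LLaVA checkpoint layout and is not confirmed by this card.

```python
import torch
import torch.nn as nn


def build_mm_projector(vision_dim: int = 1024, llm_dim: int = 4096) -> nn.Sequential:
    """Two-layer MLP connector in the LLaVA-1.5 style (Linear -> GELU -> Linear)."""
    return nn.Sequential(
        nn.Linear(vision_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )


projector = build_mm_projector()

# Hypothetical: load the pretrained connector weights from this checkpoint.
# weights = torch.load("mm_projector.bin", map_location="cpu")
# projector.load_state_dict(weights)

# A 336px image through CLIP-ViT-L/14 yields a 24x24 grid of 576 patch tokens.
patch_features = torch.randn(1, 576, 1024)
llm_tokens = projector(patch_features)
print(llm_tokens.shape)  # torch.Size([1, 576, 4096])
```

These projected tokens are what get interleaved with text embeddings during the instruction-tuning stage, while the vision encoder stays frozen.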
