PLLaVA Model Card

Model details

Model type: PLLaVA-13B is an open-source video-language chatbot trained by fine-tuning an image LLM on video instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: llava-hf/llava-v1.6-vicuna-13b-hf

Model date: PLLaVA-13B was trained in April 2024.

Paper or resources for more information:

License

Follows the license of llava-hf/llava-v1.6-vicuna-13b-hf.

Where to send questions or comments about the model: https://github.com/magic-research/PLLaVA/issues

Intended use

Primary intended uses: The primary use of PLLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

Video instruction-tuning data from OpenGVLab/VideoChat2-IT.

Evaluation dataset

A collection of 6 benchmarks: 5 VQA benchmarks and 1 recent benchmark proposed specifically for Video-LMMs.

Model size: 13.5B params (Safetensors). Tensor type: BF16.
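As a rough guide to what the stated size implies, the weight storage can be estimated from the parameter count and tensor type given above (13.5B params, BF16 at 2 bytes per parameter). The sketch below is illustrative arithmetic only. The function name and the dtype table are hypothetical helpers, not part of PLLaVA, and the figure excludes activations, the KV cache, and any video frame buffers.

```python
# Bytes per parameter for common tensor types (BF16 is the type listed for this model).
BYTES_PER_PARAM = {"FP32": 4, "BF16": 2, "FP16": 2, "INT8": 1}


def weight_footprint_gb(num_params: float, dtype: str = "BF16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 13.5B parameters stored in BF16
    print(f"{weight_footprint_gb(13.5e9, 'BF16'):.1f} GB")  # prints "27.0 GB"
```

Under these assumptions the weights alone take about 27 GB, so running the model needs either a single accelerator with more memory than that or sharding across several devices.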