
MiniGPTv2 Project

Overview

MiniGPTv2 is a multimodal large language model that combines vision and language capabilities. This repository contains the implementation of MiniGPTv2 fine-tuned on facial emotion recognition and detailed image understanding tasks.
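The Training Configuration section below lists a linear warmup followed by cosine decay (initial rate 3e-5, minimum 1e-6, warmup starting at 1e-6 over 1000 steps). The following is a minimal, generic sketch of that kind of schedule, not the scheduler used by this repository. The function name `lr_at_step`, the step-based decay, and the total of 100 epochs are assumptions made for illustration; the card does not state the planned total number of steps.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    init_lr: float = 3e-5,
    min_lr: float = 1e-6,
    warmup_start_lr: float = 1e-6,
    warmup_steps: int = 1000,
) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from warmup_start_lr up to init_lr.
        return warmup_start_lr + (init_lr - warmup_start_lr) * step / warmup_steps
    # Cosine decay from init_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (init_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical total: 100 epochs x 640 steps (not stated in the card).
TOTAL_STEPS = 100 * 640
# Epoch 88 at 640 steps per epoch, i.e. the 56,320 steps quoted in the card.
CHECKPOINT_STEP = 88 * 640

for s in (0, 500, 1000, CHECKPOINT_STEP, TOTAL_STEPS):
    print(f"step {s:>6}: lr = {lr_at_step(s, TOTAL_STEPS):.3e}")
```

Under these assumptions the rate at the epoch 88 checkpoint is already close to the minimum, which is worth keeping in mind when resuming training from that checkpoint.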

Model Architecture

- Base Architecture: MiniGPTv2
- LLM Backbone: Llama-2-7b-chat
- Image Size: 448×448
- Max Text Length: 3072 tokens
- LoRA Configuration: r=64, alpha=16
- Gradient Checkpointing: Enabled for the vision encoder, disabled for LLM

Training Configuration

- Training Checkpoint: Epoch 88 (56,320 steps)
- Steps per Epoch: 640
- Batch Size: 1 (with gradient accumulation)
- Gradient Accumulation Steps: 16
- Learning Rate:
  - Initial: 3e-5
  - Minimum: 1e-6
  - Warmup: 1e-6
- LR Schedule: Linear warmup with cosine decay
- Warmup Steps: 1000
- Weight Decay: 0.05
- Mixed Precision Training: Enabled

Dataset Composition

The model was trained on a mixture of datasets with the following sampling ratios:

- ShareGPT Detail: 30% (general visual conversation data)
- GPT4Vision Face Detail: 10% (facial analysis and description data)
- Realistic Emotions Detail: 20% (emotion recognition and interpretation data)

Usage

Requirements

    torch>=2.0.0
    transformers>=4.28.0
    timm
    fairscale
    accelerate

Loading the Model
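Before loading the model, it can be useful to confirm that the packages listed under Requirements are installed at the stated minimum versions. The following is a minimal sketch that uses only the Python standard library. The helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are illustrative and not part of this repository, and the version comparison is deliberately simple.

```python
from importlib import metadata

# Minimum versions from the Requirements list; None means no minimum is stated.
REQUIREMENTS = {
    "torch": "2.0.0",
    "transformers": "4.28.0",
    "timm": None,
    "fairscale": None,
    "accelerate": None,
}


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0), dropping local and pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric version components, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements: dict) -> dict:
    """Map each package name to 'ok', 'missing', or 'too old (found X)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not meets_minimum(installed, minimum):
            report[name] = f"too old (found {installed})"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(REQUIREMENTS).items():
        print(f"{package}: {status}")
```

For stricter handling of pre-release and post-release versions, the third-party `packaging` library is a more robust choice than the hand-rolled comparison used here.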

    from minigptv2.model import MiniGPTv2

    # Initialize the model
    model = MiniGPTv2.from_pretrained(
        llama_model_path="/path/to/Llama-2-7b-chat-hf",
        checkpoint_path="/path/to/minigptv2_checkpoint.pth",
        image_size=448,
        max_txt_len=3072,
    )

    # Set to evaluation mode
    model.eval()

Inference

    from PIL import Image
    import torch

    # Load image
    image = Image.open("example.jpg").convert("RGB")

    # Process input
    response = model.generate(
        image=image,
        prompt="What emotions is this person expressing?",
        max_new_tokens=512,
    )

    print(response)

Training

To continue training from the epoch 88 checkpoint:

    python train.py --config /path/to/config.yaml --resume_ckpt_path /path/to/epoch88_checkpoint.pth

Evaluation

    python evaluate.py --config /path/to/eval_config.yaml --checkpoint /path/to/epoch88_checkpoint.pth

License

Apache-2.0, as declared in the repository metadata (license: apache-2.0).

Citation

[Citation information for MiniGPTv2 and any relevant papers]

Acknowledgements

This project builds upon the MiniGPTv2 architecture and utilizes the Llama-2-7b-chat model. We thank the original authors for their contributions to the field.


license: apache-2.0
