Model Card

Veagle significantly improves the textual understanding and interpretation of images. Its distinguishing feature is an architectural change that combines several components: a vision abstractor from mPLUG-Owl, the Q-Former from InstructBLIP, and the Mistral language model. This combination allows Veagle to better understand and interpret the connection between text and images, achieving state-of-the-art results. Veagle starts from a pre-trained vision encoder and language model and is trained in two stages, which helps the model use information from images and text together effectively.

Further details about Veagle can be found in this detailed blog post: https://superagi.com/superagi-veagle/

arXiv paper link - https://arxiv.org/abs/2403.08773

Key Contributions

Veagle surpasses most state-of-the-art (SOTA) models on major benchmarks and outperforms competitors across a range of tasks and domains.
Using an optimized dataset, Veagle achieves high accuracy and efficiency, which demonstrates that the model learns effectively from limited data. We meticulously curated a dataset of 3.5 million examples, specifically tailored to enhance visual representation learning.
Veagle's architecture blends a vision abstractor inspired by mPLUG-Owl, the Q-Former module from InstructBLIP, and the Mistral language model. Together with an additional projection layer and further architectural refinements, this design enables Veagle to excel in multimodal tasks.
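
The component list above can be pictured as a chain of stages that each reshape the visual tokens before they reach the language model. The following is only a shape-tracing sketch. The stage order (vision encoder, vision abstractor, Q-Former, projection layer) is an assumption drawn from the description above, and every size in SketchDims is a hypothetical placeholder rather than one of Veagle's real dimensions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, int]  # (number of tokens, feature width)


@dataclass(frozen=True)
class SketchDims:
    """Hypothetical sizes for illustration only, not Veagle's real ones."""
    num_patches: int = 256        # patch tokens from the vision encoder
    vision_width: int = 1024      # feature width of the vision encoder
    abstractor_tokens: int = 64   # tokens kept by the vision abstractor
    num_queries: int = 32         # learned queries in the Q-Former
    qformer_width: int = 768      # feature width of the Q-Former
    llm_width: int = 4096         # embedding width of the language model


def trace_shapes(dims: SketchDims) -> Dict[str, Shape]:
    """Return the (tokens, width) shape after each stage of the assumed pipeline."""
    shapes: Dict[str, Shape] = {}
    # The vision encoder turns the image into patch tokens.
    shapes["vision_encoder"] = (dims.num_patches, dims.vision_width)
    # The vision abstractor condenses them into fewer tokens.
    shapes["vision_abstractor"] = (dims.abstractor_tokens, dims.vision_width)
    # The Q-Former emits one output vector per learned query.
    shapes["q_former"] = (dims.num_queries, dims.qformer_width)
    # The projection layer maps Q-Former outputs to the LLM embedding width.
    shapes["projection_layer"] = (dims.num_queries, dims.llm_width)
    return shapes


def llm_input_length(dims: SketchDims, num_text_tokens: int) -> int:
    """Visual tokens are placed alongside the text tokens given to the LLM."""
    return dims.num_queries + num_text_tokens


if __name__ == "__main__":
    for stage, shape in trace_shapes(SketchDims()).items():
        print(f"{stage}: {shape}")
```

The point of the sketch is the role of the projection layer: it is the only stage whose output width has to match the language model's embedding width, so the visual tokens can sit next to the text tokens.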

Training

Trained by: SuperAGI Team
Hardware: 8 x NVIDIA A100 SXM (80GB)
LLM: Mistral 7B
Vision Encoder: mPLUG-OWL2
Duration of pretraining: 12 hours
Duration of finetuning: 25 hours
Number of epochs in pretraining: 3
Number of epochs in finetuning: 2
Batch size in pretraining: 8
Batch size in finetuning: 10
Learning Rate: 1e-5
Weight Decay: 0.05
Optimizer: AdamW
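
For a rough sense of the compute budget, the figures listed above can be collected into a small record and combined arithmetically. This is a minimal sketch: the TrainingRun class and its field names are hypothetical, and the GPU-hour total assumes all 8 GPUs were in use for the full duration of both stages, which the card does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Values copied from the Training section; field names are illustrative."""
    num_gpus: int = 8
    pretrain_hours: float = 12.0
    finetune_hours: float = 25.0
    pretrain_epochs: int = 3
    finetune_epochs: int = 2
    learning_rate: float = 1e-5
    weight_decay: float = 0.05
    optimizer: str = "AdamW"

    def total_hours(self) -> float:
        """Wall-clock hours across both training stages."""
        return self.pretrain_hours + self.finetune_hours

    def gpu_hours(self) -> float:
        """GPU-hours, assuming every GPU is busy for the whole run."""
        return self.num_gpus * self.total_hours()

    def hours_per_epoch(self) -> dict:
        """Average wall-clock hours per epoch in each stage."""
        return {
            "pretraining": self.pretrain_hours / self.pretrain_epochs,
            "finetuning": self.finetune_hours / self.finetune_epochs,
        }


if __name__ == "__main__":
    run = TrainingRun()
    print("total wall-clock hours:", run.total_hours())
    print("GPU-hours (assumed full utilisation):", run.gpu_hours())
    print("hours per epoch:", run.hours_per_epoch())
```

Under that assumption the run comes to 37 wall-clock hours, or 296 GPU-hours, with a fine-tuning epoch taking roughly three times as long as a pretraining epoch.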

Steps to try

1. Clone the repository
git clone https://github.com/superagi/Veagle
cd Veagle

2. Run installation script
source venv/bin/activate
chmod +x install.sh
./install.sh

3. Run inference
python evaluate.py --answer_qs \
    --model_name veagle_mistral \
    --img_path images/food.jpeg \
    --question "Is the food given in the image healthy or not?"
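
To ask several questions or loop over images, the same command can be assembled from Python. The sketch below only builds the argument list. The helper name build_eval_command is hypothetical, and the script name and flags are the ones shown in the command above and nothing more.

```python
import shlex
from typing import List


def build_eval_command(
    img_path: str,
    question: str,
    model_name: str = "veagle_mistral",
) -> List[str]:
    """Build the argv list for evaluate.py using only the flags shown above."""
    return [
        "python", "evaluate.py",
        "--answer_qs",
        "--model_name", model_name,
        "--img_path", img_path,
        "--question", question,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        "images/food.jpeg",
        "Is the food given in the image healthy or not?",
    )
    # shlex.join quotes the question so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) from inside the cloned Veagle directory, with the virtual environment active, should behave like typing the command by hand. Keeping the arguments as a list avoids shell-quoting problems when a question contains quotes or other special characters.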

Evaluation

The SuperAGI team

Rajat Chawla, Arkajit Dutta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chatterjee, Mukunda NS, Ishaan Bhola