tahamajs committed · verified · Commit db6d8ad · 1 parent: cc23895

Create Readme

Files changed (1): README.md (+41, -0)
README.md ADDED
@@ -0,0 +1,41 @@
---
language: en
license: apache-2.0
library_name: peft
tags:
- paligemma
- visual-question-answering
- vqa
- clevr
- qlora
- multimodal
- peft
base_model: google/paligemma-3b-pt-224
datasets:
- leonardPKU/clevr_cogen_a_train
pipeline_tag: visual-question-answering
---

# QLoRA Fine-tuned PaliGemma-3B for Visual Reasoning on CLEVR-CoGen

This repository contains the QLoRA adapters for the `google/paligemma-3b-pt-224` model, fine-tuned for a Visual Question Answering (VQA) task on the `leonardPKU/clevr_cogen_a_train` dataset.

This fine-tuned model demonstrates significantly improved performance on questions requiring spatial and logical reasoning about complex scenes with multiple objects, compared to the base PaliGemma model. The use of QLoRA (4-bit quantization) makes it possible to run and train this powerful model on consumer-grade hardware.

## Model Description

- **Base Model:** `google/paligemma-3b-pt-224`
- **Fine-tuning Technique:** QLoRA (Quantized Low-Rank Adaptation)
- **Task:** Visual Question Answering (VQA)
- **Dataset:** A subset of `leonardPKU/clevr_cogen_a_train`
- **Key Improvement:** Enhanced ability to perform complex reasoning, counting, and attribute identification in visual scenes.

## How to Use

To use this model, you must load the 4-bit quantized base model and then apply the PEFT adapters from this repository.

### Installation

First, ensure you have the necessary libraries installed:

```bash
pip install -q transformers peft bitsandbytes accelerate Pillow requests
```