tahamajs committed · verified · Commit db6d8ad · 1 parent: cc23895

Create Readme

Files changed (1): README.md (+41, -0)
README.md ADDED
@@ -0,0 +1,41 @@
---
language: en
license: apache-2.0
library_name: peft
tags:
- paligemma
- visual-question-answering
- vqa
- clevr
- qlora
- multimodal
- peft
base_model: google/paligemma-3b-pt-224
datasets:
- leonardPKU/clevr_cogen_a_train
pipeline_tag: visual-question-answering
---

# QLoRA Fine-tuned PaliGemma-3B for Visual Reasoning on CLEVR-CoGen

This repository contains the QLoRA adapters for the `google/paligemma-3b-pt-224` model, fine-tuned for a Visual Question Answering (VQA) task on the `leonardPKU/clevr_cogen_a_train` dataset.

This fine-tuned model demonstrates significantly improved performance on questions requiring spatial and logical reasoning about complex scenes with multiple objects, compared to the base PaliGemma model. The use of QLoRA (4-bit quantization) makes it possible to run and train this powerful model on consumer-grade hardware.

## Model Description

- **Base Model:** `google/paligemma-3b-pt-224`
- **Fine-tuning Technique:** QLoRA (Quantized Low-Rank Adaptation)
- **Task:** Visual Question Answering (VQA)
- **Dataset:** A subset of `leonardPKU/clevr_cogen_a_train`
- **Key Improvement:** Enhanced ability to perform complex reasoning, counting, and attribute identification in visual scenes.

## How to Use

To use this model, you must load the 4-bit quantized base model and then apply the PEFT adapters from this repository.

### Installation

First, ensure you have the necessary libraries installed:

```bash
pip install -q transformers peft bitsandbytes accelerate Pillow requests
```