Commit 9de6bbc by Ananya8154 (parent: ec9a707): Commit to readme
Files changed (1): README.md (+66, -3)
---
license: mit
language:
- en
library_name: transformers
tags:
- Llava
- Multimodal
- Image-Text-to-Text
- FineTuned
- Vision
---

# Model Details
This model is a fine-tuned version of the LLaVA-v1.5-7B multimodal model, adapted to a custom Historical Paintings Dataset. The fine-tuning process used PEFT (Parameter-Efficient Fine-Tuning) with LoRA and DeepSpeed to reduce the number of trainable parameters and make efficient use of GPU resources.

## How to use
The folder `llava-v1.5-7b-task-lora` contains the LoRA weights, and the folder `llava-ftmodel` contains the merged model weights and configuration.

To use the model:
- Clone the LLaVA repository and enter it:
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
- Place the `llava-ftmodel` folder (from this repo) in the `LLaVA` directory.
- Make sure your `transformers` version is 4.37.2.
- Place `test.jpg` from this repo in the `LLaVA` directory to use it as a test image.
- Run the following command:
```bash
python -m llava.serve.cli --model-path 'llava-ftmodel' --image-file 'test.jpg'
```
The model will prompt for human input; type "Describe this image" or "What is depicted in this figure?" and press Enter. Enjoy!

## Intended Use
The fine-tuned LLaVA model is designed for tasks related to historical paintings, such as image captioning, visual question answering, and multimodal understanding. It can be used by researchers, historians, and enthusiasts interested in exploring and analyzing historical artworks.

## Fine-Tuning Procedure
The model was fine-tuned on 8 NVIDIA A40 GPUs, each with 48 GB of VRAM. Training leveraged the efficiency of PEFT LoRA and DeepSpeed to optimize GPU usage and minimize the number of trainable parameters. Once the new LoRA weights were trained, they were merged into the original model weights. After fine-tuning, the model reached a final loss of 0.11.

## Performance
The fine-tuned LLaVA model has demonstrated improved performance on tasks related to historical paintings compared to the original LLaVA-v1.5-7B model. However, exact performance metrics and benchmarks are not provided in this model card.

### Limitations and Biases
As with any language model, the fine-tuned LLaVA model may exhibit biases present in the training data, including historical, cultural, or societal biases. Additionally, the model's performance may be limited by the quality and diversity of the Historical Paintings Dataset used for fine-tuning.

### Ethical Considerations
Users of this model should be aware of potential ethical implications, such as the use of historical artworks without proper attribution or consent. It is essential to respect intellectual property rights and to ensure that any generated content or analyses are used responsibly and respectfully.