Commit 9de6bbc by Ananya8154 (parent: ec9a707): Commit to readme
Files changed (1): README.md (+66, -3)
---
license: mit
language:
- en
library_name: transformers
tags:
- Llava
- Multimodal
- Image-Text-to-Text
- FineTuned
- Vision
---

# Model Details
This model is a fine-tuned version of the LLaVA-v1.5-7B multimodal model, adapted to a custom Historical Paintings Dataset. The fine-tuning process used PEFT (Parameter-Efficient Fine-Tuning) with LoRA and DeepSpeed to reduce the number of trainable parameters and make efficient use of GPU resources.

## How to use
The folder `llava-v1.5-7b-task-lora` contains the LoRA weights, and the folder `llava-ftmodel` contains the merged model weights and configuration.

To use the model:
- Clone the LLaVA repository and enter it:
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
- Place the `llava-ftmodel` folder (from this repo) in the `LLaVA` directory.
- Make sure your `transformers` version is 4.37.2.
- Place `test.jpg` from this repo in the `LLaVA` directory to use it as a test image.
- Run the following command:
```bash
python -m llava.serve.cli --model-path 'llava-ftmodel' --image-file 'test.jpg'
```
The model will prompt for human input; type "Describe this image" or "What is depicted in this figure?" and press Enter. Enjoy!

## Intended Use
The fine-tuned LLaVA model is designed for tasks related to historical paintings, such as image captioning, visual question answering, and multimodal understanding. It can be used by researchers, historians, and enthusiasts interested in exploring and analyzing historical artworks.

## Fine-Tuning Procedure
The model was fine-tuned on 8 NVIDIA A40 GPUs, each with 48 GB of VRAM. Training leveraged the efficiency of PEFT LoRA and DeepSpeed to optimize GPU usage and minimize the number of trainable parameters. Once the new LoRA weights were trained, they were merged into the original model weights. After fine-tuning, the model reached a final loss of 0.11.

## Performance
The fine-tuned LLaVA model has demonstrated improved performance on tasks related to historical paintings compared to the original LLaVA-v1.5-7B model. However, exact performance metrics and benchmarks are not provided in this model card.

### Limitations and Biases
As with any language model, the fine-tuned LLaVA model may exhibit biases present in the training data, including historical, cultural, or societal biases. Additionally, the model's performance may be limited by the quality and diversity of the Historical Paintings Dataset used for fine-tuning.

### Ethical Considerations
Users of this model should be aware of potential ethical implications, such as the use of historical artworks without proper attribution or consent. It is essential to respect intellectual property rights and to ensure that any generated content or analyses are used responsibly and respectfully.