Mouwiya committed
Commit 09e8a85
1 Parent(s): 4e91937

Update README.md

Files changed (1): README.md (+58 -0)
README.md CHANGED
@@ -4,3 +4,61 @@ pipeline_tag: image-to-text
datasets:
- Mouwiya/image-in-Words400
---

# BLIP Image Captioning

## Model Description
BLIP_image_captioning is a model based on the BLIP (Bootstrapping Language-Image Pre-training) architecture and is designed for image captioning. It was fine-tuned on the "image-in-Words400" dataset, which consists of images paired with descriptive captions, and it combines visual and textual signals to generate accurate, contextually relevant captions for input images.

## Model Details
- **Model Architecture**: BLIP (Bootstrapping Language-Image Pre-training)
- **Base Model**: Salesforce/blip-image-captioning-base
- **Fine-tuning Dataset**: Mouwiya/image-in-Words400
- **Number of Parameters**: 109 million
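
The reported parameter count can be checked directly against the checkpoint. The following minimal sketch (assuming `transformers` and PyTorch are installed) loads the model and sums its parameters:

```python
from transformers import BlipForConditionalGeneration

# Load the fine-tuned checkpoint and count its parameters
model = BlipForConditionalGeneration.from_pretrained("Mouwiya/BLIP_image_captioning")
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")  # expected to be roughly 109M
```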

## Training Data
The model was fine-tuned on a shuffled 400-example subset of the **image-in-Words400** dataset; the small subset was used to allow faster iteration during development.
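
As an illustration of how such a subset can be drawn with the `datasets` library (the split name and random seed below are assumptions, not values from the original training code):

```python
from datasets import load_dataset

# Load the captioning dataset and take a shuffled 400-example subset
# (the "train" split name and the seed are illustrative assumptions)
dataset = load_dataset("Mouwiya/image-in-Words400", split="train")
subset = dataset.shuffle(seed=42).select(range(400))
print(subset)
```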

## Training Procedure
Fine-tuning used the following settings (a minimal training-loop sketch follows the list):
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16
- **Epochs**: 3
- **Evaluation Metric**: BLEU score
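
The exact training script is not included here; the sketch below shows how a fine-tuning loop with these hyperparameters could look, assuming the dataset exposes `image` and `caption` columns (the column names, split, and seed are assumptions):

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

# Shuffled 400-example subset (split, column names, and seed are illustrative assumptions)
dataset = load_dataset("Mouwiya/image-in-Words400", split="train").shuffle(seed=42).select(range(400))

def collate(batch):
    # Encode images and captions together; labels mirror the caption token ids
    images = [ex["image"].convert("RGB") for ex in batch]
    captions = [ex["caption"] for ex in batch]
    inputs = processor(images=images, text=captions, padding=True, return_tensors="pt")
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs

loader = DataLoader(dataset, batch_size=16, shuffle=True, collate_fn=collate)
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # BLIP returns a captioning loss when labels are provided
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}: last loss {loss.item():.4f}")
```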

## Usage
To use this model for image captioning, load it with the Hugging Face `transformers` library and run inference as shown below:

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
from io import BytesIO

# Load the processor and model
model_name = "Mouwiya/BLIP_image_captioning"
processor = BlipProcessor.from_pretrained(model_name)
model = BlipForConditionalGeneration.from_pretrained(model_name)

# Fetch an example image and convert it to RGB
image_url = "URL_OF_THE_IMAGE"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# Preprocess the image, generate a caption, and decode it to text
inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs)
caption = processor.decode(outputs[0], skip_special_tokens=True)
print(caption)
```

## Evaluation
The model was evaluated on a subset of the "image-in-Words400" dataset using the BLEU score. The results are as follows:

- **Average BLEU Score**: 0.35

This score reflects how closely the generated captions match the reference descriptions in terms of overlapping n-grams.
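
A BLEU score of this kind can be reproduced with the `evaluate` library; the snippet below is an illustrative sketch (the evaluation split, its size, and the column names are assumptions):

```python
import evaluate
from datasets import load_dataset
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Mouwiya/BLIP_image_captioning")
model = BlipForConditionalGeneration.from_pretrained("Mouwiya/BLIP_image_captioning")

# Evaluation split, size, and column names are illustrative assumptions
eval_set = load_dataset("Mouwiya/image-in-Words400", split="train").select(range(20))

bleu = evaluate.load("bleu")
predictions, references = [], []
for example in eval_set:
    inputs = processor(images=example["image"].convert("RGB"), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=50)
    predictions.append(processor.decode(output[0], skip_special_tokens=True))
    references.append([example["caption"]])  # BLEU accepts several references per prediction

print(bleu.compute(predictions=predictions, references=references)["bleu"])
```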

## Limitations
- **Dataset Size**: The model was fine-tuned on a relatively small subset of the dataset, which may limit how well it generalizes.
- **Domain Specificity**: Because it was trained on a single dataset, the model may not perform as well on images from other domains.

## Contact
**Mouwiya S. A. Al-Qaisieh**
mo3awiya@gmail.com