## Image Caption Generator with ViT and Roberta
This is a repository for an image caption generator model built using Vision Transformer (ViT) and Roberta. The model is trained on the [flickr10k](https://www.kaggle.com/datasets/icode100/flickr-10k) dataset which derived from the [Flickr](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset) dataset consisting of 31k images. RobertaMLM and Byte_tokenizer is removed from this github repository for space constraints.
View on kaggle:
## Contents
- [Models](#models)
- [Features](#features)
- [Dependancies](#dependencies)
- [Usage](#usage)
- [Trainig](#training)
---
**ROUGE2 SCORES**
| Metric | Score |
|---|---|
| Precision | 0.114700 |
| Recall | 0.124500 |
| F1-Score | 0.114100 |
---
### Models
* [RobertaMLM](https://www.kaggle.com/models/icode100/robertamlm) the Roberta model fine-tuned on Masked Lnaguage modeling
* [ViT_Roberta_Image_Captioning](https://www.kaggle.com/models/icode100/vit_roberta_image_captioning) made using VitEncoderDecoder where ViT acts as image encoder and RobertaMLM being the language based decoder
* [The Byte pair tokenizer](https://www.kaggle.com/models/icode100/byte_tokenizer)
### Features
* Generates captions for images based on visual content and learned language patterns.
* Leverages the power of ViT for efficient image encoding and Roberta for robust text generation.
### Dependencies
This project requires the following libraries:
* Transformers
* torch
* Pillow (PIL Fork)
### Usage
* This is the [application](https://vitrobertaimagecaptioning.streamlit.app/) where you can upload image from local file system and the captions will be generated
* For using model in kaggle :
* visit the [kaggle page](https://www.kaggle.com/models/icode100/vit_roberta_image_captioning) and click create notebook.
* Add both the datasets provided in the description.
* The datasets are dependent as flickr10k is derived from Flickr and hence uses the image folder privided in Flickr.
### Training
To clone and create your own model:
> `git install lfs`
> `git clone https://huggingface.co/spaces/icode100/Image_Captioning/tree/main`