---
tags:
- image-to-text
- image-captioning
license: apache-2.0
metrics:
- rouge
datasets:
- nlphuji/flickr30k
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
base_model:
- google/vit-base-patch16-224-in21k
model-index:
- name: mozilla/distilvit
  results:
  - task:
      type: image-to-text
      name: Image To Text
    dataset:
      name: nlphuji/flickr30k
      type: nlphuji/flickr30k
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 43.006
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.9939
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 38.8923
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 38.8877
      verified: true
    - name: loss
      type: loss
      value: 0.19939416646957397
    - name: gen_len
      type: gen_len
      value: 11.327256736227712
      verified: true
---

# distilvit

This model is a work in progress. It is a fine-tuned version of the following base models:

- a ViT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
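
The encoder/decoder pairing listed above can be sketched with the `VisionEncoderDecoderModel` class from Transformers. This is only an illustration, assuming the two base checkpoints are combined in this way; the actual training setup may differ. The helper name `build_untrained_captioner` and the token-ID choices are assumptions made for this example.

```python
# Hub IDs of the two base models listed above.
ENCODER_ID = "google/vit-base-patch16-224-in21k"
DECODER_ID = "distilbert/distilgpt2"


def build_untrained_captioner():
    """Pair the ViT encoder with the DistilGPT-2 decoder (hypothetical helper).

    The cross-attention weights are freshly initialized, so the returned
    model must be fine-tuned before it produces useful captions.
    """
    # Imported lazily so the constants can be inspected without Transformers installed.
    from transformers import VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        ENCODER_ID, DECODER_ID
    )
    # Assumption: GPT-2 has no dedicated pad token, so EOS is reused for padding,
    # and generation starts from the decoder's BOS token.
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
    model.config.pad_token_id = model.config.decoder.eos_token_id
    return model
```

Calling `build_untrained_captioner()` downloads both base checkpoints, so it needs network access and the `transformers` and `torch` packages.
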

This model was trained on:

- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org

You can get that checkpoint using the `3083a3cef6e3c8dd90df3f088074bbe836b0f403` commit.

It was then further fine-tuned on:

- Flickr30k debiased: https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions
- DocOrNot: https://huggingface.co/datasets/Mozilla/docornot

You can find the code used to create the model here: https://github.com/mozilla/distilvit
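
A minimal usage sketch with the Transformers `pipeline` API is shown here. It assumes the model is published on the Hugging Face Hub as `mozilla/distilvit` (the name given in the card metadata) and that the commit hash quoted above is a revision of that Hub repository, which this card does not state explicitly. The helper names `load_captioner` and `caption` are hypothetical.

```python
MODEL_ID = "mozilla/distilvit"
# Commit quoted above for the checkpoint before the second fine-tuning stage.
# Assumption: it is a revision of the Hub model repository.
FIRST_STAGE_REVISION = "3083a3cef6e3c8dd90df3f088074bbe836b0f403"


def load_captioner(revision=None):
    """Build an image-to-text pipeline for the model (hypothetical helper).

    Pass revision=FIRST_STAGE_REVISION to request the earlier checkpoint;
    None selects the default branch.
    """
    # Imported lazily so the constants can be inspected without Transformers installed.
    from transformers import pipeline

    return pipeline("image-to-text", model=MODEL_ID, revision=revision)


def caption(image, revision=None):
    """Return the generated caption for an image URL, local path, or PIL image."""
    captioner = load_captioner(revision)
    outputs = captioner(image)  # list of dicts, one per generated caption
    return outputs[0]["generated_text"]
```

For example, `caption("https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg")` should return a short caption string for the Savanna sample image; the exact text depends on the checkpoint and the library versions.
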

### Framework versions

- Transformers 4.40.2
- PyTorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1