---
license: apache-2.0
tags:
- image-captioning
languages:
- en
pipeline_tag: image-to-text
datasets:
- michelecafagna26/hl
language:
- en
metrics:
- sacrebleu
- rouge
library_name: transformers
---

## GIT-base fine-tuned for Image Captioning on High-Level descriptions of Rationales

[GIT](https://arxiv.org/abs/2205.14100) base model fine-tuned on the [HL dataset](https://huggingface.co/datasets/michelecafagna26/hl) for **rationale generation of images**.

## Model fine-tuning 🏋️

- Trained for 10 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)

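For readers who want to reuse these settings, the list above can be captured in a small configuration object. This is only a sketch: `FineTuneConfig` and its field names are hypothetical and not taken from the original training script, and reading the "10" as a number of epochs is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the fine-tuning settings listed in this card."""
    num_epochs: int = 10          # assumption: the "10" counts training epochs
    learning_rate: float = 5e-5   # "lr" in the list above
    optimizer: str = "adam"       # Adam optimizer
    fp16: bool = True             # half-precision training


config = FineTuneConfig()
print(asdict(config))
```

A frozen dataclass keeps the values immutable, so the same object can be logged alongside a run without risk of it being changed mid-training.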
## Test set metrics 🧾

| CIDEr | SacreBLEU | Rouge-L |
|-------|-----------|---------|
| 42.58 | 5.9       | 18.55   |

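To give an intuition for the Rouge-L column, here is a minimal sketch of a longest-common-subsequence F1 score between a generated caption and a reference. It is an illustration only: it uses plain whitespace tokenization, returns a value in [0, 1] rather than on the 0-100 scale of the table, and is not the scorer used to produce the numbers above. The reference sentence is made up for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match on equal tokens, otherwise carry the best so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between two whitespace-tokenized, lowercased strings."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical caption/reference pair, not taken from the test set.
score = rouge_l_f1("she is enjoying the sunny day",
                   "she wants to enjoy the sunny day")
print(round(score, 4))
```

Here the common subsequence is "she ... the sunny day" (4 tokens), so precision is 4/6 and recall is 4/7.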
## Model in Action

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model id as given in this card; prefix it with the owner's namespace if loading from the Hub requires it.
model_id = "git-base-captioning-ft-hl-rationales"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image into the pixel tensor expected by the model.
inputs = processor(images=raw_image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

# Sample one caption with combined top-k and nucleus (top-p) sampling.
generated_ids = model.generate(pixel_values=pixel_values, max_length=50,
                               do_sample=True,
                               top_k=120,
                               top_p=0.9,
                               num_return_sequences=1)

# batch_decode returns a list with one string per generated sequence.
processor.batch_decode(generated_ids, skip_special_tokens=True)
# e.g. ['she is enjoying the sunny day.'] (sampling is stochastic, so the output varies)
```

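The `top_k` and `top_p` arguments in the snippet above restrict which tokens can be sampled at each step. The sketch below illustrates the idea on a toy distribution in plain Python. It is a simplified illustration, not the library's implementation: the function names and the probabilities are invented for the example, and it assumes top-k filtering is applied before top-p.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Return a renormalized {token: prob} dict after top-k, then top-p filtering."""
    # Top-k: keep only the top_k most probable tokens and renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def sample(probs, rng=random):
    """Draw one token according to the given probabilities."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (made-up numbers).
toy = {"day": 0.5, "beach": 0.3, "sun": 0.15, "rain": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.9)
print(sorted(filtered), sample(filtered))
```

With `top_k=3` the least likely token ("rain") is dropped before sampling. A lower `top_p` prunes the tail further, which trades diversity for more conservative captions.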
## BibTeX and citation info

```BibTeX
@inproceedings{cafagna2023hl,
  title={{HL} {D}ataset: {V}isually-grounded {D}escription of {S}cenes, {A}ctions and {R}ationales},
  author={Cafagna, Michele and van Deemter, Kees and Gatt, Albert},
  booktitle={Proceedings of the 16th International Natural Language Generation Conference (INLG'23)},
  address={Prague, Czech Republic},
  year={2023}
}
```