michelecafagna26
/

git-base-captioning-ft-hl-actions

	---
	license: apache-2.0
	tags:
	- image-captioning
	pipeline_tag: image-to-text
	datasets:
	- michelecafagna26/hl
	language:
	- en
	metrics:
	- sacrebleu
	- rouge
	library_name: transformers
	---
	## GIT-base fine-tuned for Image Captioning on High-Level descriptions of Actions

	[GIT](https://arxiv.org/abs/2205.14100) base fine-tuned on the [HL dataset](https://huggingface.co/datasets/michelecafagna26/hl) to generate high-level descriptions of the actions shown in images.

	## Model fine-tuning 🏋️‍

	- Trained for 10 epochs
	- Learning rate: 5e-5
	- Adam optimizer
	- half-precision (fp16)

	## Test set metrics 🧾

	| CIDEr | SacreBLEU | Rouge-L |
	|--------|------------|--------|
	| 110.63 | 15.21 | 30.45 |
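To give a feel for what the Rouge-L column measures, here is a small pure-Python sketch of the F1 variant of ROUGE-L, based on the longest common subsequence (LCS) between a generated caption and a reference. It is illustrative only: the card does not say which ROUGE implementation produced the reported score, the whitespace tokenization is a simplification, and the reference sentence below is made up.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings, using lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference caption, chosen only for illustration
score = rouge_l_f1("she is holding an umbrella", "she is holding a red umbrella")
print(f"{100 * score:.2f}")  # prints 72.73 (same 0-100 scale as the table)
```

Here the LCS is "she is holding umbrella" (4 tokens), giving precision 4/5 and recall 4/6.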

	## Model in Action 🚀

	```python
	import requests
	import torch
	from PIL import Image
	from transformers import AutoProcessor, AutoModelForCausalLM

	# Full Hub repo id (namespace/model-name)
	model_id = "michelecafagna26/git-base-captioning-ft-hl-actions"
	# Use the GPU when available, otherwise fall back to CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"

	processor = AutoProcessor.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

	img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
	raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

	# Pass the image by keyword so it is not mistaken for text input
	inputs = processor(images=raw_image, return_tensors="pt").to(device)
	pixel_values = inputs.pixel_values

	# Sample one caption with top-k / nucleus sampling
	generated_ids = model.generate(
	    pixel_values=pixel_values,
	    max_length=50,
	    do_sample=True,
	    top_k=120,
	    top_p=0.9,
	    num_return_sequences=1,
	)

	captions = processor.batch_decode(generated_ids, skip_special_tokens=True)
	print(captions[0])
	# Example output (sampling is stochastic, so captions vary):
	# she is holding an umbrella.
	```

	## BibTeX and citation info

	```BibTeX
	```