How to use Image-Captioning-ML/Vit-GPT2-UCA-UCF-04 with libraries and local apps.

Libraries

How to use Image-Captioning-ML/Vit-GPT2-UCA-UCF-04 with Transformers:

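# Use a pipeline as a high-level helper.
# Assumption: the checkpoint is a ViT encoder + GPT-2 decoder (VisionEncoderDecoderModel),
# which Transformers 4.x (see "Framework versions") serves through the "image-to-text" task.
from transformers import pipeline

pipe = pipeline("image-to-text", model="Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")

# Load model directly (the image processor prepares pixel inputs; the tokenizer decodes captions)
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

image_processor = AutoImageProcessor.from_pretrained("Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")
tokenizer = AutoTokenizer.from_pretrained("Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")
model = VisionEncoderDecoderModel.from_pretrained("Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")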
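
Loading the model is only half of captioning an image, so here is a minimal sketch of the generation step. It assumes the checkpoint is a `VisionEncoderDecoderModel` (as the ViT-GPT2 name suggests) and that the repository ships an image processor config; neither is confirmed by this card. The helper names `load_captioner` and `caption_image`, the file name `frame.jpg`, and the default `max_new_tokens=16` (picked to roughly match the reported generation length of about 16 tokens) are illustrative choices, not part of the original card.

```python
def load_captioner(repo_id="Image-Captioning-ML/Vit-GPT2-UCA-UCF-04"):
    """Load model, image processor, and tokenizer (downloads weights on first call)."""
    # Imported lazily so that defining these helpers needs no heavy dependencies.
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(repo_id)
    processor = AutoImageProcessor.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, processor, tokenizer


def caption_image(image, model, processor, tokenizer, max_new_tokens=16):
    """Return one caption string for a single RGB image."""
    # Resize/normalize the image into the tensor layout the ViT encoder expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Decode a caption with the GPT-2 decoder (default generation settings).
    output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


# Example usage (hypothetical local file "frame.jpg"):
# from PIL import Image
# model, processor, tokenizer = load_captioner()
# print(caption_image(Image.open("frame.jpg").convert("RGB"), model, processor, tokenizer))
```

Keeping loading and captioning separate means the model is downloaded once and can be reused across many images.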

Local Apps

vLLM

How to use Image-Captioning-ML/Vit-GPT2-UCA-UCF-04 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Image-Captioning-ML/Vit-GPT2-UCA-UCF-04"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Image-Captioning-ML/Vit-GPT2-UCA-UCF-04",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Image-Captioning-ML/Vit-GPT2-UCA-UCF-04

SGLang

How to use Image-Captioning-ML/Vit-GPT2-UCA-UCF-04 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Image-Captioning-ML/Vit-GPT2-UCA-UCF-04" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Image-Captioning-ML/Vit-GPT2-UCA-UCF-04",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Image-Captioning-ML/Vit-GPT2-UCA-UCF-04" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip instructions above (port 30000).

Vit-GPT2-UCA-UCF-04

This model is a fine-tuned version of NourFakih/Vit-GPT2-COCO2017Flickr-85k-09 on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.5019
Rouge1: 26.9427
Rouge2: 7.5369
Rougel: 22.8442
Rougelsum: 23.3737
Gen Len: 15.8150

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 18
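
For anyone trying to reproduce the run, the hyperparameters above can be written as keyword arguments for the Transformers `Seq2SeqTrainingArguments` class. This is a sketch under assumptions: the card does not say which trainer class was used, `output_dir` is a hypothetical path, and `train_batch_size`/`eval_batch_size` are taken to be per-device values.

```python
# Hyperparameters from the list above, expressed as Trainer keyword arguments.
training_kwargs = {
    "output_dir": "vit-gpt2-uca-ucf-04",  # hypothetical output path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,  # assumes train_batch_size is per device
    "per_device_eval_batch_size": 4,   # assumes eval_batch_size is per device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 18,
}


def build_training_args(**overrides):
    """Build Seq2SeqTrainingArguments from the values above, with optional overrides."""
    # Imported lazily so the dictionary can be inspected without Transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**training_kwargs, **overrides})
```

Passing `predict_with_generate=True` as an override would be a natural choice for ROUGE evaluation, though the card does not state whether it was set.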

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
0.1528	1.1710	500	0.2809	26.7658	7.6288	22.7989	23.3636	15.8422
0.0701	2.3419	1000	0.3340	26.0581	6.9431	22.1094	22.5715	15.7842
0.0394	3.5129	1500	0.3520	25.9985	7.2471	22.2292	22.6947	15.9556
0.024	4.6838	2000	0.3995	26.9298	8.4206	22.9054	23.3302	15.0592
0.0144	5.8548	2500	0.4325	25.4177	6.9663	21.3437	21.7631	14.6091
0.0104	7.0258	3000	0.4389	26.6544	7.3818	22.81	23.0804	15.4291
0.0067	8.1967	3500	0.4620	26.6154	7.5924	22.7463	23.0765	15.6745
0.005	9.3677	4000	0.4657	27.7378	7.6741	23.69	24.1869	15.6424
0.0037	10.5386	4500	0.4729	27.5305	7.6016	23.2043	23.6397	16.7053
0.0069	11.7096	5000	0.4756	27.5112	7.8019	23.6743	24.2136	15.3255
0.0027	12.8806	5500	0.4899	26.6969	7.4515	22.8885	23.2666	15.2996
0.0024	14.0515	6000	0.4887	26.5269	7.1568	22.6349	23.0376	15.8138
0.0018	15.2225	6500	0.4937	26.9342	7.3399	23.1986	23.6486	15.3317
0.0016	16.3934	7000	0.5019	27.1042	7.4545	23.1031	23.5834	15.6942
0.0014	17.5644	7500	0.5019	26.9427	7.5369	22.8442	23.3737	15.8150
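
The table shows training loss falling steadily while validation loss rises after the first logged step, which may indicate overfitting. The short sketch below, using values copied from the table, picks out the logged steps with the lowest validation loss and the highest Rouge1. It also estimates the training set size, assuming a batch size of 4 on a single device with no gradient accumulation (the card does not confirm the device count or accumulation setting).

```python
# (step, epoch, training loss, validation loss, Rouge1), copied from the table above.
LOG = [
    (500, 1.1710, 0.1528, 0.2809, 26.7658),
    (1000, 2.3419, 0.0701, 0.3340, 26.0581),
    (1500, 3.5129, 0.0394, 0.3520, 25.9985),
    (2000, 4.6838, 0.0240, 0.3995, 26.9298),
    (2500, 5.8548, 0.0144, 0.4325, 25.4177),
    (3000, 7.0258, 0.0104, 0.4389, 26.6544),
    (3500, 8.1967, 0.0067, 0.4620, 26.6154),
    (4000, 9.3677, 0.0050, 0.4657, 27.7378),
    (4500, 10.5386, 0.0037, 0.4729, 27.5305),
    (5000, 11.7096, 0.0069, 0.4756, 27.5112),
    (5500, 12.8806, 0.0027, 0.4899, 26.6969),
    (6000, 14.0515, 0.0024, 0.4887, 26.5269),
    (6500, 15.2225, 0.0018, 0.4937, 26.9342),
    (7000, 16.3934, 0.0016, 0.5019, 27.1042),
    (7500, 17.5644, 0.0014, 0.5019, 26.9427),
]

best_loss = min(LOG, key=lambda row: row[3])    # lowest validation loss
best_rouge1 = max(LOG, key=lambda row: row[4])  # highest Rouge1

# Steps per epoch follow from any row: step / epoch (here 500 / 1.1710).
steps_per_epoch = round(LOG[0][0] / LOG[0][1])
# Assumes batch size 4, one device, no gradient accumulation.
approx_train_examples = steps_per_epoch * 4

print(f"lowest validation loss: step {best_loss[0]} ({best_loss[3]})")   # step 500 (0.2809)
print(f"highest Rouge1: step {best_rouge1[0]} ({best_rouge1[4]})")       # step 4000 (27.7378)
print(f"steps per epoch: {steps_per_epoch}")                             # 427
print(f"approx. training examples: {approx_train_examples}")             # 1708
```

The final checkpoint (step 7500) is therefore neither the lowest-loss nor the highest-Rouge1 point in the log, which is worth keeping in mind when comparing against the headline results.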

Framework versions

Transformers 4.47.0
Pytorch 2.5.1+cu121
Datasets 3.3.1
Tokenizers 0.21.0

Model size: 0.2B params
Tensor type: F32 (Safetensors)