Tags: Image-Text-to-Text, Transformers, TensorBoard, Safetensors, vision-encoder-decoder, Generated from Trainer
How to use Image-Captioning-ML/Vit-GPT2-UCA-UCF-04 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")

# Load the tokenizer and the vision-encoder-decoder model directly
from transformers import AutoTokenizer, VisionEncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")
model = VisionEncoderDecoderModel.from_pretrained("Image-Captioning-ML/Vit-GPT2-UCA-UCF-04")
```
Vit-GPT2-UCA-UCF-04
This model is a fine-tuned version of NourFakih/Vit-GPT2-COCO2017Flickr-85k-09 on an unknown dataset. At the final logged checkpoint (step 7500, epoch 17.56), it achieves the following results on the evaluation set:
- Loss: 0.5019
- Rouge1: 26.9427
- Rouge2: 7.5369
- RougeL: 22.8442
- RougeLsum: 23.3737
- Gen Len: 15.8150
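The ROUGE figures above appear to be F-measures scaled to 0-100. The card does not say how they were computed; Trainer-generated cards typically use the `evaluate` library's `rouge` metric (a wrapper around Google's `rouge_score` package), but that is an assumption here. As a rough illustration of what ROUGE-1, ROUGE-2 and ROUGE-L measure, here is a minimal pure-Python sketch assuming lowercase whitespace tokenization, no stemming, and a plain mean over captions. It omits ROUGE-Lsum and will not reproduce the card's numbers exactly; all function names are hypothetical.

```python
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Simplified tokenization (assumption): lowercase and split on whitespace.
    return text.lower().split()


def _ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, pred_len: int, ref_len: int) -> float:
    # Harmonic mean of precision (overlap / prediction size) and recall (overlap / reference size).
    if overlap == 0:
        return 0.0
    precision, recall = overlap / pred_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(prediction: str, reference: str, n: int) -> float:
    pred, ref = _ngrams(_tokens(prediction), n), _ngrams(_tokens(reference), n)
    # Clipped overlap: an n-gram counts at most min(count in prediction, count in reference) times.
    overlap = sum((pred & ref).values())
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def _lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence length via a rolling-row dynamic programme.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    pred, ref = _tokens(prediction), _tokens(reference)
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))


def corpus_rouge(predictions: list[str], references: list[str]) -> dict[str, float]:
    # Mean per-caption F1, reported on the 0-100 scale used in this card.
    pairs = list(zip(predictions, references))

    def mean(scores: list[float]) -> float:
        return 100 * sum(scores) / len(pairs)

    return {
        "rouge1": mean([rouge_n_f1(p, r, 1) for p, r in pairs]),
        "rouge2": mean([rouge_n_f1(p, r, 2) for p, r in pairs]),
        "rougeL": mean([rouge_l_f1(p, r) for p, r in pairs]),
    }
```

For example, the prediction "a man is walking down the street" against the reference "a man walks down the street" shares 5 of 7 and 5 of 6 unigrams, giving a ROUGE-1 F1 of 10/13 (about 76.9 on the card's scale).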
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 18
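The logged epochs in the results table pin down the step budget. The Trainer reports a fractional epoch of roughly step / steps-per-epoch, so step 500 at epoch 1.1710 implies about 427 optimizer steps per epoch (step 7500 at epoch 17.5644 agrees), and 18 epochs give roughly 7,686 total steps. The sketch below reproduces that arithmetic and a linear decay of the 5e-05 learning rate to zero, assuming no warmup (none is listed) and the shape used by `transformers`' `get_linear_schedule_with_warmup`. It also derives a rough training-set size, which additionally assumes a single device, no gradient accumulation, and a kept final partial batch; the card states none of these. The helper names are hypothetical.

```python
# Values from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
NUM_EPOCHS = 18


def steps_per_epoch(logged_step: int, logged_epoch: float) -> int:
    # The Trainer logs a fractional epoch, approximately step / steps_per_epoch.
    return round(logged_step / logged_epoch)


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE, warmup_steps: int = 0) -> float:
    # Linear warmup (none assumed here), then linear decay to 0 at total_steps.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


SPE = steps_per_epoch(500, 1.1710)  # first row of the results table -> 427
TOTAL_STEPS = SPE * NUM_EPOCHS      # -> 7686

# Rough training-set size range, ASSUMING one device, no gradient accumulation,
# and a final partial batch that is kept (not dropped).
APPROX_TRAIN_EXAMPLES = ((SPE - 1) * TRAIN_BATCH_SIZE + 1, SPE * TRAIN_BATCH_SIZE)
```

Under these assumptions the learning rate is 2.5e-05 at the halfway point (step 3843) and reaches zero at step 7686, and the unknown training set would hold roughly 1,705 to 1,708 examples.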
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 0.1528 | 1.1710 | 500 | 0.2809 | 26.7658 | 7.6288 | 22.7989 | 23.3636 | 15.8422 |
| 0.0701 | 2.3419 | 1000 | 0.3340 | 26.0581 | 6.9431 | 22.1094 | 22.5715 | 15.7842 |
| 0.0394 | 3.5129 | 1500 | 0.3520 | 25.9985 | 7.2471 | 22.2292 | 22.6947 | 15.9556 |
| 0.024 | 4.6838 | 2000 | 0.3995 | 26.9298 | 8.4206 | 22.9054 | 23.3302 | 15.0592 |
| 0.0144 | 5.8548 | 2500 | 0.4325 | 25.4177 | 6.9663 | 21.3437 | 21.7631 | 14.6091 |
| 0.0104 | 7.0258 | 3000 | 0.4389 | 26.6544 | 7.3818 | 22.81 | 23.0804 | 15.4291 |
| 0.0067 | 8.1967 | 3500 | 0.4620 | 26.6154 | 7.5924 | 22.7463 | 23.0765 | 15.6745 |
| 0.005 | 9.3677 | 4000 | 0.4657 | 27.7378 | 7.6741 | 23.69 | 24.1869 | 15.6424 |
| 0.0037 | 10.5386 | 4500 | 0.4729 | 27.5305 | 7.6016 | 23.2043 | 23.6397 | 16.7053 |
| 0.0069 | 11.7096 | 5000 | 0.4756 | 27.5112 | 7.8019 | 23.6743 | 24.2136 | 15.3255 |
| 0.0027 | 12.8806 | 5500 | 0.4899 | 26.6969 | 7.4515 | 22.8885 | 23.2666 | 15.2996 |
| 0.0024 | 14.0515 | 6000 | 0.4887 | 26.5269 | 7.1568 | 22.6349 | 23.0376 | 15.8138 |
| 0.0018 | 15.2225 | 6500 | 0.4937 | 26.9342 | 7.3399 | 23.1986 | 23.6486 | 15.3317 |
| 0.0016 | 16.3934 | 7000 | 0.5019 | 27.1042 | 7.4545 | 23.1031 | 23.5834 | 15.6942 |
| 0.0014 | 17.5644 | 7500 | 0.5019 | 26.9427 | 7.5369 | 22.8442 | 23.3737 | 15.8150 |
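The headline numbers come from the last checkpoint, but the log shows validation loss climbing from 0.2809 (step 500) to 0.5019 while training loss falls to 0.0014, a pattern usually read as overfitting in terms of loss. ROUGE-1, meanwhile, stays within a narrow band (25.42 to 27.74). Which checkpoint is "best" therefore depends on the criterion. The sketch below copies the rows from the table into `RESULTS` and picks the best step per metric; `best_step` is a hypothetical helper, loosely analogous to what `TrainingArguments(load_best_model_at_end=True, metric_for_best_model=...)` automates inside the Trainer. The card does not say whether that option was used.

```python
# Rows copied from the training-results table above:
# (step, validation_loss, rouge1, rouge2, rougeL, rougeLsum)
RESULTS = [
    (500, 0.2809, 26.7658, 7.6288, 22.7989, 23.3636),
    (1000, 0.3340, 26.0581, 6.9431, 22.1094, 22.5715),
    (1500, 0.3520, 25.9985, 7.2471, 22.2292, 22.6947),
    (2000, 0.3995, 26.9298, 8.4206, 22.9054, 23.3302),
    (2500, 0.4325, 25.4177, 6.9663, 21.3437, 21.7631),
    (3000, 0.4389, 26.6544, 7.3818, 22.81, 23.0804),
    (3500, 0.4620, 26.6154, 7.5924, 22.7463, 23.0765),
    (4000, 0.4657, 27.7378, 7.6741, 23.69, 24.1869),
    (4500, 0.4729, 27.5305, 7.6016, 23.2043, 23.6397),
    (5000, 0.4756, 27.5112, 7.8019, 23.6743, 24.2136),
    (5500, 0.4899, 26.6969, 7.4515, 22.8885, 23.2666),
    (6000, 0.4887, 26.5269, 7.1568, 22.6349, 23.0376),
    (6500, 0.4937, 26.9342, 7.3399, 23.1986, 23.6486),
    (7000, 0.5019, 27.1042, 7.4545, 23.1031, 23.5834),
    (7500, 0.5019, 26.9427, 7.5369, 22.8442, 23.3737),
]
COLUMNS = ("step", "validation_loss", "rouge1", "rouge2", "rougeL", "rougeLsum")


def best_step(metric: str, greater_is_better: bool = True) -> int:
    # Return the step whose row maximizes (or minimizes) the chosen column.
    idx = COLUMNS.index(metric)
    pick = max if greater_is_better else min
    return pick(RESULTS, key=lambda row: row[idx])[0]
```

With these rows, validation loss favours step 500, ROUGE-1 and ROUGE-L favour step 4000, ROUGE-2 favours step 2000, and ROUGE-Lsum favours step 5000.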
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.3.1
- Tokenizers 0.21.0
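When trying to reproduce these results, it may help to check a local environment against the versions above. A minimal sketch using only the standard library's `importlib.metadata`: the hypothetical `TRAINED_WITH` mapping restates the list with PyPI distribution names (`torch` for PyTorch), and the `+cu121` suffix is the local version tag of the CUDA 12.1 PyTorch build. Exact string matching is a deliberately strict, assumed policy; newer patch releases may work fine.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed on this card, keyed by PyPI distribution name.
TRAINED_WITH = {
    "transformers": "4.47.0",
    "torch": "2.5.1+cu121",
    "datasets": "3.3.1",
    "tokenizers": "0.21.0",
}


def compare_versions(expected: Optional[dict] = None) -> dict:
    # Map each package to (version on the card, installed version or None if absent).
    expected = TRAINED_WITH if expected is None else expected
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report
```

Printing `compare_versions()` lists each package with the card's version next to the installed one, so mismatches are easy to spot before fine-tuning or evaluating.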