Generating Scientific Figure Captions with Fine-Tuned PaliGemma
This notebook explores the use of fine-tuned PaliGemma models for generating captions for scientific figures. The PaliGemma-FT models are research-oriented checkpoints, each fine-tuned on a specific research dataset. We'll focus on the PaliGemma-3b-ft-scicap-448 model, which was fine-tuned on the SciCap dataset for this task.
The notebook demonstrates how to use this fine-tuned checkpoint for caption generation, and highlights two properties that may work in its favor:
- Higher resolution image input (448x448): Capturing finer details in scientific figures.
- SciCap dataset fine-tuning: Focused training on scientific figure captions.
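As a rough illustration of the caption-generation step, here is a minimal sketch using the Hugging Face `transformers` PaliGemma classes. Several details are assumptions rather than anything stated above: the Hub ID `google/paligemma-3b-ft-scicap-448`, the `caption en` task prefix, and the helper names `drop_prompt_tokens` and `caption_figure`, which are hypothetical. Downloading the weights may also require accepting the model license on the Hub.

```python
# Assumed Hub ID for the PaliGemma-3b-ft-scicap-448 checkpoint.
MODEL_ID = "google/paligemma-3b-ft-scicap-448"
# Assumed task prefix; the prompt the checkpoint expects may differ.
PROMPT = "caption en"


def drop_prompt_tokens(output_ids, prompt_len):
    """Keep only newly generated token IDs; generate() echoes the prompt first."""
    return output_ids[prompt_len:]


def caption_figure(image_path, max_new_tokens=64):
    """Generate a caption for one figure image (hypothetical helper)."""
    # Heavy imports are local so this module can be imported without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    # Default float32 weights: simple, but needs roughly 12 GB of memory.
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    # The processor resizes the image to the resolution the checkpoint expects.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=PROMPT, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    new_ids = drop_prompt_tokens(output[0], prompt_len)
    return processor.decode(new_ids, skip_special_tokens=True)
```

Calling `caption_figure("figure.png")`, where `figure.png` is a placeholder path, would return the generated caption as a string. Greedy decoding (`do_sample=False`) is used here only to keep the output deterministic; it is a choice made for this sketch, not a documented requirement of the model.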
Code available on Colab Notebook
Code available on Kaggle Notebook