Thank you to the Akash Network for sponsoring this project and providing A100s/H100s for compute!

Predict the prompt used to generate an AI image

This is a fine-tune of Moondream (5/20/24 version), a tiny vision language model created by the amazing vik. It was fine-tuned on 35,000 image-prompt pairs from the DiffusionDB dataset of Stable Diffusion images. Given an image, it predicts the prompt that may have been used to generate it, to an extent: it usually gets the style right and names an artist whose work or subject matter resembles the image.

Settings:

  • Batch Size: 16
  • Learning Rate: 5e-5
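
For a rough sense of scale, the sketch below works out the number of optimizer steps per epoch from the 35,000 pairs and the batch size of 16. The class and field names are illustrative only. The epoch count is an assumption, since the card does not state one, and the last partial batch is assumed to be kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneSettings:
    """Illustrative container for the settings listed in this card."""
    num_pairs: int = 35_000       # image-prompt pairs (from the card)
    batch_size: int = 16          # from the card
    learning_rate: float = 5e-5   # from the card
    epochs: int = 1               # ASSUMPTION: not stated in the card

    def steps_per_epoch(self) -> int:
        # Round up so the final partial batch counts as a step.
        return math.ceil(self.num_pairs / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


settings = FineTuneSettings()
print(settings.steps_per_epoch())  # 2188
```

With gradient accumulation or a dropped last batch the step count would differ, so treat the number as an estimate.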

Thank you to Akash.net for providing the A100s I used for this project, including fine-tuning the model.

Colab

Open In Colab
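
Outside Colab, local inference might look roughly like the sketch below. It assumes the fine-tune exposes the same `encode_image` and `answer_question` methods as the Moondream 5/20/24 release it is based on, and that the tokenizer loads from the same repo. The question string is a placeholder, not necessarily the prompt used during fine-tuning, and `load_model` and `predict_prompt` are hypothetical helper names.

```python
MODEL_ID = "gaodrew/moondream-image2prompt-v1"
# ASSUMPTION: placeholder question; the fine-tuning prompt is not given in this card.
QUESTION = "Describe this image."


def load_model(model_id: str = MODEL_ID):
    """Load model and tokenizer. Requires `transformers` and downloads the weights."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The repo ships custom modeling code, so trust_remote_code is needed.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    # ASSUMPTION: the tokenizer is stored in the same repo as the weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def predict_prompt(model, tokenizer, image, question: str = QUESTION) -> str:
    """Return the model's guess at the prompt for a PIL image."""
    encoded = model.encode_image(image)
    return model.answer_question(encoded, question, tokenizer).strip()
```

To use it, open an image with `PIL.Image.open(...)`, call `load_model()` once, and pass both to `predict_prompt`.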

Fine-tuning Script

Based on the code provided by Vik, here is what I used to fine-tune.

Model size: 1.87B params
Tensor type: FP16 (Safetensors)

Dataset used to train gaodrew/moondream-image2prompt-v1