---
library_name: transformers
tags:
- multimodal
- vision
- aiart
- midjourney
- dalle
- moondream
- stablediffusion
license: mit
datasets:
- rbeauchamp/diffusion_db_dedupe_from50k_train
language:
- en
---

<img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="200"/>

Thank you to the [Akash Network](https://akash.network/) for sponsoring this project and providing A100s/H100s for compute!


# Predict the prompt used to generate an AI image
This is a fine-tune of [Moondream](https://moondream.ai/) (5/20/24 version), a tiny vision language model created by the amazing [vik](https://x.com/vikhyatk).
It was fine-tuned on 35,000 image-prompt pairs from the Diffusion DB dataset of Stable Diffusion images. 
It can predict the prompt used to generate an image, to an extent: it usually gets the style right and suggests an artist whose work or subject matter resembles the image.
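
Below is a minimal inference sketch, assuming this fine-tune keeps Moondream's `encode_image`/`answer_question` interface from the base model; the repo ID, image path, and question string are placeholders rather than confirmed values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Placeholder repo ID: replace with this model's actual Hugging Face Hub path.
model_id = "your-username/moondream-prompt-predictor"

# Moondream ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("generated_image.png")

# Encode the image once, then ask the model to reconstruct the prompt.
enc_image = model.encode_image(image)
predicted_prompt = model.answer_question(
    enc_image,
    "What prompt was used to generate this image?",  # hypothetical question string
    tokenizer,
)
print(predicted_prompt)
```

For a ready-to-run version, see the Colab notebook linked below.
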
Fine-tuning settings:

- Batch Size: 16
- Learning Rate: 5e-5
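
For context, here is a minimal, hypothetical sketch of how those two hyperparameters slot into a standard PyTorch loop; `model`, `train_dataset`, and `collate_fn` are placeholders, and the fine-tuning script linked below is the source of truth for the Moondream-specific data pipeline and loss.

```python
import torch
from torch.utils.data import DataLoader, Dataset

BATCH_SIZE = 16       # batch size used for this fine-tune
LEARNING_RATE = 5e-5  # learning rate used for this fine-tune

def finetune(model: torch.nn.Module, train_dataset: Dataset, collate_fn, epochs: int = 1):
    """Illustrative loop only; see the linked fine-tuning script for the real code."""
    loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_fn)
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss  # assumes batches include labels so the forward pass returns a loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```
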

Thank you again to the [Akash Network](https://akash.network/) for providing the A100s I used to fine-tune this model.

# Colab
<a target="_blank" href="https://colab.research.google.com/github/andrewgcodes/autoprompter/blob/main/run_prompt_predictor.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Fine-tuning Script
The script I used to fine-tune, based on code provided by vik, is available [here](https://github.com/andrewgcodes/autoprompter/blob/main/finetunemoondream.py).