|
--- |
|
library_name: transformers |
|
tags: |
|
- multimodal |
|
- vision |
|
- aiart |
|
- midjourney |
|
- dalle |
|
- moondream |
|
- stablediffusion |
|
license: mit |
|
datasets: |
|
- rbeauchamp/diffusion_db_dedupe_from50k_train |
|
language: |
|
- en |
|
--- |
|
|
|
<img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="200"/> |
|
|
|
Thank you to the [Akash Network](https://akash.network/) for sponsoring this project and providing A100s/H100s for compute! |
|
<a target="_blank" href="https://colab.research.google.com/github/andrewgcodes/autoprompter/blob/main/run_prompt_predictor.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
|
|
# Predict the prompt used to generate an AI image |
|
This is a fine-tune of [Moondream](https://moondream.ai/) (5/20/24 version), a tiny vision language model created by the amazing [vik](https://x.com/vikhyatk). |
|
It was fine-tuned on 35,000 image-prompt pairs from the DiffusionDB dataset of Stable Diffusion images.

It can approximately predict the prompt used to generate an image: it usually gets the style right and names an artist whose work or subject matter resembles the image.
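
As a rough usage sketch: assuming this fine-tune exposes the same remote-code API as the base Moondream model, you can load it with `transformers` and ask it to describe an image. The model id, image path, and question string below are placeholders/assumptions, not values confirmed by this card (see the Colab notebook for the exact usage).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Placeholder: replace with this repository's model id
model_id = "your-username/prompt-predictor"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any AI-generated image you want to recover a prompt for
image = Image.open("generated_image.png")

# encode_image / answer_question follow the base Moondream remote-code API
enc_image = model.encode_image(image)
# The question used here is an assumption; check the Colab notebook for the
# prompt format used during fine-tuning
prediction = model.answer_question(enc_image, "Describe this image.", tokenizer)
print(prediction)
```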
|
Fine-tuning settings:
|
|
|
- Batch Size: 16 |
|
- Learning Rate: 5e-5 |
|
|
|
Thanks again to the [Akash Network](https://akash.network/) for providing the A100s used to fine-tune this model.
|
|
|
# Colab |
|
<a target="_blank" href="https://colab.research.google.com/github/andrewgcodes/autoprompter/blob/main/run_prompt_predictor.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
# Fine-tuning Script |
|
The fine-tuning script I used, based on code provided by Vik, is available [here](https://github.com/andrewgcodes/autoprompter/blob/main/finetunemoondream.py).
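
As a minimal sketch of the setup that script works from, using the hyperparameters listed above: the epoch count is not stated in this card, the dataset column names are assumptions, and the batching, forward pass, and loss computation live in the linked script.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hyperparameters from the settings listed above
BATCH_SIZE = 16
LEARNING_RATE = 5e-5

# Base model: Moondream (5/20/24 release)
model_id = "vikhyatk/moondream2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Image-prompt pairs listed in this card's metadata
# (column names are an assumption -- check the dataset viewer)
dataset = load_dataset("rbeauchamp/diffusion_db_dedupe_from50k_train", split="train")

optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
# Batching, the forward pass, and the loss computation follow the linked script.
```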