license: unlicense
datasets:
- poloclub/diffusiondb
language:
- en
metrics:
- wer
pipeline_tag: image-to-text
Untitled7-colab_checkpoint
This model was lovingly named after the Google Colab notebook that made it. It is a finetune of Microsoft's git-large-coco model on the 1k subset of poloclub/diffusiondb.
It is supposed to read an image and extract a Stable Diffusion prompt from it, but it might not do a good job of it. I wouldn't know, as I haven't extensively tested it.
As the title suggests, this is a checkpoint: I originally intended to train on the entire dataset, but I'm unsure if I want to now...
This is my first public model so please be nice!
Intended use
Fun!
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")
#################################################################
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-to-text", model="SE6446/Untitled7-colab_checkpoint")
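To actually get a prompt out of an image, something like the sketch below might work. It assumes a transformers version that still ships the "image-to-text" pipeline, and that the pipeline returns a list of dicts with a "generated_text" key. The helper names `first_generated_text` and `predict_prompt` are made up for this example, and the image path is whatever file you have lying around.

```python
def first_generated_text(pipeline_output):
    """Return the stripped text of the first candidate, or "" if there is none."""
    if not pipeline_output:
        return ""
    return pipeline_output[0]["generated_text"].strip()


def predict_prompt(image, model_id="SE6446/Untitled7-colab_checkpoint"):
    """Run the checkpoint on one image (local path, URL, or PIL image).

    Hypothetical helper: it downloads the model on first use, so it needs
    transformers, torch, Pillow, and a network connection.
    """
    from transformers import pipeline  # imported lazily; it is a heavy dependency

    pipe = pipeline("image-to-text", model=model_id)
    return first_generated_text(pipe(image))
```

Calling `predict_prompt("my_image.png")` should then return the guessed prompt as a plain string. No promises about how good the guess is.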
Out-of-scope use
Don't use this model to discriminate, alienate or in any other way harm/harass individuals. You guys know the drill...
Bias, Risks, and Limitations
This model does not produce accurate prompts; it is merely a bit of fun (and a waste of funds). It can also suffer from any bias present in the original git-large-coco model.
Training
I.e. the boring stuff
- lr = 5e-5
- epochs = 150
- optim = adamw
- fp16
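If you want to reuse these settings, they could be written down roughly as below. This is only a sketch: the keyword names follow `transformers.TrainingArguments`, but this card does not say the Trainer API was used, "adamw_torch" is a guess at which AdamW variant is meant, and the output directory name is made up.

```python
# Hyperparameters from the list above, keyed by TrainingArguments keyword names.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "num_train_epochs": 150,
    "optim": "adamw_torch",  # assumption: the card only says "adamw"
    "fp16": True,            # mixed precision; needs a CUDA GPU
}


def build_training_args(output_dir="git-diffusiondb-finetune"):
    """Build TrainingArguments from the values above (hypothetical output_dir)."""
    from transformers import TrainingArguments  # imported lazily; heavy dependency

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```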
If you want to fine-tune it further, you should freeze the embedding and vision transformer layers.
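Freezing those layers could look something like this. It is a sketch, not the code used to train this checkpoint. The prefixes "git.embeddings." and "git.image_encoder." are assumptions about how the GIT implementation in transformers names its parameters, so print the names from `model.named_parameters()` and check before trusting them. `freeze_by_prefix` and `load_and_freeze` are made-up helper names.

```python
# Assumed parameter-name prefixes for the text embeddings and the vision transformer.
FROZEN_PREFIXES = ("git.embeddings.", "git.image_encoder.")


def freeze_by_prefix(named_parameters, prefixes=FROZEN_PREFIXES):
    """Set requires_grad=False on parameters whose name starts with a prefix.

    Takes (name, parameter) pairs, e.g. from model.named_parameters(),
    and returns the list of names that were frozen.
    """
    prefixes = tuple(prefixes)
    frozen = []
    for name, param in named_parameters:
        if name.startswith(prefixes):
            param.requires_grad = False
            frozen.append(name)
    return frozen


def load_and_freeze(model_id="SE6446/Untitled7-colab_checkpoint"):
    """Load the checkpoint and freeze the assumed embedding/vision layers."""
    from transformers import AutoModelForCausalLM  # imported lazily; heavy dependency

    model = AutoModelForCausalLM.from_pretrained(model_id)
    frozen = freeze_by_prefix(model.named_parameters())
    print(f"Froze {len(frozen)} parameter tensors")
    return model
```

If the printed count is 0, the prefixes do not match your transformers version and need adjusting.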