gokay aydogan

gokaygokay

AI & ML interests

Vision Language Models

Posts 2

I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation:
  - gokaygokay/sd3-long-captioner

- Prompt enhancers for SD3 models:
  - gokaygokay/Lamini-Prompt-Enchance-Long
  - gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!
I've fine-tuned three PaliGemma image captioner models for generating prompts for text-to-image models. They produce captions that read like the prompts we give to image generation models. I used the google/docci and google/imageinwords datasets for fine-tuning.

This one gives you longer captions.

gokaygokay/SD3-Long-Captioner

This one gives you medium-length captions.

gokaygokay/SD3-Long-Captioner-V2

And this one gives you shorter captions.

gokaygokay/SDXL-Captioner