gokay aydogan PRO

gokaygokay

AI & ML interests

OPEN SOURCEEEEEE!

gokaygokay's activity

posted an update 10 days ago
Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team.

Hugging Face Spaces
- gokaygokay/Kolors

Model Page
- Kwai-Kolors/Kolors
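
If you'd rather run Kolors locally instead of through the Space, a minimal sketch with diffusers might look like the following. The diffusers-format repo id, dtype, and sampling settings here are assumptions, not the Space's configuration; check the Kwai-Kolors/Kolors model card for the supported loading path.

```python
# Hypothetical local-inference sketch for Kolors via diffusers' generic loader.
# "Kwai-Kolors/Kolors-diffusers" is an assumed diffusers-format checkpoint id.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = pipe(
    prompt="A photorealistic portrait of an astronaut standing in a sunflower field",
    num_inference_steps=25,
    guidance_scale=5.0,
).images[0]
image.save("kolors_sample.png")
```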
posted an update 12 days ago
I've created a space for chatting with Gemma 2 using llama.cpp

- 🎛️ Choose between the 27B IT and 9B IT models
- 🚀 Fast inference using llama.cpp

- gokaygokay/Gemma-2-llamacpp
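
If you want a similar setup outside the Space, a rough local sketch with llama-cpp-python could look like this. The GGUF repo id, filename pattern, and context size below are placeholders, not the Space's actual configuration.

```python
# Hypothetical sketch: chatting with a Gemma 2 GGUF build via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/gemma-2-9b-it-GGUF",  # assumed quantized repo, not the Space's
    filename="*Q4_K_M.gguf",                 # pick a quantization that fits your hardware
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize latent diffusion in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```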
posted an update 14 days ago
I've created a Stable Diffusion 3 (SD3) image generation space for convenience. Now you can:

1. Generate SD3 prompts from images
2. Enhance your text prompts (turn 1-2 words into full SD3 prompts)

gokaygokay/SD3-with-VLM-and-Prompt-Enhancer

These features are based on my custom models:

- VLM captioner for prompt generation:
- gokaygokay/sd3-long-captioner

- Prompt Enhancers for SD3 Models:
- gokaygokay/Lamini-Prompt-Enchance-Long
- gokaygokay/Lamini-Prompt-Enchance

You can now simplify your SD3 workflow with these tools!
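
For reference, calling the prompt enhancer outside the Space can be as simple as a text2text pipeline. This is a minimal sketch assuming the checkpoints are LaMini/T5-style seq2seq models; check the model cards for the exact prompt format and generation settings.

```python
# Hypothetical sketch: expanding a 1-2 word idea into a fuller SD3-style prompt.
from transformers import pipeline

enhancer = pipeline(
    "text2text-generation",
    model="gokaygokay/Lamini-Prompt-Enchance-Long",  # assumed seq2seq checkpoint
)

short_prompt = "cat astronaut"
result = enhancer(short_prompt, max_new_tokens=256)[0]["generated_text"]
print(result)  # enhanced prompt to paste into the SD3 generation step
```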
replied to their post 21 days ago

I think for a 0.22B model it looks amazing. I've seen some very recent 3B and even 7B models that are worse than this, and it's highly fine-tunable. I fine-tuned it with only 3,500 training samples in under 15 minutes.

replied to their post 26 days ago

They've already fine-tuned the base model, and the fine-tuned version looks better at segmentation and object detection. Its captions are shorter and less detailed, though. That may help with hallucinations, but sometimes the fine-tuned model gives almost no details. To answer your question: it does look like a fine-tunable model.

posted an update 28 days ago
I've fine-tuned three PaliGemma image captioner models for generating prompts for text-to-image models. They produce captions similar to the prompts we give image generation models. I used the google/docci and google/imageinwords datasets for fine-tuning.

This one gives you longer captions.

gokaygokay/SD3-Long-Captioner

This one gives you medium-length captions.

https://huggingface.co/spaces/gokaygokay/SD3-Long-Captioner-V2

And this one gives you shorter captions.

https://huggingface.co/spaces/gokaygokay/SDXL-Captioner
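
As a rough usage sketch, these captioners should load like any PaliGemma fine-tune in transformers. The task prefix and generation settings below are assumptions, so check each model card for the exact prompt format.

```python
# Hypothetical sketch: captioning an image with one of the PaliGemma fine-tunes.
from PIL import Image
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "gokaygokay/sd3-long-captioner"  # repo referenced earlier in this feed
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

image = Image.open("photo.jpg").convert("RGB")
prompt = "caption en"  # assumed task prefix; the fine-tunes may expect a different one
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens before decoding the generated caption.
generated = output_ids[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```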
