Pokemon Stable Diffusion v1.5
A fine-tuned version of Stable Diffusion v1.5 specifically trained to generate high-quality Pokémon images in various artistic styles.
Model Details
- Base Model: Stable Diffusion v1.5
- Developed by: RekklesAI
- Model Type: Latent Diffusion Model for Text-to-Image generation
- Language(s): English
- License: CreativeML OpenRAIL-M
- Training Data: reach-vb/pokemon-blip-captions
- Training Steps: 15,000 steps at resolution 512x512
- Model Architecture: Same as Stable Diffusion v1.5 (UNet with cross-attention layers)
- Diffusers Version: 0.33.0.dev0
- Scheduler: PNDMScheduler
- Safety Checker: StableDiffusionSafetyChecker (can be disabled by passing safety_checker=None when loading the pipeline)
Model Description
This model is a fine-tuned version of Stable Diffusion v1.5, specifically trained to generate high-quality Pokémon images. It can produce Pokémon in a range of artistic styles, from photorealistic renders and cartoon styles to cyberpunk aesthetics and watercolor art.
The model was fine-tuned for 15,000 steps on the reach-vb/pokemon-blip-captions dataset, allowing it to learn the distinctive features and characteristics of different Pokémon species while maintaining the generative capabilities of the base model.
Training Details
The model was trained using the following configuration:
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --dataset_name="reach-vb/pokemon-blip-captions" \
  --caption_column="text" \
  --image_column="image" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" \
  --report_to tensorboard
Key training parameters:
- Learning rate: 1e-5
- Optimizer: AdamW (default)
- LR scheduler: Constant (no decay)
- Batch size: 1 per device, with 4 gradient accumulation steps (an effective batch size of 4 per device)
- Gradient clipping: maximum gradient norm of 1
- Resolution: 512x512
- Data augmentation: Random horizontal flip
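Because --gradient_accumulation_steps=4 accumulates gradients over four micro-batches before each optimizer update, the effective batch size is larger than --train_batch_size. The sketch below works through that arithmetic. It assumes training ran on a single GPU (the card does not state the GPU count, so num_gpus=1 is an assumption) and that --max_train_steps counts optimizer updates, as it does in the diffusers train_text_to_image.py example script. The helper names are illustrative only.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


def samples_seen(max_train_steps: int, per_device_batch: int,
                 grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Total training samples processed over the whole run."""
    return max_train_steps * effective_batch_size(per_device_batch, grad_accum_steps, num_gpus)


# Values taken from the training command above; num_gpus=1 is an assumption.
batch = effective_batch_size(per_device_batch=1, grad_accum_steps=4, num_gpus=1)
total = samples_seen(max_train_steps=15_000, per_device_batch=1, grad_accum_steps=4, num_gpus=1)
print(f"effective batch size: {batch}")  # effective batch size: 4
print(f"samples seen: {total:,}")        # samples seen: 60,000
```

On a multi-GPU launch, each process contributes its own micro-batch, so both numbers would scale with the number of GPUs.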
Sample Images
The following prompts were used to generate sample images with this model, grouped by style:
Epic Photorealistic Style
Prompt: A majestic Charizard with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient volcanic landscape. Intricate details of ember particles floating around its powerful wings, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its determined expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.
Prompt: A majestic Rayquaza with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient cloud kingdom landscape. Intricate details of cosmic energy particles flowing along its serpentine body, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its ancient and wise expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.
Cyberpunk Style
Prompt: A cybernetic Mewtwo floating in a neon-drenched futuristic cityscape at night. Bioluminescent purple energy coursing through transparent tubes connected to its body. Holographic interfaces surrounding it, reflecting on wet asphalt streets. Cyberpunk aesthetic with glowing technological implants, sharp contrasts between shadows and vibrant neon lights. Blade Runner inspired atmosphere, digital distortion effects, lens flares, and electric particle effects. Ultramodern sci-fi concept art with intricate mechanical details.
Gothic Style
Prompt: A haunting Gengar lurking in a decrepit Victorian mansion. High contrast black and white photography style with dramatic chiaroscuro lighting. Gothic architecture with ornate details fading into deep shadows. Film noir aesthetic with grainy texture and vignette edges. Eerie atmosphere enhanced by fog tendrils and moonlight streaming through broken stained glass windows. Reminiscent of classic horror cinema, with stark silhouettes and ominous negative space. Timeless monochromatic art with haunting emotional depth.
Watercolor Art Style
Prompt: A serene Gardevoir in an enchanted forest glade, surrounded by luminescent butterflies and delicate wildflowers. Soft watercolor style with gentle pastel hues, flowing brushstrokes creating an ethereal atmosphere. Dappled sunlight filtering through the canopy, creating a dreamy bokeh effect. Impressionistic details, emotional color palette with teal and lavender accents, artistic composition inspired by Studio Ghibli, whimsical fantasy illustration.
Kawaii/Chibi Style
Prompt: An adorable chibi-style Eevee and its evolutions having a tea party in a candy-colored meadow. Kawaii anime style with exaggerated expressions and oversized eyes. Pastel rainbow palette with soft shading and cute decorative elements like hearts and stars. Playful composition with rounded shapes and simplified forms. Cheerful atmosphere with cartoon sparkles and emotion symbols. Inspired by children's animation, with clean outlines and flat color blocks. Whimsical and heartwarming illustration style perfect for merchandise.
Prompt: A delightful tea party hosted by Bulbasaur, Chikorita, and Rowlet in a blooming flower garden. Cute storybook illustration style with soft rounded shapes. Tiny teacups and miniature pastries served on lily pad tables. Pastel green and pink color scheme with dainty flower patterns. Chibi proportions with oversized heads and stubby limbs. Cheerful expressions with sparkling eyes and happy smiles. Whimsical details like butterfly waiters and ladybug guests. Heartwarming scene rendered in a children's picture book style with gentle outlines and soft textures.
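Several of the prompts above are likely longer than the text encoder can read. Stable Diffusion v1.5 conditions on a CLIP text encoder with a 77-token context window (counting the start and end tokens), and the pipeline truncates anything beyond that, so the trailing descriptors of a long prompt may have no effect. A minimal sketch for checking this before generating; overflow_tokens is a hypothetical helper that accepts any function mapping text to token IDs:

```python
from typing import Callable, List

# Context length of the CLIP text encoder used by Stable Diffusion v1.5,
# including the begin-of-text and end-of-text tokens.
CLIP_MAX_TOKENS = 77


def overflow_tokens(prompt: str,
                    tokenize: Callable[[str], List[int]],
                    max_tokens: int = CLIP_MAX_TOKENS) -> int:
    """Return how many tokens of `prompt` fall outside the encoder's context window."""
    token_ids = tokenize(prompt)
    return max(0, len(token_ids) - max_tokens)
```

With a pipeline loaded as in the Usage section, call it as overflow_tokens(prompt, lambda text: pipe.tokenizer(text).input_ids). A positive result means that many tokens at the end of the prompt will be dropped, so the most important terms should come first.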
Usage
You can use this model with the Diffusers library:
import torch
from diffusers import StableDiffusionPipeline
# Load the model
model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path
# Half precision is meant for GPU inference; fall back to float32 on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=dtype,
    safety_checker=None,  # Set to None to disable the safety checker
)
pipe = pipe.to(device)
# Generate image
prompt = "A cute Pikachu playing in a grassy field, high resolution, detailed"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
# Save image
image.save("generated_pokemon.png")
Advanced Usage with Custom Scheduler
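You can also swap in a different scheduler to trade off speed against quality. For example, DPM-Solver++ typically produces good results in fewer inference steps than the default PNDMScheduler: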
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch
model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path
# Load model
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    safety_checker=None,
)
# Replace scheduler for faster inference
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="dpmsolver++",
    solver_order=2,
)
# Move to GPU
pipe = pipe.to("cuda")
# Generate with fewer steps
prompt = "A majestic Charizard in battle stance, fire breathing, detailed scales, epic lighting"
image = pipe(
    prompt=prompt,
    num_inference_steps=25,  # Fewer steps needed with DPM-Solver++
    guidance_scale=7.5,
).images[0]
image.save("charizard_dpm_solver.png")
Prompt Engineering Tips
For optimal results, consider the following prompt engineering techniques:
- Specify Pokémon Names: Include specific Pokémon names like "Pikachu", "Bulbasaur", "Charizard", etc.
- Add Scene Descriptions: Describe the environment, such as "in a forest", "in battle", "sleeping", etc.
- Include Style Descriptors: Add style terms like "high resolution", "detailed", "cartoon style", etc.
- Emotional Context: Include emotional states like "happy", "angry", "cute", etc.
- Artistic Techniques: Specify art styles like "watercolor", "oil painting", "digital art", etc.
- Lighting and Atmosphere: Describe lighting conditions like "sunset", "moonlight", "studio lighting", etc.
- Composition Guidelines: Add terms like "rule of thirds", "dynamic pose", "close-up shot", etc.
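Taken together, these tips amount to filling a few independent slots around a Pokémon name. As a minimal sketch, the hypothetical helper build_prompt below (not part of diffusers or this model) joins whichever slots are provided into a comma-separated prompt, with the subject first:

```python
from typing import Optional


def build_prompt(pokemon: str,
                 scene: Optional[str] = None,
                 emotion: Optional[str] = None,
                 art_style: Optional[str] = None,
                 lighting: Optional[str] = None,
                 composition: Optional[str] = None,
                 quality: Optional[str] = None) -> str:
    """Assemble a prompt: subject (emotion + name + scene) first, then modifiers."""
    subject = f"{emotion} {pokemon}" if emotion else pokemon
    if scene:
        subject = f"{subject} {scene}"
    # Skip any slot that was not provided.
    modifiers = [part for part in (art_style, lighting, composition, quality) if part]
    return ", ".join([subject] + modifiers)


print(build_prompt(
    "Pikachu",
    scene="in a forest",
    emotion="happy",
    art_style="watercolor",
    lighting="sunset",
    composition="close-up shot",
    quality="high resolution, detailed",
))
# happy Pikachu in a forest, watercolor, sunset, close-up shot, high resolution, detailed
```

Leading with the Pokémon name keeps it intact even if a long prompt gets truncated at the end.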
Limitations
- The model may occasionally generate Pokémon with anatomical inaccuracies
- Text rendering within images may be illegible or distorted
- Complex compositions with multiple Pokémon may not always position them correctly
- The model performs best with English prompts
- As with all Stable Diffusion models, it inherits certain biases and limitations from the base model
- The safety checker may occasionally filter legitimate content; it can be disabled, but do so with caution
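If you keep the safety checker enabled, the diffusers pipeline output reports which images were flagged in its nsfw_content_detected field, and flagged images are returned as black images. The hypothetical helper drop_flagged below removes those images from a batch; it treats a None field (what the pipeline returns when loaded with safety_checker=None) as "nothing flagged".

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def drop_flagged(images: Sequence[T], nsfw_flags: Optional[Sequence[bool]]) -> List[T]:
    """Keep only the images the safety checker did not flag.

    `nsfw_flags` is expected to be the pipeline output's `nsfw_content_detected`
    value: one boolean per image, or None when the safety checker is disabled.
    """
    if nsfw_flags is None:
        # No safety checker ran, so there is nothing to filter.
        return list(images)
    return [image for image, flagged in zip(images, nsfw_flags) if not flagged]
```

For example, out = pipe(prompt, num_images_per_prompt=4) followed by drop_flagged(out.images, out.nsfw_content_detected) returns only the unflagged images.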
Ethical Considerations
This model is intended for creative and artistic purposes only. Users should:
- Respect the intellectual property rights of The Pokémon Company and Nintendo
- Avoid generating harmful, offensive, or inappropriate content
- Not use generated images for commercial purposes without proper licensing
- Be transparent about AI-generated content when sharing
License
This model is based on Stable Diffusion v1.5 and follows the CreativeML OpenRAIL-M license of the original model.
Acknowledgements
Thanks to all artists and creators who have contributed to the Pokémon franchise, and to Stability AI for developing the Stable Diffusion model. Special thanks to the creators of the reach-vb/pokemon-blip-captions dataset used for training this model.
Citation
If you use this model in your research, please cite:
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}