Sana 1.6B fine-tuned on aesthetic images.
Base vs. fine-tune comparison: https://imgsli.com/MzI0NTYy
Training code: https://github.com/Muinez/Sana
Inference code: https://github.com/NVlabs/Sana
Example:
import gc

import numpy as np
import torch
from PIL import Image

# Assumes the script runs from the root of the Sana repository so that `app` is importable.
from app.sana_pipeline import SanaPipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024_AdamW.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
def norm_ip(img, low, high):
    # Clamp to [low, high] in place, then rescale to [0, 1].
    img.clamp_(min=low, max=high)
    img.sub_(low).div_(max(high - low, 1e-5))
    return img
@torch.inference_mode()
def txt2img(prompts):
    negative_prompt = "bad anatomy, extra limbs, low quality"
    images = []
    for prompt in prompts:
        # Fixed seed for reproducibility; swap in int(time.time()) (with `import time`) for varied outputs.
        generator = torch.Generator(device=device).manual_seed(42)
        images_np = sana(
            prompt=prompt,
            negative_prompt=negative_prompt,
            height=1280,
            width=960,
            guidance_scale=4.5,
            pag_guidance_scale=1.5,
            num_inference_steps=24,
            generator=generator,
            num_images_per_prompt=1,
        )
        # Map each tensor from [-1, 1] to a uint8 HWC array and wrap it as a PIL image.
        images += [
            Image.fromarray(
                norm_ip(img, -1, 1)
                .mul(255)
                .add_(0.5)
                .clamp_(0, 255)
                .permute(1, 2, 0)
                .to("cpu", torch.uint8)
                .numpy()
                .astype(np.uint8)
            )
            for img in images_np
        ]
        # torch.cuda.empty_cache()
        gc.collect()
    return images
from diffusers.utils import make_image_grid
prompts = [
"A beautiful young woman with short hair, wearing tattered clothing revealing glimpses of her skin beneath the rips and tears; soft lighting highlights both beauty features like piercing eyes or full lips while also emphasizing vulnerability",
"A beautiful young girl with long flowing hair sitting gracefully in a traditional Japanese kimono adorned intricate patterns, her serene expression radiating elegance against an ethereal backdrop of cherry blossoms blooming under the soft glow at sunset",
"A mysterious encounter: a beautiful 20-something girl dressed elegantly but haunting-ly among ruined classical architecture is confrontational stance facing an awe-inspiringaurora-luminescent beast that seems poised for attack or affection",
"An elegant cyberpunk female character with neon-lit eyes in urban night setting under futuristic skyscrapers; wearing form-fitting armor adorned by sleek lines of glowing blue circuits along her body.",
"Charismatic, rotund frog tavern keeper in a lush swamp, Frothy amber beer in webbed hands, Dressed in rustic vest and breeches, Surroundings teeming with life, Realistic and fantasy style",
"Two young women standing together in a park setting during golden hour sunlight with lush greenery, wearing casual summer clothing like sundresses - one blonde woman has long hair and the other brunette wears glasses",
"A serene, golden hour seascape features turquoise waves lapping on white sands framed by distant cliffs as an ethereal young woman in ivory lace dress strolls barefoot alongshore with her magnificent pure-white steed. ",
"portrait of Egor Letov in his signature style - a rugged, intense gaze with piercing eyes set against weathered skin; unruly dark hair framing the face and partially obscuring one eye as if hiding something rebellious underneath its shadowing strands",
"A young woman poses for a self-portrait (selfie) wearing minimal clothing, showcasing her toned legs. The bedroom is cluttered with signs she had been partying the previous evening",
"Nina Dobrev in her 20’ as an e girl with youthful beauty accentuated by the outfit: oversized wet checkered shirt, black leather boots & red floral headscarf framing delicate features while exposing eyes and lips against lush greenery bathed golden sunlight",
"ink splashes, drips, surreal, a man standing on a dock in swampy wetlands, at a distance, moonlit, lonely",
"Four strong movers in matching uniforms stand proudly outside charming farmhouse holding packages for a new homeowner's arrival; they project confidence yet approachability.",
"A young girl with flowing hair sits astride a large green frog in the midst of heavy rain falling from an overcast sky painted across canvas using rich oil paints and loose brushstrokes that capture both motion. The scene is set outdoors on lush grass ",
"A crucified angel amidst a throng of inquisitors surrounding him; close-up detail focusing intensely upon the anguish and horror etched across their faces as they witness his crucifixion. The scene is dark with dramatic lighting emphasizing shadows ",
"A massive crowd of one thousand people fills Red Square in Moscow's heart; their faces are flushed with anger and excitement as they raise fists high above heads. Chanting slogans into the camera lens held aloft by a lone journalist amidst them all ",
"A beautiful young woman with flowing long hair in a braid, wearing tattered clothing revealing glimpses of her skin beneath the rips and tears; soft lighting highlights both beauty features like piercing eyes or full lips while also emphasizing vulnerability",
]
images = txt2img(prompts)
grid = make_image_grid(images, rows=4, cols=4)
# grid.show()
grid.save("grid1.png")
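
The example above hard-codes rows=4, cols=4, which only works because there are exactly 16 prompts: make_image_grid requires len(images) == rows * cols. A minimal sketch of choosing the grid shape from the number of images follows. `grid_shape` is a hypothetical helper written for illustration; it is not part of diffusers or Sana.

```python
import math


def grid_shape(n_images: int) -> tuple[int, int, int]:
    """Return (rows, cols, n_blank) for a near-square grid.

    n_blank is how many filler tiles are needed so that
    rows * cols == n_images + n_blank.
    """
    if n_images < 1:
        raise ValueError("need at least one image")
    cols = math.ceil(math.sqrt(n_images))  # widest row of a near-square layout
    rows = math.ceil(n_images / cols)      # enough rows to hold every image
    n_blank = rows * cols - n_images       # empty cells left in the last row
    return rows, cols, n_blank


if __name__ == "__main__":
    print(grid_shape(16))  # (4, 4, 0)
    print(grid_shape(10))  # (3, 4, 2)
```

With the 16 prompts above this returns (4, 4, 0), matching the hard-coded call. For other prompt counts, one option is to append n_blank blank tiles (for example Image.new("RGB", images[0].size)) before calling make_image_grid(images, rows=rows, cols=cols).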