---
license: apache-2.0
pipeline_tag: text-to-image
---

# Work / train in progress

⚡️Waifu: efficient high-resolution waifu synthesis

Waifu is a free text-to-image model that can efficiently generate images from prompts in 80 languages. Our goal is to create a small model without compromising on quality.

## Core designs include:

(1) [**AuraDiffusion/16ch-vae**](https://huggingface.co/AuraDiffusion/16ch-vae): A fully open-source 16-channel VAE, natively trained in fp16. \
(2) [**Linear DiT**](https://github.com/NVlabs/Sana): We use a 1.6B DiT transformer with linear attention. \
(3) [**MEXMA-SigLIP**](https://huggingface.co/visheratin/mexma-siglip): MEXMA-SigLIP combines the [MEXMA](https://huggingface.co/facebook/MEXMA) multilingual text encoder with the image encoder from the [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384) model. This gives us a high-performance CLIP model for 80 languages. \
(4) Other: We use the Flow-Euler sampler, the Adafactor-Fused optimizer, and bf16 precision for training, and we combine efficient caption labeling (MoonDream, CogVLM, human, GPT) with danbooru tags to accelerate convergence.
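To make the Flow-Euler sampler in item (4) more concrete, here is a minimal pure-Python sketch of Euler sampling along a straight noise-to-data path. It is an illustration under stated assumptions, not this model's code: `toy_velocity` is a hypothetical stand-in for the transformer, `make_sigmas` and `euler_sample` are illustrative names, and the `shift` formula is the commonly used flow-matching timestep shift, assumed here to behave like the `shift` argument of the scheduler.

```python
# Toy flow-matching Euler sampler (illustrative sketch, not the model's code).

def make_sigmas(num_steps, shift=1.0):
    """Noise levels from 1.0 (pure noise) down to 0.0 (clean sample)."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    # Assumed timestep shift; shift=1.0 leaves the schedule unchanged.
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def toy_velocity(x, sigma, target):
    """Hypothetical stand-in for the transformer's velocity prediction.

    On the straight path x = (1 - sigma) * target + sigma * noise,
    the velocity dx/dsigma equals noise - target = (x - target) / sigma.
    """
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


def euler_sample(noise, target, num_steps=8, shift=1.0):
    """Integrate from sigma=1 to sigma=0 with explicit Euler steps."""
    sigmas = make_sigmas(num_steps, shift)
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = toy_velocity(x, sigma, target)
        # Euler update: move along the predicted velocity by the sigma gap.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.2, 2.0]    # starting point at sigma = 1
target = [1.0, 0.0, -1.0]   # the "clean" sample the toy field points to
sample = euler_sample(noise, target)
print(sample)
```

Because the toy velocity field is exact, the sampler lands on `target` up to floating-point error. A real model only approximates the velocity, so the number of steps and the schedule affect image quality.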
## Example

```py
import torch
from diffusers import (
    DiffusionPipeline,
    FlowMatchEulerDiscreteScheduler,
    SanaTransformer2DModel,
)
from diffusers.models import AutoencoderKL
from transformers import XLMRobertaModel, XLMRobertaTokenizerFast

pipe_id = "AiArtLab/waifu-2b"
variant = "fp16"

# Tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained(
    pipe_id,
    subfolder="tokenizer"
)

# Text encoder
text_encoder = XLMRobertaModel.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="text_encoder",
    add_pooling_layer=False
).to("cuda")

# Scheduler (created here but not passed to the pipeline below)
scheduler = FlowMatchEulerDiscreteScheduler(shift=1.0)

# VAE
vae = AutoencoderKL.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="vae"
).to("cuda")

# Transformer
transformer = SanaTransformer2DModel.from_pretrained(
    pipe_id,
    variant=variant,
    subfolder="transformer"
).to("cuda")

# Pipeline (trust_remote_code=True runs the custom pipeline code from the repo)
pipeline = DiffusionPipeline.from_pretrained(
    pipe_id,
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    vae=vae,
    transformer=transformer,
    trust_remote_code=True,
).to("cuda")
print(pipeline)

# A single prompt mixing Russian, English, Arabic, and French
prompt = 'аниме девушка, waifu, يبتسم جنسيا , sur le fond de la tour Eiffel'
generator = torch.Generator(device="cuda").manual_seed(42)

# The first element of the pipeline output is the list of generated images
images = pipeline(
    prompt=prompt,
    negative_prompt="",
    generator=generator,
)[0]

for img in images:
    img.show()
    img.save('waifu.png')
```

![image](./waifu.png)

## How to cite

```bibtex
@misc{Waifu,
  url    = {https://huggingface.co/AiArtLab/waifu-2b},
  title  = {waifu-2b},
  author = {recoilme, muinez, femboysLover}
}
```