Text-to-image finetuning - jffacevedo/pxla_trained_model
This pipeline was finetuned from stabilityai/stable-diffusion-2-base on the lambdalabs/naruto-blip-captions dataset.
Pipeline usage
You can use the pipeline like so:
import torch
import torch_xla.core.xla_model as xm
from diffusers import StableDiffusionPipeline


def main():
    # xm.xla_device() requires torch_xla and an XLA backend (e.g. a TPU VM).
    device = xm.xla_device()

    # Replace <output_dir> with the directory the training run saved the pipeline to.
    model_path = "<output_dir>"
    pipe = StableDiffusionPipeline.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
    )
    pipe.to(device)

    prompt = "A naruto with green eyes and red legs."
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save("naruto.png")


if __name__ == "__main__":
    main()
Training info
These are the key hyperparameters used during training:
- Steps: 50
- Learning rate: 1e-06
- Batch size: 32
- Image resolution: 512
- Mixed-precision: bf16
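With no gradient accumulation, these settings amount to roughly 50 × 32 = 1,600 image-caption pairs seen during finetuning. As a hedged sketch, the hyperparameters map onto the stock diffusers text-to-image training script as shown below; the script path and flags come from the diffusers examples, and the exact PyTorch/XLA training setup used for this model may differ:

python examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-base" \
  --dataset_name="lambdalabs/naruto-blip-captions" \
  --resolution=512 \
  --train_batch_size=32 \
  --learning_rate=1e-06 \
  --max_train_steps=50 \
  --mixed_precision="bf16" \
  --output_dir="<output_dir>"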
Intended uses & limitations
How to use
For environments without torch_xla, below is a minimal sketch that loads the pipeline directly from the Hub, assuming it is published under the repo id jffacevedo/pxla_trained_model (a local path works as well). See the Pipeline usage section above for the TPU/XLA version.
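import torch
from diffusers import StableDiffusionPipeline

# bfloat16 matches the training precision; fall back to torch.float32
# on hardware without bf16 support.
pipe = StableDiffusionPipeline.from_pretrained(
    "jffacevedo/pxla_trained_model",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda" if torch.cuda.is_available() else "cpu")

image = pipe(
    "A naruto with green eyes and red legs.",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("naruto.png")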
Limitations and bias
[TODO: provide examples of latent issues and potential remediations]
Training details
The pipeline was finetuned on lambdalabs/naruto-blip-captions, a dataset of images from the Naruto anime paired with BLIP-generated captions. [TODO: add details on preprocessing and filtering]
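You can inspect the finetuning data with the datasets library; this sketch assumes the standard layout of BLIP-caption datasets on the Hub, with "image" and "text" columns:

from datasets import load_dataset

ds = load_dataset("lambdalabs/naruto-blip-captions", split="train")
print(ds)                          # dataset size and features
print(ds[0]["text"])               # an example caption
ds[0]["image"].save("sample.png")  # images are PIL objects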