README.md · dataautogpt3/Miniaturus-Potentia at 3e55e7516f3dd931bbb8d54fcc213810d39318c5

metadata

license: cc-by-nc-nd-4.0
pipeline_tag: text-to-image
widget:
  - text: >-
      beautiful 18yo blonde Swiss girl, wearing a fuzzy sweater, snuggled and
      warm in front of a winter fire in a cozy living room in a chalet, high
      res, best quality, 8k, hyper-detailed, intricate, Cannon 85mm, face up,
      perfecteyes
    output:
      url: ComfyUI_00629_.png
  - text: ((hello!)text:1)
    output:
      url: ComfyUI_00641_.png
  - text: >-
      roman soldier, high res, best quality, 8k, hyper-detailed, intricate,
      Cannon 85mm, face up
    output:
      url: ComfyUI_00637_.png
  - text: panther head coming out of smoke, dark, moody, detailed, shadows
    output:
      url: ComfyUI_00623_.png
  - text: >-
      cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image)
      An Oscar winning movie for Best Cinematography a woman in a kimono
      standing on a subway train in Japan Kodak Motion Picture Film Style,
      shallow depth of field, vignette, highly detailed, high budget, bokeh,
      cinemascope, moody, epic, gorgeous, film grain, grainy
    output:
      url: ComfyUI_00617_.png
  - text: >-
      Super Closeup Portrait, action shot, Profoundly dark whiteish meadow,
      glass flowers, Stains, space grunge style, Jeanne d\Arc wearing White
      Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi
      vibe, dirty, noisy, Vintage monk style, very detailed, hd
    output:
      url: ComfyUI_00615_.png
  - text: >-
      Super Closeup Portrait, action shot, Profoundly dark whiteish meadow,
      glass flowers, Stains, space grunge style, Jeanne d\Arc wearing White
      Olive green used styled Cotton frock, Wielding thin silver sword, Sci-fi
      vibe, dirty, noisy, Vintage monk style, very detailed, hd
    parameters:
      negative_prompt: >
        bad quality, bad anatomy, worst quality, low quality, low resolution,
        extra fingers, blur, blurry, ugly, wrong proportions, watermark, image
        artifacts, lowres, ugly, jpeg artifacts, deformed, noisy image,
        embedding:ac_neg1,
    output:
      url: ComfyUI_00614_.png

Miniaturus-Potentia is an iteration of Stable Diffusion 1.5, modestly adapted for more refined generation of human figures, hands, and text. Training, while not groundbreaking, ran on a reasonable setup of four NVIDIA RTX 3090 GPUs and took a modest 16 hours for 8 epochs.

Its capabilities are somewhat specialized: it is better at rendering people and text than at rendering animals. This selective improvement makes it a suitable, though not exceptional, tool for tasks that need detailed human figures or accurate text.

The training set contained 13,100 unique examples, expanded to a dataset of 131,000 images (a factor of 10). Each epoch covered 131,000 examples at a batch size of 40, and optimization steps totaled 26,200, which matches 8 epochs × 131,000 examples ÷ 40 per batch. Gradient accumulation was held constant throughout, emphasizing gradual and steady learning.
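The training figures can be cross-checked with a little arithmetic. The sketch below is not part of the original training code; it only recomputes the numbers quoted in this card, under two assumptions: the 131,000-image dataset is the 13,100 unique examples repeated a whole number of times, and each optimization step consumes exactly one batch of 40. All variable names are illustrative.

```python
# Figures quoted in the model card.
unique_examples = 13_100
dataset_size = 131_000
epochs = 8
batch_size = 40
optimization_steps = 26_200
num_gpus = 4
training_hours = 16

# Assumption: the dataset is the unique examples repeated a whole number of times.
repeats = dataset_size // unique_examples
assert repeats * unique_examples == dataset_size

# Assumption: one optimization step consumes exactly one batch.
samples_seen = optimization_steps * batch_size
samples_per_epoch = samples_seen // epochs
steps_per_epoch = optimization_steps // epochs

# Total compute budget in GPU-hours.
gpu_hours = num_gpus * training_hours

print(f"repeats per unique example: {repeats}")
print(f"samples seen in total:      {samples_seen:,}")
print(f"samples per epoch:          {samples_per_epoch:,}")
print(f"steps per epoch:            {steps_per_epoch:,}")
print(f"GPU-hours:                  {gpu_hours}")
```

Under these assumptions the step count and batch size imply 131,000 samples per epoch, which equals the full dataset size, so one epoch corresponds to one pass over the expanded dataset.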

The improvements, while not radical, target common problems in image generation such as blurriness and distorted proportions. The goal was clearer, more anatomically coherent results; the advances are more evolutionary than revolutionary.