ptx0/pseudo-journey-v2

	---
	license: creativeml-openrail-m
	library_name: diffusers
	tags:
	- stable-diffusion
	- text-to-image
	---

	# Capabilities

	This model is "adventure" and "fantasy" focused.

	With certain inference configurations, it is capable of producing very high quality results.

	This model functions better without negative prompts than most fine-tunes.

	# Inference parameters

	Diffusers should "Just Work" with the config in this repository.

	For A1111 users:

	Scheduler: DDIM, 15-50 steps

	Generally acceptable resolutions:
	- 768x768
	- 1024x1024
	- 1152x768
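
For Diffusers users, the settings above might be applied as in the minimal sketch below. It assumes the resolutions are written as width x height, uses the Diffusers default guidance scale of 7.5 as a starting point, and assumes a CUDA GPU with fp16 weights. The helper names `build_call_kwargs` and `generate` are hypothetical and not part of this repository. No negative prompt is passed, since the card notes the model works well without one.

```python
# Values below come from the model card; the helpers themselves are illustrative.
MODEL_ID = "ptx0/pseudo-journey-v2"
ACCEPTED_RESOLUTIONS = {(768, 768), (1024, 1024), (1152, 768)}  # (width, height), assumed order
STEP_RANGE = (15, 50)  # step range the card lists for DDIM


def build_call_kwargs(prompt, width=768, height=768, steps=30, guidance_scale=7.5):
    """Check settings against the card's recommendations and return pipeline kwargs."""
    if (width, height) not in ACCEPTED_RESOLUTIONS:
        raise ValueError(f"{width}x{height} is not one of the listed resolutions")
    if not STEP_RANGE[0] <= steps <= STEP_RANGE[1]:
        raise ValueError(f"steps={steps} is outside the listed range {STEP_RANGE}")
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": steps,
        # The card warns of "burnt" outputs at higher CFG, so lower this if needed.
        "guidance_scale": guidance_scale,
    }


def generate(prompt, **settings):
    """Load the pipeline and render one image (needs diffusers, torch, and a CUDA GPU)."""
    import torch
    from diffusers import StableDiffusionPipeline

    # The scheduler is left as configured in the repository.
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    return pipe(**build_call_kwargs(prompt, **settings)).images[0]
```

A call such as `generate("a knight crossing a misty mountain pass", width=1152, height=768).save("out.png")` would then download the weights on first use and write a single image.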

	# Limitations

	This model contains a heavily tuned text encoder that has lost many original Stable Diffusion 2.1 concepts.

	This model is even less reliable at producing real people than the base 2.1-v model is.

	Training data included only 1:1 aspect ratio images downsampled to 768x768; all other aspect ratios were discarded. As a result, this model struggles with native high-resolution generation.

	This model may produce "burnt" outputs at higher CFG scales.

	# Checkpoints

	This model contains multiple revisions:

	`02b28ff` (latest/main checkpoint)
	30,000 steps (approx. 4 epochs) with terminal SNR on 22k Midjourney 5.1 images plus 7,200 real photographs as balance data, with complete BLIP captions on all data. Batch size 4, learning rate from 4e-7 to 1e-8.
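
The "approx 4 epochs" figure can be checked with a quick calculation. This sketch treats "22k" as exactly 22,000 images and assumes no gradient accumulation for this checkpoint, since the card mentions none for it.

```python
steps = 30_000
batch_size = 4
dataset_size = 22_000 + 7_200  # Midjourney images + real photographs ("22k" taken as 22,000)

samples_seen = steps * batch_size      # images processed over the whole run
epochs = samples_seen / dataset_size   # full passes over the dataset

print(f"{samples_seen} samples / {dataset_size} images = {epochs:.2f} epochs")
# prints "120000 samples / 29200 images = 4.11 epochs"
```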

	`6d3949c` (retrained from ptx0/pseudo-journey)
	Retrained from ptx0/pseudo-journey (itself 4,000 steps from the stable-diffusion-2-1 baseline on 3,300 images), plus 9,500 steps on 22,400 images with a polynomial learning rate scheduler, batch size 4, 64 gradient accumulation steps, frozen text encoder, 8-bit Adam, and zero PLW (no regularization data). This was followed by 550 steps with an unfrozen text encoder and a constant LR of 1e-8.

	`9135a79` (original ckpt test)
	13,000 steps, trained from ptx0/pseudo-journey with a polynomial learning rate scheduler, batch size 3, text encoder, 8-bit Adam, and zero PLW (no regularization data).
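
To load one of the revisions listed above with Diffusers, the `revision` argument of `from_pretrained` can be used. The sketch below is a hypothetical helper, not something shipped in this repository. The labels `latest`, `retrained`, and `original` are made up for illustration, and it is an assumption that the Hub accepts the abbreviated hashes as written; the full 40-character commit hash may be required.

```python
MODEL_ID = "ptx0/pseudo-journey-v2"

# Abbreviated commit hashes exactly as listed in the card; the labels are illustrative.
CHECKPOINTS = {
    "latest": "02b28ff",     # latest/main checkpoint
    "retrained": "6d3949c",  # retrained from ptx0/pseudo-journey
    "original": "9135a79",   # original ckpt test
}


def pretrained_kwargs(checkpoint="latest"):
    """Return keyword arguments for StableDiffusionPipeline.from_pretrained."""
    if checkpoint not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {checkpoint!r}; choose from {sorted(CHECKPOINTS)}")
    return {
        "pretrained_model_name_or_path": MODEL_ID,
        "revision": CHECKPOINTS[checkpoint],
    }


# Example (requires diffusers and network access):
#   from diffusers import StableDiffusionPipeline
#   pipe = StableDiffusionPipeline.from_pretrained(**pretrained_kwargs("retrained"))
```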