PseudoTerminal X committed
Commit 7ead35c
1 parent: 3fb419d

Trained for 0 epochs and 500 steps.

Trained with datasets ['text-embeds-pixart-nofilter', 'photo-concept-bucket']
Learning rate 4e-07, batch size 1, and 1 gradient accumulation steps.
Used DDPM noise scheduler for training with epsilon prediction type and rescaled_betas_zero_snr=False
Using 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix
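
The 'trailing' timestep spacing named in the commit message can be illustrated with a small, dependency-free sketch. It assumes 1000 training timesteps (not stated in this commit, though it is the usual DDPM default) and the 30 sampling steps listed in the README's validation settings. The helper name `trailing_timesteps` is hypothetical, and this mirrors how trailing spacing is commonly defined rather than reproducing the trainer's own code.

```python
def trailing_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list[int]:
    """Return descending timesteps that start on the final training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    # Walk backwards from the last training timestep in equal strides,
    # round to the nearest integer, then shift to zero-based indices.
    return [round(num_train_timesteps - i * step_ratio) - 1 for i in range(num_inference_steps)]


timesteps = trailing_timesteps(30)
print(timesteps[:4], "...", timesteps[-1])
```

Under these assumptions the schedule begins at the noisiest timestep (999), which is the property that distinguishes trailing spacing from leading spacing.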

README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ license: creativeml-openrail-m
+ base_model: "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - full
+
+ inference: true
+
+ ---
+
+ # pixart-sigma
+
+ This is a full rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS).
+
+
+
+ The main validation prompt used during training was:
+
+ ```
+ a cute anime character named toast holding a sign that says SOON, sitting next to a red square on her left side, and a transparent sphere on her right side
+ ```
+
+ ## Validation settings
+ - CFG: `6.5`
+ - CFG Rescale: `0.7`
+ - Steps: `30`
+ - Sampler: `ddpm`
+ - Seed: `42`
+ - Resolutions: `1024x1024,1152x960,896x1152`
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+
+
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
+
+ ## Training settings
+
+ - Training epochs: 0
+ - Training steps: 500
+ - Learning rate: 4e-07
+ - Effective batch size: 8
+ - Micro-batch size: 1
+ - Gradient accumulation steps: 1
+ - Number of GPUs: 8
+ - Prediction type: epsilon
+ - Rescaled betas zero SNR: False
+ - Optimizer: AdamW, stochastic bf16
+ - Precision: Pure BF16
+ - Xformers: Not used
+
+
+ ## Datasets
+
+ ### photo-concept-bucket
+ - Repeats: 0
+ - Total number of images: ~559160
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: center
+ - Crop aspect: square
+
+
+ ## Inference
+
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+
+
+ model_id = "pixart-sigma"
+ prompt = "a cute anime character named toast holding a sign that says SOON, sitting next to a red square on her left side, and a transparent sphere on her right side"
+ negative_prompt = "malformed, disgusting, overexposed, washed-out"
+
+ pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline.to('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
+ image = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=30,
+     generator=torch.Generator(device='cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu').manual_seed(1641421826),
+     width=1152,
+     height=768,
+     guidance_scale=6.5,
+     guidance_rescale=0.7,
+ ).images[0]
+ image.save("output.png", format="PNG")
+ ```
+
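
The "Effective batch size: 8" entry in the README's training settings follows from the other listed values, assuming the usual data-parallel convention (micro-batch size × gradient accumulation steps × number of GPUs). A minimal check, using a hypothetical helper name and assuming the 500 training steps are optimizer steps:

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples contributing to each optimizer step under plain data parallelism."""
    return micro_batch * grad_accum_steps * num_gpus


batch = effective_batch_size(micro_batch=1, grad_accum_steps=1, num_gpus=8)
samples_seen = batch * 500  # 500 training steps, per the commit title
print(batch, samples_seen)
```

On those assumptions the run saw about 4,000 samples out of roughly 559,160 images, which is consistent with the README reporting 0 completed epochs.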
optimizer.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c85d8a2d1c97fe78d2306ffefdf9dd68c4e5ed997dd0037c69d22b6586efcee
+ size 3665677155
random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c8e9a343abcece94a75918726ea0ae31aa7e80c90cc8eb4a3cff5cb7062d3fb
+ size 16100
scheduler.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49246353654cfb564d254bc11935f6ff1ba736f5d3d0b93bdad25d62d10b67ac
+ size 1000
training_state-photo-concept-bucket.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state.json ADDED
@@ -0,0 +1 @@
+ {"global_step": 500, "epoch_step": 500, "epoch": 1, "exhausted_backends": [], "repeats": {}}
transformer/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "PixArtTransformer2DModel",
+   "_diffusers_version": "0.30.0.dev0",
+   "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale": 2,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 28,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 128,
+   "upcast_attention": false,
+   "use_additional_conditions": false,
+   "use_linear_projection": false
+ }
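
A few quantities can be derived from the transformer config above as a sanity check. This is a rough sketch: the 8x VAE downsampling factor is an assumption (typical of SDXL-style VAEs, not stated in this commit), and the variable names are illustrative only.

```python
# Values copied from transformer/config.json in this commit.
config = {
    "num_attention_heads": 16,
    "attention_head_dim": 72,
    "cross_attention_dim": 1152,
    "in_channels": 4,
    "out_channels": 8,
    "patch_size": 2,
    "sample_size": 128,
}
VAE_SCALE_FACTOR = 8  # assumption: 8x spatial downsampling by the VAE

# Hidden width of the transformer: heads times per-head dimension.
inner_dim = config["num_attention_heads"] * config["attention_head_dim"]
# Number of patch tokens for a square latent of side `sample_size`.
tokens_per_side = config["sample_size"] // config["patch_size"]
num_tokens = tokens_per_side ** 2
# Pixel resolution implied by the latent size and the assumed VAE factor.
pixel_size = config["sample_size"] * VAE_SCALE_FACTOR
print(inner_dim, num_tokens, pixel_size)
```

The derived width matches `cross_attention_dim`, and the implied pixel size matches the 1024 in the base model name. `out_channels` being twice `in_channels` usually indicates that the model outputs a variance term alongside the noise prediction, though this commit does not say so.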
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17df19bafac045f1612004a7c7f19e5e6d90004e7ca841b9a491fad08e3a797e
+ size 1221780352
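
The binary files in this commit are stored as Git LFS pointers: short text files made of `key value` lines (`version`, `oid`, `size`). A minimal sketch of reading one, using the transformer weights pointer above; the helper name `parse_lfs_pointer` is hypothetical and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:17df19bafac045f1612004a7c7f19e5e6d90004e7ca841b9a491fad08e3a797e
size 1221780352
"""
info = parse_lfs_pointer(pointer)
hash_algo, _, digest = info["oid"].partition(":")
size_gib = int(info["size"]) / 2**30  # size is recorded in bytes
print(hash_algo, len(digest), f"{size_gib:.2f} GiB")
```

At 2 bytes per weight (assuming the checkpoint is saved in bf16, in line with the "Pure BF16" precision setting), the recorded size corresponds to roughly 0.6 billion parameters.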