toilaluan committed on
Commit
b659c17
1 Parent(s): 04c186f

Trained for 0 epochs and 500 steps.


Trained with datasets ['text-embeds', 'mj-v6']
Learning rate 8e-06, batch size 32, and 3 gradient accumulation steps.
Used the DDPM noise scheduler for training with the epsilon prediction type and rescaled_betas_zero_snr=False.
Used 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix
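
The scheduler settings above translate into a diffusers `DDPMScheduler` along these lines; a minimal sketch in which only the three flags come from the commit message, while the beta-schedule values are left at the diffusers defaults (an assumption):

```python
from diffusers import DDPMScheduler

# Training noise schedule as described in the commit message; beta-schedule
# values are the diffusers defaults (an assumption).
noise_scheduler = DDPMScheduler(
    prediction_type="epsilon",
    rescale_betas_zero_snr=False,
    timestep_spacing="trailing",
)
```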

.gitattributes CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/image_0_0.png filter=lfs diff=lfs merge=lfs -text
 assets/image_1_0.png filter=lfs diff=lfs merge=lfs -text
+ training_state-mj-v6.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ license: creativeml-openrail-m
+ base_model: "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - full
+
+ inference: true
+ widget:
+ - text: 'unconditional (blank prompt)'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_0_0.png
+ - text: 'ethnographic photography of teddy bear at a picnic'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_1_0.png
+ ---
+
+ # pixart-training
+
+ This is a full-rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS).
+
+
+
+ The main validation prompt used during training was:
+
+ ```
+ ethnographic photography of teddy bear at a picnic
+ ```
+
+ ## Validation settings
+ - CFG: `7.5`
+ - CFG Rescale: `0.0`
+ - Steps: `30`
+ - Sampler: `euler`
+ - Seed: `42`
+ - Resolution: `1024`
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+ You can find some example images in the following gallery:
+
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
+
+ ## Training settings
+
+ - Training epochs: 0
+ - Training steps: 500
+ - Learning rate: 8e-06
+ - Effective batch size: 96
+ - Micro-batch size: 32
+ - Gradient accumulation steps: 3
+ - Number of GPUs: 1
+ - Prediction type: epsilon
+ - Rescaled betas zero SNR: False
+ - Optimizer: AdamW, stochastic bf16
+ - Precision: Pure BF16
+ - Xformers: Enabled
+
+
+ ## Datasets
+
+ ### mj-v6
+ - Repeats: 0
+ - Total number of images: 199872
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: False
+ - Crop style: None
+ - Crop aspect: None
+
+
+ ## Inference
+
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+
+
+ model_id = "pixart-training"
+ prompt = "ethnographic photography of teddy bear at a picnic"
+ negative_prompt = "malformed, disgusting, overexposed, washed-out"
+
+ pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline.to('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
+ image = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=30,
+     generator=torch.Generator(device='cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu').manual_seed(1641421826),
+     width=1152,
+     height=768,
+     guidance_scale=7.5,
+     guidance_rescale=0.0,
+ ).images[0]
+ image.save("output.png", format="PNG")
+ ```
+
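
Because only the transformer weights changed, the checkpoint is typically run by loading the base PixArt Sigma pipeline (which supplies the frozen T5 text encoder, tokenizer, VAE, and scheduler) and swapping in the finetuned transformer. A minimal sketch of that pattern, assuming this repository is cloned locally as `pixart-training`:

```python
import torch
from diffusers import PixArtSigmaPipeline, PixArtTransformer2DModel

# Finetuned transformer from this repository (local clone assumed).
transformer = PixArtTransformer2DModel.from_pretrained(
    "pixart-training", subfolder="transformer", torch_dtype=torch.bfloat16
)

# The base pipeline supplies the untrained T5 text encoder, tokenizer, VAE, and scheduler.
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

image = pipeline(
    prompt="ethnographic photography of teddy bear at a picnic",
    negative_prompt="blurry, cropped, ugly",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("validation_style_output.png")
```

Loading in bf16 mirrors the "Pure BF16" precision listed under the training settings.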
optimizer.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58125cbdeee71875e41dcb0364eca7fb41c0768eee8e8f8c72612c9376012283
+ size 3665677155
random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cceeb55a33a3db4f1a295e5aa0a8fcea8f2638c53ec5216a82c7db9b65c4858
+ size 14344
scheduler.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57feaeea732a8232dc14923ac8e8cff564f2d6d11728d1405a7f3cfc02efb7ed
+ size 1000
training_state-mj-v6.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7828c00d6f87d54210b7888c9040dee97e356126dc1d3916106ee737f452288c
+ size 19126435
training_state.json ADDED
@@ -0,0 +1 @@
+ {"global_step": 500, "epoch_step": 500, "epoch": 1, "exhausted_backends": [], "repeats": {}}
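
The recorded `global_step` of 500 squares with the "0 epochs" in the commit message: at an effective batch size of 96 (32 micro-batch × 3 gradient accumulation × 1 GPU), 500 optimizer steps cover roughly a quarter of the 199,872-image mj-v6 dataset, so the run stopped inside its first epoch (`"epoch": 1` presumably counts the epoch in progress). A quick check:

```python
# Numbers taken from the commit message, the README training settings,
# and training_state.json above.
micro_batch, grad_accum, num_gpus = 32, 3, 1
effective_batch = micro_batch * grad_accum * num_gpus  # 96
global_step = 500
dataset_images = 199_872

samples_seen = global_step * effective_batch           # 48,000
print(samples_seen / dataset_images)                   # ~0.24 of one epoch
```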
transformer/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "PixArtTransformer2DModel",
+   "_diffusers_version": "0.29.0",
+   "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale": 2,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 28,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 128,
+   "upcast_attention": false,
+   "use_additional_conditions": false,
+   "use_linear_projection": false
+ }
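
The config above fully specifies the transformer architecture, so it can be instantiated with random weights to sanity-check the parameter count against the ~1.2 GB safetensors file below (two bytes per parameter at 16-bit precision). A small sketch, assuming a local clone and diffusers ≥ 0.29:

```python
from diffusers import PixArtTransformer2DModel

# Build the architecture straight from the shipped config (random weights);
# from_config ignores config keys the class does not expect.
config = PixArtTransformer2DModel.load_config("transformer/config.json")
model = PixArtTransformer2DModel.from_config(config)

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # ~610M, i.e. ~1.2 GB at 16-bit
```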
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b106bfee3490f721f128596f246bffc8dc8e9d711ef62f21a1532186ba50e5ad
+ size 1221780352