---
license: openrail++
---

# Terminus XL Otaku (v1 preview)

## Model Details

### Model Description

Terminus XL Otaku is a latent diffusion model that uses a zero-terminal SNR noise schedule and a velocity-prediction (v-prediction) objective at both training and inference time.
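
To make "zero-terminal SNR" concrete, here is a minimal sketch of the usual beta-rescaling procedure, which shifts and scales the square root of the cumulative alpha product so that the final timestep is pure noise. It is an illustration rather than this model's training code: the function names are hypothetical, and the `scaled_linear` defaults (`beta_start=0.00085`, `beta_end=0.012`, 1000 steps) are assumed from common Stable Diffusion configurations.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Assumed SD-style schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def rescale_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has alpha_bar = 0 (SNR = 0)."""
    # sqrt of the cumulative product of alphas, one value per timestep
    sqrt_alpha_bar = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        sqrt_alpha_bar.append(math.sqrt(prod))

    first, last = sqrt_alpha_bar[0], sqrt_alpha_bar[-1]
    # Shift so the last value is 0, then scale so the first value is unchanged.
    shifted = [(s - last) * first / (first - last) for s in sqrt_alpha_bar]

    # Convert back from sqrt(alpha_bar) to per-step betas.
    alpha_bar = [s * s for s in shifted]
    alphas = [alpha_bar[0]] + [
        alpha_bar[i] / alpha_bar[i - 1] for i in range(1, len(alpha_bar))
    ]
    return [1.0 - a for a in alphas]

betas = scaled_linear_betas()
zero_snr_betas = rescale_zero_terminal_snr(betas)
print(betas[-1], zero_snr_betas[-1])
```

After rescaling, the last beta is 1, so the final timestep carries no signal from the image. That is the property a noise-prediction objective cannot handle well and a velocity-prediction objective can.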

Terminus is a new state-of-the-art model family based on SDXL's architecture, and is compatible with (most) SDXL pipelines.

For Terminus Otaku (this model), the training data is anime, cel-shaded artwork, 3D renders, and other hand-drawn or synthetic art styles, plus a small share of real-human anchor data (see Training Data).

The goal of this model was to adapt Terminus Gamma v2's outputs to a more artistic style while continuing to train with the v-prediction objective and min-SNR gamma loss.


- **Fine-tuned from:** ptx0/terminus-xl-gamma-v2
- **Developed by:** pseudoterminal X (@bghira)
- **Funded by:** pseudoterminal X (@bghira)
- **Model type:** Latent Diffusion
- **License:** openrail++
- **Architecture:** SDXL

### Model Sources

- **Repository:** https://github.com/bghira/SimpleTuner

## Uses

### Direct Use

Terminus XL Otaku can be used for generating high-quality images given text prompts.

It should particularly excel at inpainting tasks for animated subject matter, where a zero-terminal SNR noise schedule allows it to more effectively retain contrast.

The model can be utilized in creative industries such as art, advertising, and entertainment to create visually appealing content.
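
Because the model expects v-prediction and a zero-terminal SNR schedule at inference time as well, the scheduler of an SDXL pipeline typically needs matching settings. The sketch below is an assumption rather than the author's documented setup: the key names follow the options that Diffusers schedulers such as `DDIMScheduler` expose, and `with_zero_snr_settings` is a hypothetical helper that merges them into an existing config dictionary.

```python
# Assumed scheduler options for a zero-terminal-SNR, v-prediction checkpoint.
# The key names follow Diffusers scheduler configs (e.g. DDIMScheduler);
# the model card itself does not list them.
ZERO_SNR_V_PRED = {
    "prediction_type": "v_prediction",  # model predicts velocity, not noise
    "rescale_betas_zero_snr": True,     # final timestep has SNR = 0
    "timestep_spacing": "trailing",     # sampling starts at the last timestep
}

def with_zero_snr_settings(scheduler_config):
    """Hypothetical helper: return a copy of a config dict with the overrides applied."""
    merged = dict(scheduler_config)
    merged.update(ZERO_SNR_V_PRED)
    return merged

# A plain dict stands in for a real scheduler config here.
base = {"beta_schedule": "scaled_linear", "prediction_type": "epsilon"}
print(with_zero_snr_settings(base))
```

With Diffusers, the same keys can usually be passed as keyword overrides, for example `DDIMScheduler.from_config(pipe.scheduler.config, **ZERO_SNR_V_PRED)`. Check the scheduler documentation for the options your installed version supports.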

### Downstream Use

Terminus XL Otaku can be fine-tuned for specific tasks such as image super-resolution, style transfer, and more.

However, fine-tuning on the v1 preview is not recommended. Wait for the full release, by which point any structural issues will hopefully be resolved.

### Out-of-Scope Use

The model is not designed for tasks outside of image generation. It should not be used to produce harmful content or to deceive others. Please use common sense.

## Bias, Risks, and Limitations

The model might exhibit biases present in the training data. The generated images should be carefully reviewed to ensure they meet ethical and societal standards.

### Recommendations

Users should be cautious of potential biases in the generated images and thoroughly review them before use.

## Training Details

### Training Data

This model's success largely depended on a relatively small collection of very high-quality data samples:

* Indiscriminate use of NijiJourney outputs.
* Midjourney 5.2 outputs that mention anime styles in their tags.
* Niji and MJ Showcase images that were re-captioned using CogVLM.
* Anchor data of real human subjects in a small (10%) ratio to the animated material, to retain coherence.

### Training Procedure

#### Preprocessing

So far, this model has been trained exclusively on cropped images, using SDXL's crop coordinates to improve fine details.

No images were upsampled or downsampled during this training session. Instead, random crops (or unaltered 1024px square images) were used.
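
The cropping described above can be sketched as follows. This is a hypothetical illustration, not the SimpleTuner implementation: `random_crop_box` and its return keys are made up for this example, with the conditioning names and `(height, width)` / `(top, left)` ordering assumed from the SDXL micro-conditioning convention used in Diffusers.

```python
import random

def random_crop_box(width, height, crop=1024, rng=random):
    """Pick a random crop x crop window and the matching SDXL size conditioning.

    Images are never resized in this sketch; anything smaller than the
    crop size is rejected instead of being upsampled.
    """
    if width < crop or height < crop:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - crop)   # inclusive on both ends
    top = rng.randint(0, height - crop)
    return {
        "box": (left, top, left + crop, top + crop),  # PIL-style (l, t, r, b)
        "original_size": (height, width),
        "crops_coords_top_left": (top, left),
        "target_size": (crop, crop),
    }

# A 1024px square image has only one possible window, so it passes through unaltered.
print(random_crop_box(1024, 1024))
# A larger image gets a random window; the seed is only for repeatability.
print(random_crop_box(2048, 1536, rng=random.Random(0)))
```

Feeding the crop offset to the model as conditioning lets it learn that a sample is a crop, which is what allows a request for `(0, 0)` at inference time to yield uncropped compositions.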

Roughly 50,000 images were used for this training run. Collection continued throughout the process, so the exact number of images is difficult to determine.

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Learning rate:** \(1 \times 10^{-7}\) to \(1 \times 10^{-6}\), cosine schedule
- **Epochs:** 11
- **Batch size:** 12 * 8 = 96
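
As a rough illustration of these hyperparameters, the sketch below implements a cosine learning-rate curve between the two listed values and estimates the optimizer step count. Several things here are assumptions: the card does not say whether the rate decays from \(1 \times 10^{-6}\) to \(1 \times 10^{-7}\) or includes warmup, so a plain decay is assumed, and the step estimate uses the approximate figure of 50,000 images.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-6, lr_min=1e-7):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (assumed direction)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Approximate step count: ~50,000 images, 11 epochs, effective batch of 12 * 8 = 96.
effective_batch = 12 * 8
total_steps = math.ceil(50_000 * 11 / effective_batch)

for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

Under these assumptions the run comes to roughly 5,700 optimizer steps; the true figure depends on how many images had actually been collected at each epoch.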

#### Speeds, Sizes, Times

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

[More Information Needed]

## Environmental Impact

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

The model uses an SDXL-compatible latent diffusion architecture, trained with a velocity-prediction objective combined with min-SNR gamma loss weighting.
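
As a rough illustration of what such an objective can look like, the sketch below computes the velocity target and a min-SNR gamma weight for a v-prediction model on plain Python lists. It is not taken from the training code: the function names, the value `gamma=5.0`, and the weighting form `min(SNR, gamma) / (SNR + 1)` are assumptions based on the commonly used formulation.

```python
import math

def snr(alpha_bar):
    """Signal-to-noise ratio for a cumulative alpha product alpha_bar in [0, 1)."""
    return alpha_bar / (1.0 - alpha_bar)

def v_target(x0, noise, alpha_bar):
    """Velocity target: v = sqrt(alpha_bar) * noise - sqrt(1 - alpha_bar) * x0."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * n - s * x for x, n in zip(x0, noise)]

def min_snr_weight_v(alpha_bar, gamma=5.0):
    """Assumed min-SNR weight for v-prediction: min(SNR, gamma) / (SNR + 1)."""
    ratio = snr(alpha_bar)
    return min(ratio, gamma) / (ratio + 1.0)

def weighted_v_loss(v_pred, x0, noise, alpha_bar, gamma=5.0):
    """Mean squared error against the velocity target, scaled by the weight."""
    target = v_target(x0, noise, alpha_bar)
    mse = sum((p - t) ** 2 for p, t in zip(v_pred, target)) / len(target)
    return min_snr_weight_v(alpha_bar, gamma) * mse

# Toy example with a two-element "latent".
print(weighted_v_loss([0.0, 0.0], [1.0, -1.0], [0.5, 0.5], alpha_bar=0.5))
```

At the zero-SNR terminal step (`alpha_bar = 0`) the velocity target reduces to `-x0`, so the target stays informative even though the input is pure noise. Note that this particular weighting form evaluates to zero at that step; whether the actual training code adjusts for this is not stated here.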

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]