---
license: mit
---
<h1 style="font-size: 2em; text-align: center; font-weight: bold; color: #FF69B4; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.2); font-family: 'Arial', sans-serif;">
  Momo XL - Anime-Style SDXL Base Model
</h1>
<style>
  .gallery {
    display: flex;
    flex-wrap: wrap;
    justify-content: center;
  }

  .gallery img {
    width: 30%;
    margin: 1%;
  }

  .gallery img.wide {
    width: 45%;
  }
</style>
<div class="gallery">
  <img src="./card_images/01.png" alt="Sample Image 1">
  <img src="./card_images/02.png" alt="Sample Image 2">
  <img src="./card_images/03.png" alt="Sample Image 3">
  <img src="./card_images/04.png" alt="Sample Image 4">
  <img src="./card_images/05.png" alt="Sample Image 5">
  <img src="./card_images/06.png" alt="Sample Image 6">
  <img src="./card_images/07.png" alt="Sample Image 7">
  <img src="./card_images/08.png" alt="Sample Image 8">
  <img src="./card_images/09.png" alt="Sample Image 9">
  <img src="./card_images/10.png" class="wide" alt="Sample Image 10">
  <img src="./card_images/11.png" class="wide" alt="Sample Image 11">
</div>
**Momo XL** is an SDXL-based model fine-tuned to produce high-quality anime-style images with detailed, vibrant aesthetics. (Oct 6, 2024)
## Key Features:

- **Anime-Focused SDXL**: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
- **Optimized for Tag-Based Prompting**: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
- **LoRA Compatible**: Compatible with most LoRA models available on the hub, allowing for versatile customization and style transfer.
## Usage Instructions:

- **Tagging**: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
- **Year-Specific Styles**: To emulate art styles from a specific year, use the tag format "**`year 20XX`**" (e.g., "**`year 2023`**").
- **LoRA Models**: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects (see the sketch after this list).

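A minimal, unofficial sketch of tag-based prompting with 🤗 Diffusers, assuming the checkpoint can be loaded as a standard SDXL pipeline; the repository id `username/momo-xl`, the example tags, and the sampler settings are illustrative placeholders rather than recommended values.

```python
# Hedged sketch: assumes standard SDXL weights loadable by StableDiffusionXLPipeline.
# "username/momo-xl" is a placeholder repo id; tags and settings are illustrative only.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "username/momo-xl",          # placeholder: replace with the actual repository id
    torch_dtype=torch.float16,
).to("cuda")

# Optional: most SDXL LoRAs from the hub can be attached as usual.
# pipe.load_lora_weights("some-user/some-sdxl-lora")  # placeholder LoRA repo id

# Comma-separated descriptive tags; a "year 20XX" tag steers the era-specific style.
prompt = "1girl, solo, long hair, school uniform, cherry blossoms, smile, year 2023"
negative_prompt = "lowres, bad anatomy, bad hands, worst quality, jpeg artifacts"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("momo_xl_sample.png")
```

Tag order is free-form, so you can front-load the tags you care about most; the `year 2023` tag is only there to show the year-style syntax.
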
## Disclaimer:

This model may produce unexpected or unintended results. **Use with caution and at your own risk.**

**Important Notice:**

- **Ethical Use**: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
- **Content Responsibility**: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
- **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
Thank you! 😊

------------------------------------------------------

## Momo XL - Training Details (Oct 15, 2024)
### Dataset

Momo XL was trained on a dataset of more than **400,000 images** sourced from Danbooru.
### Base Model

Momo XL was built on top of SDXL, incorporating knowledge from two fine-tuned models:

- Formula (see the merge sketch below):
  `SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5`

For more details:

- [Animagine 3.0 base](https://huggingface.co/Linaqruf/animagine-xl-3.0)
- [Pony V6](https://huggingface.co/LyliaEngine/Pony_Diffusion_V6_XL)

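For illustration only, the add-difference formula above corresponds to a state-dict merge along these lines; the file names are placeholders and the actual merge may have been performed with a dedicated merging tool.

```python
# Rough sketch of the add-difference merge described by the formula above.
# File names are placeholders; dtype handling is simplified and keys missing
# from a donor checkpoint are simply left at the base weights.
from safetensors.torch import load_file, save_file

base = load_file("sdxl_base.safetensors")
animagine = load_file("animagine_xl_3.0.safetensors")
pony = load_file("pony_diffusion_v6_xl.safetensors")

merged = {}
for key, w_base in base.items():
    w = w_base.clone().float()
    if key in animagine:
        # + (Animagine 3.0 base - SDXL_base) * 1.0
        w += (animagine[key].float() - w_base.float()) * 1.0
    if key in pony:
        # + (Pony V6 - SDXL_base) * 0.5
        w += (pony[key].float() - w_base.float()) * 0.5
    merged[key] = w.to(w_base.dtype)

save_file(merged, "momo_xl_merged_init.safetensors")
```

Note that the difference against Pony V6 is scaled by 0.5, so its influence on the merged starting point is deliberately weaker than Animagine's.
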
### Training Process

Training was conducted on **A100 80GB GPUs**, totaling more than **2,000 GPU hours**. The training was divided into three stages:

- **Finetuning - First Stage**: Trained on the entire dataset with one set of training configurations.
- **Finetuning - Second Stage**: Trained on the entire dataset again with slightly different settings.
- **Adjustment Stage**: Focused on aesthetic adjustments to improve the overall visual quality.

The final **Momo XL** release was produced by combining the Text Encoder from the Finetuning Second Stage with the UNet from the Adjustment Stage (a rough sketch of this step follows).

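As a non-authoritative sketch, assuming both stage checkpoints were saved in the Diffusers folder layout (the local paths below are placeholders), the release could be assembled like this:

```python
# Sketch of combining the second-stage Text Encoders with the adjustment-stage
# UNet. Paths are placeholders; the actual release may have been assembled with
# different tooling or exported in the single-file checkpoint format.
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTextModelWithProjection

STAGE2 = "./checkpoints/finetune_stage2"    # placeholder path
ADJUST = "./checkpoints/adjustment_stage"   # placeholder path

# UNet from the adjustment stage.
unet = UNet2DConditionModel.from_pretrained(ADJUST, subfolder="unet", torch_dtype=torch.float16)
# Both SDXL text encoders from the second finetuning stage.
text_encoder = CLIPTextModel.from_pretrained(STAGE2, subfolder="text_encoder", torch_dtype=torch.float16)
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(STAGE2, subfolder="text_encoder_2", torch_dtype=torch.float16)

# Start from the stage-2 pipeline and swap in the adjustment-stage UNet.
pipe = StableDiffusionXLPipeline.from_pretrained(
    STAGE2,
    unet=unet,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.float16,
)
pipe.save_pretrained("./momo_xl_release")
```

Saving with `save_pretrained` keeps the remaining components (VAE, tokenizers, scheduler) from the second-stage checkpoint.
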
### Hyperparameters

| Stage | Epochs | UNet LR | Text Encoder LR | Batch Size | Resolution | Noise Offset | Optimizer | LR Scheduler |
|--------------------------|--------|---------|-----------------|------------|------------|--------------|-----------|--------------|
| **Finetuning 1st Stage** | 10 | 2e-5 | 1e-5 | 256 | 1024² | N/A | AdamW8bit | Constant |
| **Finetuning 2nd Stage** | 10 | 2e-5 | 1e-5 | 256 | Max. 1280² | N/A | AdamW | Constant |
| **Adjustment Stage** | 0.25 | 8e-5 | 4e-5 | 1024 | Max. 1280² | 0.05 | AdamW | Constant |