---
license: mit
---

<h1 style="font-size: 2em; text-align: center; font-weight: bold; color: #FF69B4; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.2); font-family: 'Arial', sans-serif;">
  Momo XL - Anime-Style SDXL Base Model
</h1>




<style>
.gallery {
  display: flex;
  flex-wrap: wrap;
  justify-content: center;
}

.gallery img {
  width: 30%;
  margin: 1%;
}

.gallery img.wide {
  width: 45%;
}

</style>

<div class="gallery">
  <img src="./card_images/01.png" alt="Sample Image 1">
  <img src="./card_images/02.png" alt="Sample Image 2">
  <img src="./card_images/03.png" alt="Sample Image 3">
  <img src="./card_images/04.png" alt="Sample Image 4">
  <img src="./card_images/05.png" alt="Sample Image 5">
  <img src="./card_images/06.png" alt="Sample Image 6">
  <img src="./card_images/07.png" alt="Sample Image 7">
  <img src="./card_images/08.png" alt="Sample Image 8">
  <img src="./card_images/09.png" alt="Sample Image 9">
  <img src="./card_images/10.png" class="wide" alt="Sample Image 10">
  <img src="./card_images/11.png" class="wide" alt="Sample Image 11">
</div>

**Momo XL** is an SDXL-based model fine-tuned to produce high-quality anime-style images with detailed, vibrant aesthetics. (Oct 6, 2024)

## Key Features:

- **Anime-Focused SDXL**: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
- **Optimized for Tag-Based Prompting**: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
- **LoRA Compatible**: Works with most LoRA models available on the Hub, allowing for versatile customization and style transfer.

## Usage Instructions:

- **Tagging**: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
- **Year-Specific Styles**: To emulate art styles from a specific year, use the tag format "**`year 20XX`**" (e.g., "**`year 2023`**").
- **LoRA Models**: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects.
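
The tagging conventions above can be sketched in Python. This is a minimal illustration, not an official usage recipe: the helper names `build_prompt` and `generate`, the example tags, and the sampler settings are assumptions, and the card does not state a repository ID, so `model_id` must be supplied by the reader. `generate` additionally assumes the `diffusers` and `torch` packages and a CUDA GPU.

```python
from typing import Iterable, Optional


def build_prompt(tags: Iterable[str], year: Optional[int] = None) -> str:
    """Join descriptive tags with commas; optionally add a `year 20XX` tag."""
    cleaned = [t.strip() for t in tags if t and t.strip()]
    if year is not None:
        cleaned.append(f"year {year}")
    return ", ".join(cleaned)


def generate(model_id: str, prompt: str, lora_path: Optional[str] = None):
    """Sketch of text-to-image generation with diffusers (assumed setup)."""
    # Imported lazily so build_prompt works without diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    if lora_path is not None:
        # Optional: apply a LoRA on top of the base model.
        pipe.load_lora_weights(lora_path)
    # Sampler settings are illustrative, not recommendations from this card.
    result = pipe(prompt=prompt, width=1024, height=1024,
                  num_inference_steps=28, guidance_scale=7.0)
    return result.images[0]


# Hypothetical example tags; any order is allowed.
prompt = build_prompt(["1girl", "solo", "cherry blossoms", "smile"], year=2023)
print(prompt)  # 1girl, solo, cherry blossoms, smile, year 2023
```

Calling `generate("<your-model-path-or-repo>", prompt)` would then return a PIL image, provided the placeholder is replaced with a real checkpoint location.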

## Disclaimer:

This model may produce unexpected or unintended results. **Use with caution and at your own risk.**

**Important Notice:**

- **Ethical Use**: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
- **Content Responsibility**: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
- **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.

Thank you! 😊


------------------------------------------------------
## Momo XL - Training Details (Oct 15, 2024)

### Dataset
Momo XL was trained on a dataset of more than **400,000 images** sourced from Danbooru.

### Base Model
Momo XL was built on top of SDXL, incorporating knowledge from two fine-tuned models (Animagine 3.0 base and Pony V6) by adding their weighted differences from the SDXL base weights:
- Formula:  
  `SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5`

For more details:
- [Animagine 3.0 base](https://huggingface.co/Linaqruf/animagine-xl-3.0)
- [Pony V6](https://huggingface.co/LyliaEngine/Pony_Diffusion_V6_XL)
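
The merge formula can be illustrated with a small sketch. It assumes all three checkpoints share the same parameter names and shapes; the function name `add_difference_merge` and the toy one-parameter "checkpoints" are hypothetical, and this is not necessarily the tooling used to build Momo XL.

```python
def add_difference_merge(base, finetuned, weights):
    """Return base + sum_i (finetuned_i - base) * weight_i, key by key.

    Values may be floats, NumPy arrays, or torch tensors; anything that
    supports +, - and * works.
    """
    merged = {}
    for key, base_value in base.items():
        value = base_value
        for model, weight in zip(finetuned, weights):
            value = value + (model[key] - base_value) * weight
        merged[key] = value
    return merged


# Toy one-parameter "checkpoints" (made-up numbers, for illustration only).
sdxl_base = {"w": 1.0}
animagine = {"w": 3.0}
pony = {"w": 2.0}

merged = add_difference_merge(sdxl_base, [animagine, pony], [1.0, 0.5])
print(merged)  # {'w': 3.5}
```

Because the Animagine weight is 1.0, the `SDXL_base` terms cancel for that model, so the result equals `Animagine 3.0 base + (Pony V6 - SDXL_base) * 0.5`.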

### Training Process
Training was conducted on **A100 80GB GPUs**, totaling more than **2,000 GPU hours**. The training was divided into three stages:
- **Finetuning - First Stage**: Trained on the entire dataset at 1024² resolution with the AdamW8bit optimizer (see the Hyperparameters table).
- **Finetuning - Second Stage**: Also trained on the entire dataset, with the maximum resolution raised to 1280² and AdamW in place of AdamW8bit.
- **Adjustment Stage**: Focused on aesthetic adjustments to improve the overall visual quality.

The final model, **Momo XL**, was produced by merging the Text Encoder from the second finetuning stage with the UNet from the Adjustment Stage.
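
A component-level merge like this can be sketched as a key-prefix selection over two state dicts. The sketch rests on assumptions: it takes the single-file SDXL checkpoint layout in which text encoder keys begin with `conditioner.`, and it takes every other component (including the VAE, whose source the card does not specify) from the Adjustment Stage checkpoint. The function name and toy keys are hypothetical.

```python
def combine_components(text_encoder_source, unet_source,
                       text_encoder_prefix="conditioner."):
    """Take text-encoder weights from one checkpoint, the rest from another."""
    combined = {}
    for key, value in unet_source.items():
        if key.startswith(text_encoder_prefix) and key in text_encoder_source:
            # Text encoder parameter: use the second finetuning stage.
            combined[key] = text_encoder_source[key]
        else:
            # UNet and everything else: use the Adjustment Stage (assumption).
            combined[key] = value
    return combined


# Toy checkpoints with string placeholders instead of tensors.
stage2 = {"conditioner.embedders.0.w": "te_stage2",
          "model.diffusion_model.w": "unet_stage2"}
adjust = {"conditioner.embedders.0.w": "te_adjust",
          "model.diffusion_model.w": "unet_adjust"}

final = combine_components(stage2, adjust)
print(final)
```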

### Hyperparameters

| Stage                    | Epochs | UNet lr | Text Encoder lr | Batch Size | Resolution | Noise Offset | Optimizer  | LR Scheduler |
|--------------------------|--------|---------|-----------------|------------|------------|--------------|------------|--------------|
| **Finetuning 1st Stage**  | 10     | 2e-5    | 1e-5            | 256        | 1024²      | N/A          | AdamW8bit  | Constant     |
| **Finetuning 2nd Stage**  | 10     | 2e-5    | 1e-5            | 256        | Max. 1280² | N/A          | AdamW      | Constant     |
| **Adjustment Stage**      | 0.25   | 8e-5    | 4e-5            | 1024       | Max. 1280² | 0.05         | AdamW      | Constant     |
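
For a rough sense of scale, the table can be turned into approximate optimizer step counts. This is back-of-the-envelope arithmetic under two assumptions: the dataset is rounded to 400,000 images (the card only says "more than 400,000"), and the listed batch size is the effective global batch size. The `StageConfig` class and `optimizer_steps` helper are illustrative names, not part of any training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    epochs: float
    unet_lr: float
    text_encoder_lr: float
    batch_size: int


# Values copied from the hyperparameter table.
STAGES = [
    StageConfig("Finetuning 1st Stage", 10, 2e-5, 1e-5, 256),
    StageConfig("Finetuning 2nd Stage", 10, 2e-5, 1e-5, 256),
    StageConfig("Adjustment Stage", 0.25, 8e-5, 4e-5, 1024),
]

# Assumption: a round stand-in for "more than 400,000 images".
DATASET_SIZE = 400_000


def optimizer_steps(stage, dataset_size=DATASET_SIZE):
    """Approximate optimizer steps: epochs * images / batch size."""
    return math.ceil(stage.epochs * dataset_size / stage.batch_size)


for stage in STAGES:
    print(f"{stage.name}: ~{optimizer_steps(stage)} steps")
```

Under these assumptions each finetuning stage runs for roughly 15,600 steps, while the Adjustment Stage is a short pass of about 100 steps at a larger batch size and higher learning rate.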