# Gallery
<img src="gallery_demo.png" width="2432" height="1440"/>
AniMemory Alpha is a bilingual model focused primarily on anime-style image generation. It uses an SDXL-type UNet
structure and a self-developed bilingual T5-XXL text encoder, achieving good alignment between Chinese and English. We
first trained a general model on billion-scale data and then tuned the anime model through a series of
post-training strategies and curated data. By open-sourcing the Alpha version, we hope to contribute to the development
of the anime community, and we greatly value any feedback.
# Key Features
- Good bilingual prompt following, effectively transforming certain Chinese concepts into anime style.
- The model mainly targets the nijigen (にじげん, 二次元; 2D anime) style and supports common artistic styles and Chinese elements.
- Competitive image quality, especially in generating detailed characters and landscapes.
- Prediction mode is x-prediction, so the model tends to produce subjects with cleaner backgrounds; more detailed
prompts can further refine your images.
- Impressive creative ability: the more detailed the description, the more surprises it can produce.
- Embracing open-source co-construction; we welcome anime fans to join our ecosystem and share their creative ideas
through our workflow.
- Better support for Chinese-style elements.
- Compatible with both tag lists and natural language description-style prompts.
# Model Info
<table>
<tr>
<th>Developed by</th>
<td>animEEEmpire</td>
</tr>
<tr>
<th>Model Name</th>
<td>AniMemory-alpha</td>
</tr>
<tr>
<th>Model type</th>
<td>Diffusion-based text-to-image generative model</td>
</tr>
<tr>
<th>Download link</th>
<td><a href="https://huggingface.co/animEEEmpire/AniMemory-alpha">Hugging Face</a></td>
</tr>
<tr>
<th rowspan="4">Parameter</th>
<td>TextEncoder_1: 5.6B</td>
</tr>
<tr>
<td>TextEncoder_2: 950M</td>
</tr>
<tr>
<td>Unet: 3.1B</td>
</tr>
<tr>
<td>VAE: 271M</td>
</tr>
<tr>
<th>Context Length</th>
<td>227</td>
</tr>
<tr>
<th>Resolution</th>
<td>Multi-resolution</td>
</tr>
</table>
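
As a rough illustration of what the parameter counts in the table imply, the sketch below adds them up and estimates the memory needed for the weights alone. It assumes 2 bytes per parameter (bfloat16, as used in the Quick Start example) and ignores activations and framework overhead, so it is a lower bound and not a measurement. The helper names are hypothetical.

```python
# Parameter counts copied from the Model Info table.
PARAMS = {
    "TextEncoder_1": 5.6e9,
    "TextEncoder_2": 950e6,
    "Unet": 3.1e9,
    "VAE": 271e6,
}

# Assumption: weights are stored in bfloat16 (2 bytes per parameter).
BYTES_PER_PARAM_BF16 = 2


def total_params(params=PARAMS):
    """Sum the parameter counts of all components."""
    return sum(params.values())


def weight_memory_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Estimate memory for the weights only, in decimal gigabytes."""
    return total_params(params) * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"{total_params() / 1e9:.2f}B parameters in total")
    print(f"{weight_memory_gb():.1f} GB for bf16 weights alone")
```

Under these assumptions the components sum to roughly 9.9B parameters, or about 19.8 GB of bf16 weights.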
# Key Problems and Notes
- This version focuses primarily on text-following ability and basic image quality; it is not a strongly artistic or
stylized version, which makes it a suitable base for open-source co-construction.
- Quantization and distillation are still in progress, leaving room for significant speed improvements and GPU memory
savings. We are planning this work and welcome volunteers.
- The training data went through a relatively thorough filtering and cleaning process, so the model is not adept at
pornographic generation; attempts to force it may produce broken images.
- Simple descriptions tend to produce images with simple backgrounds and chibi-style illustrations; you can add
detail by providing more comprehensive descriptions.
- For close-up shots, use descriptions such as "detailed face" and "close-up view" to enhance the impact of the
output.
- Adding appropriate quality descriptors can sometimes improve overall quality.
- The small-face issue still exists in the Alpha version, although it has been slightly improved; feel free to try it
out.
- It is better to describe a single object in detail than to include too many objects in one prompt.
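
The prompting notes above can be sketched as a small helper that assembles one detailed prompt for a single subject. This is only an illustration: `build_prompt` and its parameters are hypothetical, the quality tag in the usage line is an assumed example, and the only descriptors taken from the notes are "detailed face" and "close-up view".

```python
def build_prompt(subject, details=(), quality_tags=(), close_up=False):
    """Join one subject with optional detail and quality descriptors.

    Hypothetical helper: it only concatenates text and does not call the model.
    """
    parts = [subject.strip()]
    # Extra details help avoid plain backgrounds and chibi-style results.
    parts.extend(d.strip() for d in details if d.strip())
    if close_up:
        # Descriptors suggested in the notes for close-up shots.
        parts.extend(["detailed face", "close-up view"])
    # Quality descriptors go last; which tags help is an assumption to test.
    parts.extend(q.strip() for q in quality_tags if q.strip())
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "a girl reading under a cherry tree",
        details=["petals drifting in the wind", "soft evening light"],
        quality_tags=["best quality"],
        close_up=True,
    ))
```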
# Limitations
- Although the model data has undergone extensive cleaning, there may still be potential gender, ethnic, or political
biases.
- The model's open-sourcing is dedicated to enriching the ecosystem of the anime community and benefiting anime fans.
- The usage of the model shall not infringe upon the legal rights and interests of designers and creators.
# Quick Start
1. Install the necessary requirements.
- Recommended: Python >= 3.10, PyTorch >= 2.3, CUDA >= 12.1.
- We recommend using Anaconda to create a new environment (Python >= 3.10) for the following example:
`conda create -n animemory python=3.10 -y`.
- Run `pip install git+https://github.com/huggingface/diffusers.git torch==2.3.1 transformers==4.43.0 accelerate==0.31.0 sentencepiece`
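
To check an environment against the recommended versions above, a minimal sketch such as the following may help. The helper names are hypothetical; the script only compares version numbers and reports `None` for PyTorch and CUDA when PyTorch is not installed or is a CPU-only build.

```python
import re
import sys

# Recommended minimum versions from the requirements list.
MINIMUMS = {"python": "3.10", "torch": "2.3", "cuda": "12.1"}


def parse_version(text):
    """Turn a string such as '2.3.1+cu121' into the tuple (2, 3, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(found, minimum):
    """Return True if the found version is at least the minimum version."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Report which recommendations are met (None means 'not available')."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report = {
        "python": meets_minimum(python_version, MINIMUMS["python"]),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch"] = meets_minimum(torch.__version__, MINIMUMS["torch"])
    cuda_version = torch.version.cuda  # None on CPU-only builds
    if cuda_version:
        report["cuda"] = meets_minimum(cuda_version, MINIMUMS["cuda"])
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```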
2. ComfyUI inference.
Go to [ComfyUI-Animemory-Loader](https://github.com/animEEEmpire/ComfyUI-Animemory-Loader) for ComfyUI configuration.
3. Diffusers inference.
- The pipeline has not been merged yet. Please use the following commands to set up the environment.
```shell
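# Clone upstream diffusers and the AniMemory pipeline side by side.
git clone https://github.com/huggingface/diffusers.git
git clone https://github.com/animEEEmpire/diffusers_animemory
# Copy the AniMemory pipeline files into the diffusers checkout.
cp -r diffusers_animemory/* diffusers/
# Then install diffusers, or just import it from the local checkout.
cd diffusers
pip install .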
```
- Then use the following code to generate images.
```python
import torch
from diffusers import AniMemoryPipeLine

# Load the pipeline in bfloat16 and move it to the GPU.
pipe = AniMemoryPipeLine.from_pretrained("animEEEmpire/AniMemory-alpha", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# English: "A ferocious wolf with scarlet eyes, howling at midnight under bright moonlight"
prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"
negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"

images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=7,
    num_images_per_prompt=1,
)[0]
images.save("output.png")
```
- Use `pipe.enable_sequential_cpu_offload()` to offload the model to the CPU and reduce GPU memory usage (about
14.25 GB, compared with 25.67 GB without CPU offload), at the cost of significantly longer inference time (17.74 s
vs. 5.18 s on an A100 40G).
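
The trade-off above can be sketched as a small decision helper: keep the pipeline on the GPU when enough memory is free, and fall back to sequential CPU offload otherwise. The constants are the figures quoted in the note; `should_offload`, `place_pipeline`, and the `headroom_gb` margin are hypothetical and only illustrate the idea.

```python
# Figures quoted in the note above (measured on an A100 40G).
FULL_GPU_MEMORY_GB = 25.67
OFFLOAD_MEMORY_GB = 14.25
FULL_GPU_SECONDS = 5.18
OFFLOAD_SECONDS = 17.74


def should_offload(free_gpu_gb, headroom_gb=1.0):
    """Offload only when the full pipeline (plus a safety margin) would not fit.

    headroom_gb is an assumed safety margin, not a value from the model card.
    """
    return free_gpu_gb < FULL_GPU_MEMORY_GB + headroom_gb


def place_pipeline(pipe, free_gpu_gb):
    """Choose between full GPU placement and sequential CPU offload."""
    if should_offload(free_gpu_gb):
        # Offload manages device placement itself, so do not call pipe.to("cuda").
        pipe.enable_sequential_cpu_offload()
        return "sequential_cpu_offload"
    pipe.to("cuda")
    return "cuda"


def offload_slowdown():
    """Ratio of offloaded to non-offloaded inference time."""
    return OFFLOAD_SECONDS / FULL_GPU_SECONDS


if __name__ == "__main__":
    print(f"Offload is about {offload_slowdown():.1f}x slower")
    print(f"Offload on a 24 GB card: {should_offload(24.0)}")
```

In practice, the free memory could be read with `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.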
4. Faster inference is planned as future work.
# License
This repo is released under the Apache 2.0 License.