Commit 3139399 by devancao (parent: c1e8039)
Commit message: track files

Files changed:
- .gitattributes +16 -0
- README.md +136 -3

.gitattributes (changed)
@@ -33,3 +33,19 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer_2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+text_encoder/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+git-lfs filter=lfs diff=lfs merge=lfs -text
+text_encoder_2/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+text_encoder_2/model.safetensors filter=lfs diff=lfs merge=lfs -text
+unet/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
+lfs filter=lfs diff=lfs merge=lfs -text
+tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+track filter=lfs diff=lfs merge=lfs -text
+text_encoder/model.safetensors filter=lfs diff=lfs merge=lfs -text
+vae/movq_model.safetensors filter=lfs diff=lfs merge=lfs -text
+git filter=lfs diff=lfs merge=lfs -text
+gallery_demo.png filter=lfs diff=lfs merge=lfs -text
+animemory_alpha.safetensors filter=lfs diff=lfs merge=lfs -text
+text_encoder/model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
+text_encoder/model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
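
The added rules use gitattributes pattern syntax, which follows the .gitignore rules. A pattern without a slash, such as `*.zip` or `git`, matches a file name at any directory depth. A pattern containing a slash, such as `tokenizer_2/tokenizer.json`, is matched against the path relative to the directory holding the `.gitattributes` file, which here is the repository root. The helpers below (`lfs_patterns`, `is_lfs_tracked`) are hypothetical. They give a simplified Python approximation of which paths these rules send through Git LFS. They ignore `**`, negated attributes and the rule that later lines override earlier ones. The example paths are illustrative. For an authoritative answer, `git check-attr filter -- <path>` reports what Git itself decides.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Collect the patterns whose attributes include filter=lfs."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate gitattributes matching for a repo-relative POSIX path."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        if "/" in pattern:
            # Pattern with a slash: compare against the whole path from the root.
            # Simplification: fnmatch's "*" also matches "/", unlike Git's.
            if fnmatchcase(path, pattern.lstrip("/")):
                return True
        elif fnmatchcase(name, pattern):
            # Pattern without a slash: compare against the file name at any depth.
            return True
    return False


# A few rules copied from the diff above.
RULES = """\
*.zip filter=lfs diff=lfs merge=lfs -text
tokenizer_2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
animemory_alpha.safetensors filter=lfs diff=lfs merge=lfs -text
git filter=lfs diff=lfs merge=lfs -text
"""

patterns = lfs_patterns(RULES)
for path in ["tokenizer_2/tokenizer.json", "tokenizer_2/vocab.json", "backup/data.zip", "docs/git"]:
    print(path, is_lfs_tracked(path, patterns))
```

Under these rules, a bare word such as `git` or `track` matches any file with exactly that name, anywhere in the repository.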

README.md (changed)
@@ -1,3 +1,136 @@
-
-
-

# Gallery

<img src="gallery_demo.png" width="2432" height="1440"/>

AniMemory Alpha is a bilingual (Chinese and English) model focused primarily on anime-style image generation. It uses an SDXL-type UNet and a self-developed bilingual T5-XXL text encoder, which gives good alignment between Chinese and English prompts. We first trained a general model on billion-scale data. We then tuned it into the anime model through a series of post-training strategies on curated data. By open-sourcing the Alpha version, we hope to contribute to the anime community, and we greatly value any feedback.

# Key Features

- Good bilingual prompt following, effectively rendering certain Chinese concepts in anime style.
- The model mainly produces nijigen (二次元, "two-dimensional", i.e. anime/manga) style images and supports common art styles and Chinese elements.
- Competitive image quality, especially for detailed characters and landscapes.
- The prediction mode is x-prediction: the model predicts the clean image directly rather than the noise. As a result, it tends to produce subjects on cleaner backgrounds; more detailed prompts can further refine your images.
- Strong creative ability: the more detailed the description, the more surprising the results.
- Open-source co-construction: we welcome anime fans to join our ecosystem and share their creative ideas through our workflow.
- Better support for Chinese-style elements.
- Accepts both tag-list prompts and natural-language descriptions.
- Resolutions are centered on 1024 (about 1024 × 1024 pixels in total); for example, use 896 × 1152 for vertical (portrait) output.
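
The portrait example 896 × 1152 has a 7:9 aspect ratio and almost the same pixel count as 1024 × 1024 (1,032,192 vs. 1,048,576 pixels). Below is a minimal sketch for picking sizes for other aspect ratios the same way. `pick_resolution` is a hypothetical helper. Snapping both sides to multiples of 64 is an assumption that this README does not state, and the result is not the model's training bucket list.

```python
import math

# "Centered on 1024": aim for roughly 1024 * 1024 pixels in total.
TARGET_PIXELS = 1024 * 1024
# Assumption: snap both sides to multiples of 64 (not stated in this README).
STEP = 64


def pick_resolution(aspect: float, target_pixels: int = TARGET_PIXELS,
                    step: int = STEP) -> tuple[int, int]:
    """Return (width, height) close to aspect = width / height, with area close to target_pixels."""
    ideal_width = math.sqrt(target_pixels * aspect)
    ideal_height = ideal_width / aspect
    width = max(step, round(ideal_width / step) * step)
    height = max(step, round(ideal_height / step) * step)
    return width, height


for label, aspect in [("square", 1.0), ("portrait 7:9", 7 / 9), ("landscape 9:7", 9 / 7)]:
    width, height = pick_resolution(aspect)
    print(f"{label}: {width} x {height} ({width * height:,} pixels)")
```

The resulting values can be passed as `width=` and `height=` to the pipeline call shown in Quick Start.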

# Model Info

<table>
<tr>
<th>Developed by</th>
<td>animEEEmpire</td>
</tr>
<tr>
<th>Model Name</th>
<td>AniMemory-alpha</td>
</tr>
<tr>
<th>Model type</th>
<td>Diffusion-based text-to-image generative model</td>
</tr>
<tr>
<th>Download link</th>
<td><a href="https://huggingface.co/animEEEmpire/AniMemory-alpha">Hugging Face</a></td>
</tr>
<tr>
<th rowspan="4">Parameters</th>
<td>TextEncoder_1: 5.6B</td>
</tr>
<tr>
<td>TextEncoder_2: 950M</td>
</tr>
<tr>
<td>Unet: 3.1B</td>
</tr>
<tr>
<td>VAE: 271M</td>
</tr>
<tr>
<th>Context Length</th>
<td>227 tokens</td>
</tr>
<tr>
<th>Resolution</th>
<td>Multi-resolution</td>
</tr>
</table>
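
Summing the table gives a rough lower bound on the memory needed for the weights alone: about 9.92B parameters, or roughly 19.8 GB in bfloat16, the dtype the Quick Start example loads with. The sketch below shows that arithmetic. It counts weights only, so the peak of about 25.67 GB reported in the Quick Start section presumably also includes activations and runtime overhead. The int8 row is only a hypothetical comparison point for the quantization work mentioned in the notes below.

```python
# Parameter counts from the Model Info table.
PARAMS = {
    "TextEncoder_1": 5_600_000_000,
    "TextEncoder_2": 950_000_000,
    "Unet": 3_100_000_000,
    "VAE": 271_000_000,
}
# Bytes per parameter; bfloat16 matches the Quick Start example,
# int8 is a hypothetical quantized format for comparison.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_gb(dtype: str) -> float:
    """Size of the weights alone in decimal GB (no activations, no CUDA overhead)."""
    return sum(PARAMS.values()) * BYTES_PER_PARAM[dtype] / 1e9


total = sum(PARAMS.values())
print(f"total parameters: {total / 1e9:.3f}B")  # prints "total parameters: 9.921B"
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(dtype):.2f} GB")
```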

# Key Problems and Notes

- The model focuses primarily on prompt following and basic image quality. It is not a strongly artistic or stylized version, which makes it a suitable base for open-source co-construction.
- Quantization and distillation are still in progress, leaving room for significant speed-ups and GPU memory savings. We are planning this work and welcome volunteers.
- The training data went through a fairly thorough filtering and cleaning process, so the model is not good at generating pornographic content; attempts to force it may produce broken images.
- Simple descriptions tend to produce simple backgrounds and chibi-style illustrations; try more comprehensive descriptions to add detail.
- For close-up shots, use descriptions such as "detailed face" or "close-up view" to strengthen the result.
- Adding appropriate quality descriptors can sometimes improve overall quality.
- The small-face issue still exists in the Alpha version, although it has been slightly improved; feel free to try it out.
- Describing a single subject in detail works better than packing many subjects into one prompt.
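
The prompting notes above can be applied mechanically. Below is a sketch of a hypothetical helper, `build_prompt`, which is not part of any library. It keeps one subject, appends concrete details, adds the close-up descriptors when requested, and ends with quality descriptors. The default quality tags are assumptions, not a list published for this model.

```python
def build_prompt(subject: str, details: list[str], close_up: bool = False,
                 quality_tags: tuple[str, ...] = ("best quality", "highly detailed")) -> str:
    """Assemble one detailed, single-subject prompt as a comma-separated tag list.

    The default quality_tags are illustrative placeholders, not an official list.
    """
    parts = [subject, *details]
    if close_up:
        # Close-up descriptors suggested in the notes above.
        parts += ["detailed face", "close-up view"]
    parts += quality_tags
    return ", ".join(parts)


prompt = build_prompt(
    "a girl with long silver hair",
    ["red scarf", "snowy street at night", "warm lantern light"],
    close_up=True,
)
print(prompt)
```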

# Limitations

- Although the training data has been extensively cleaned, the model may still exhibit gender, ethnic, or political biases.
- The model is open-sourced to enrich the anime community's ecosystem and to benefit anime fans.
- Use of the model must not infringe on the legal rights and interests of designers and creators.

# Quick Start

1. Install the necessary requirements.

- Recommended: Python >= 3.10, PyTorch >= 2.3, CUDA >= 12.1.

- We recommend using Anaconda to create a new environment (Python >= 3.10) for the example below: `conda create -n animemory python=3.10 -y`, then `conda activate animemory`.

- Run `pip install git+https://github.com/huggingface/diffusers.git torch==2.3.1 transformers==4.43.0 accelerate==0.31.0 sentencepiece`.
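
To check an existing environment against the recommended minimums before running the example, a small sketch follows. `parse_version`, `meets` and `check_environment` are hypothetical helpers built on the standard library's `importlib.metadata`. The sketch does not check the CUDA version; with PyTorch installed, `torch.version.cuda` reports the CUDA version the wheel was built with.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions recommended in this README.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3)


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric release part: '2.3.1+cu121' -> (2, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets(installed: str, minimum: tuple[int, ...]) -> bool:
    """Compare release numbers as tuples, e.g. (2, 3, 1) >= (2, 3)."""
    return parse_version(installed) >= minimum


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.10")
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not meets(torch_version, MIN_TORCH):
            problems.append(f"torch {torch_version} is older than 2.3")
    return problems


if __name__ == "__main__":
    for message in check_environment() or ["environment meets the recommended minimums"]:
        print(message)
```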

2. ComfyUI inference.

Go to [ComfyUI-Animemory-Loader](https://github.com/animEEEmpire/ComfyUI-Animemory-Loader) for ComfyUI configuration.

3. Diffusers inference.

```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True lets diffusers load the custom pipeline code hosted in the model repository.
pipe = DiffusionPipeline.from_pretrained(
    "animEEEmpire/AniMemory-alpha",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Chinese prompt: "A ferocious wolf with scarlet eyes, howling at midnight under bright moonlight"
prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"
negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"

# The pipeline returns a list of images; take the first one.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=7,
).images[0]
image.save("output.png")
```

- Call `pipe.enable_sequential_cpu_offload()` instead of `pipe.to("cuda")` to offload the model to the CPU and reduce GPU memory usage: about 14.25 GB, compared with 25.67 GB without CPU offload. Inference time increases significantly, however (5.18 s vs. 17.74 s on an A100 40G).
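
Choosing between the two configurations can be automated from the free GPU memory. This is a sketch with hypothetical helpers (`choose_placement`, `apply_placement`) and an arbitrary 1 GB safety margin. The thresholds are the peak figures reported above. Treating the README's "G" as GiB is an assumption, since the unit is not specified. `pipe.to("cuda")` and `pipe.enable_sequential_cpu_offload()` are the diffusers calls already used in this README. `torch.cuda.mem_get_info()` returns free and total device memory in bytes.

```python
# Peak GPU memory reported in this README for the Quick Start example (A100 40G).
FULL_GPU_GB = 25.67
SEQUENTIAL_OFFLOAD_GB = 14.25


def choose_placement(free_gb: float, headroom_gb: float = 1.0) -> str:
    """Pick a strategy from free GPU memory; headroom_gb is an arbitrary safety margin."""
    if free_gb >= FULL_GPU_GB + headroom_gb:
        return "cuda"
    if free_gb >= SEQUENTIAL_OFFLOAD_GB + headroom_gb:
        return "sequential_offload"
    return "insufficient"


def apply_placement(pipe, placement: str) -> None:
    """Apply the chosen strategy to a diffusers pipeline."""
    if placement == "cuda":
        pipe.to("cuda")
    elif placement == "sequential_offload":
        # Call this instead of pipe.to("cuda"); the offload hooks manage devices.
        pipe.enable_sequential_cpu_offload()
    else:
        raise RuntimeError("not enough free GPU memory for either configuration")


# Usage with a loaded pipeline (requires torch and a CUDA GPU):
#   free_bytes, _total_bytes = torch.cuda.mem_get_info()
#   apply_placement(pipe, choose_placement(free_bytes / 2**30))
print(choose_placement(40.0), choose_placement(24.0), choose_placement(8.0))
```

Keeping the decision separate from the pipeline call means the thresholds can be updated if quantized or distilled versions change the memory figures.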

4. For faster inference, stay tuned for our future work.

# License

This repository is released under the Apache 2.0 License.