devancao committed
Commit 3139399
1 Parent(s): c1e8039

track files

Files changed (2):
  1. .gitattributes +16 -0
  2. README.md +136 -3
.gitattributes CHANGED
@@ -33,3 +33,19 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer_2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ text_encoder/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+ git-lfs filter=lfs diff=lfs merge=lfs -text
+ text_encoder_2/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder_2/model.safetensors filter=lfs diff=lfs merge=lfs -text
+ unet/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ lfs filter=lfs diff=lfs merge=lfs -text
+ tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ track filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model.safetensors filter=lfs diff=lfs merge=lfs -text
+ vae/movq_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ git filter=lfs diff=lfs merge=lfs -text
+ gallery_demo.png filter=lfs diff=lfs merge=lfs -text
+ animemory_alpha.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,136 @@
- ---
- license: apache-2.0
- ---
+ # Gallery
+
+ <img src="gallery_demo.png" width="2432" height="1440"/>
+
+ AniMemory Alpha is a bilingual model focused primarily on anime-style image generation. It uses an SDXL-type UNet
+ and a self-developed bilingual T5-XXL text encoder, achieving good alignment between Chinese and English. We
+ first built our general model on billion-scale data and then tuned the anime model through a series of
+ post-training strategies and curated data. By open-sourcing the Alpha version, we hope to contribute to the
+ development of the anime community, and we greatly value any feedback.
+
+ # Key Features
+
+ - Good bilingual prompt following, effectively rendering certain Chinese concepts in anime style.
+ - The style is mainly nijigen (二次元, i.e. 2D anime), with support for common artistic styles and Chinese elements.
+ - Competitive image quality, especially in generating detailed characters and landscapes.
+ - The prediction mode is x-prediction, so the model tends to produce subjects with cleaner backgrounds; more detailed
+ prompts can further refine your images.
+ - Impressive creative ability: the more detailed the descriptions, the more surprises it can produce.
+ - Embracing open-source co-construction; we welcome anime fans to join our ecosystem and share creative ideas
+ through our workflow.
+ - Better support for Chinese-style elements.
+ - Compatible with both tag lists and natural-language, description-style prompts.
+ - Centered on a resolution of 1024, e.g. 896 x 1152 for vertical output (see the sketch after this list).
+
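+ As an illustration of the last point, here is a minimal sketch (a hypothetical helper, not part of the model's API)
+ that picks (width, height) pairs whose pixel area stays close to 1024 x 1024:
+
+ ```python
+ # Hypothetical helper: choose a resolution "centered on 1024" for a given
+ # aspect ratio, rounded to multiples of 64 as is typical for SDXL-type UNets.
+ TARGET_AREA = 1024 * 1024
+
+ def pick_resolution(aspect_ratio: float, step: int = 64) -> tuple[int, int]:
+     width = (TARGET_AREA * aspect_ratio) ** 0.5
+     height = width / aspect_ratio
+     return round(width / step) * step, round(height / step) * step
+
+ print(pick_resolution(3 / 4))   # (896, 1152) -- the vertical format above
+ print(pick_resolution(1.0))     # (1024, 1024)
+ print(pick_resolution(16 / 9))  # (1344, 768)
+ ```
+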
+ # Model Info
+
+ <table>
+ <tr>
+ <th>Developed by</th>
+ <td>animEEEmpire</td>
+ </tr>
+ <tr>
+ <th>Model Name</th>
+ <td>AniMemory-alpha</td>
+ </tr>
+ <tr>
+ <th>Model type</th>
+ <td>Diffusion-based text-to-image generative model</td>
+ </tr>
+ <tr>
+ <th>Download link</th>
+ <td><a href="https://huggingface.co/animEEEmpire/AniMemory-alpha">Hugging Face</a></td>
+ </tr>
+ <tr>
+ <th rowspan="4">Parameters</th>
+ <td>TextEncoder_1: 5.6B</td>
+ </tr>
+ <tr>
+ <td>TextEncoder_2: 950M</td>
+ </tr>
+ <tr>
+ <td>UNet: 3.1B</td>
+ </tr>
+ <tr>
+ <td>VAE: 271M</td>
+ </tr>
+ <tr>
+ <th>Context Length</th>
+ <td>227 tokens</td>
+ </tr>
+ <tr>
+ <th>Resolution</th>
+ <td>Multi-resolution</td>
+ </tr>
+ </table>
+
+ # Key Problems and Notes
+
+ - The model primarily focuses on text-following ability and basic image quality; it is not a strongly artistic or
+ stylized version, which makes it suitable for open-source co-construction.
+ - Quantization and distillation are still in progress, leaving room for significant speed improvements and GPU-memory
+ savings. We are planning for this and looking forward to volunteers.
+ - A relatively thorough data filtering and cleaning process has been applied, so the model is not adept at pornographic
+ generation; attempts to force it may result in broken images.
+ - Simple descriptions tend to produce images with simple backgrounds and chibi-style illustrations; you can
+ enhance the detail by providing comprehensive descriptions.
+ - For close-up shots, use descriptions like "detailed face" or "close-up view" to enhance the impact of the output.
+ - Adding quality descriptors can sometimes improve the overall quality.
+ - The small-face issue still exists in the Alpha version, though it is slightly improved; feel free to try it out.
+ - It is better to describe a single object in detail than to pack too many objects into one prompt (see the
+ example after this list).
+
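+ Putting several of these tips together, an illustrative comparison of the two prompt styles the model accepts (the
+ specific tags and phrasing here are hypothetical, not an official recommendation):
+
+ ```python
+ # Illustrative only: the same subject described as a tag list and as natural language.
+ quality_tags = "masterpiece, best quality, highly detailed"
+
+ # Tag-list style: one subject, close-up terms, then quality descriptors.
+ tag_prompt = f"1girl, silver hair, kimono, close-up view, detailed face, cherry blossoms, {quality_tags}"
+
+ # Natural-language style: a full sentence works equally well.
+ nl_prompt = (f"A close-up portrait of a silver-haired girl in a kimono, with a detailed face, "
+              f"standing under falling cherry blossoms. {quality_tags}")
+ ```
+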
+ # Limitations
+
+ - Although the model's data has undergone extensive cleaning, potential gender, ethnic, or political biases may
+ remain.
+ - The model is open-sourced to enrich the ecosystem of the anime community and to benefit anime fans.
+ - Use of the model must not infringe upon the legal rights and interests of designers and creators.
+
+ # Quick Start
+
+ 1. Install the necessary requirements.
+
+ - Recommended: Python >= 3.10, PyTorch >= 2.3, CUDA >= 12.1.
+ - It is recommended to use Anaconda to create a new environment (`conda create -n animemory python=3.10 -y`)
+ before running the following example.
+ - Run `pip install git+https://github.com/huggingface/diffusers.git torch==2.3.1 transformers==4.43.0 accelerate==0.31.0 sentencepiece`,
+ then verify the setup with the sketch below.
+
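+ A quick sanity check (a minimal sketch, assuming the pins above) that the packages installed and the GPU is visible:
+
+ ```python
+ import torch
+ import transformers
+ import diffusers
+
+ # Confirm the environment before downloading the ~10B parameters of weights.
+ print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
+ print("transformers:", transformers.__version__)
+ print("diffusers:", diffusers.__version__)
+ ```
+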
+ 2. ComfyUI inference.
+
+ Go to [ComfyUI-Animemory-Loader](https://github.com/animEEEmpire/ComfyUI-Animemory-Loader) for the ComfyUI configuration.
+
+ 3. Diffusers inference.
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("animEEEmpire/AniMemory-alpha",
+                                          trust_remote_code=True,
+                                          torch_dtype=torch.bfloat16)
+ pipe.to("cuda")
+
+ # Prompt (Chinese): "A fierce wolf with scarlet eyes, howling at midnight under a bright moon."
+ prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"
+ negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"
+
+ # The pipeline output's first field is the list of generated images; index
+ # twice to get a single PIL image ([0] alone returns the list, which has no .save()).
+ image = pipe(prompt=prompt,
+              negative_prompt=negative_prompt,
+              num_inference_steps=40,
+              height=1024, width=1024,
+              guidance_scale=7,
+              )[0][0]
+ image.save("output.png")
+ ```
127
+
128
+ - Use `pipe.enable_sequential_cpu_offload()` to offload the model into CPU for less GPU memory cost (about 14.25 G,
129
+ compared to 25.67 G if CPU offload is not enabled), but the inference time will increase significantly(5.18s v.s.
130
+ 17.74s on A100 40G).
131
+
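+ A minimal sketch of the offload variant (same pipeline as in step 3; `enable_sequential_cpu_offload()` replaces
+ the `pipe.to("cuda")` call):
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("animEEEmpire/AniMemory-alpha",
+                                          trust_remote_code=True,
+                                          torch_dtype=torch.bfloat16)
+ # Stream weights to the GPU submodule by submodule instead of keeping the
+ # whole model resident; do not also call pipe.to("cuda").
+ pipe.enable_sequential_cpu_offload()
+ ```
+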
+ 4. For faster inference, please refer to our future work.
+
+ # License
+
+ This repo is released under the Apache 2.0 License.