devancao committed
Commit 3139399
1 Parent(s): c1e8039

track files

Files changed (2):
  1. .gitattributes +16 -0
  2. README.md +136 -3
.gitattributes CHANGED
@@ -33,3 +33,19 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer_2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ text_encoder/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+ git-lfs filter=lfs diff=lfs merge=lfs -text
+ text_encoder_2/weights.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder_2/model.safetensors filter=lfs diff=lfs merge=lfs -text
+ unet/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ lfs filter=lfs diff=lfs merge=lfs -text
+ tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ track filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model.safetensors filter=lfs diff=lfs merge=lfs -text
+ vae/movq_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ git filter=lfs diff=lfs merge=lfs -text
+ gallery_demo.png filter=lfs diff=lfs merge=lfs -text
+ animemory_alpha.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
+ text_encoder/model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,136 @@
- ---
- license: apache-2.0
- ---
+ # Gallery
+
+ <img src="gallery_demo.png" width="2432" height="1440"/>
+
+ AniMemory Alpha is a bilingual model focused primarily on anime-style image generation. It uses an SDXL-type UNet
+ and a self-developed bilingual T5-XXL text encoder, achieving good alignment between Chinese and English. We
+ first built our general model on billion-scale data and then tuned the anime model through a series of
+ post-training strategies and curated data. By open-sourcing the Alpha version, we hope to contribute to the
+ development of the anime community, and we greatly value any feedback.
+
+ # Key Features
+
+ - Good bilingual prompt following, effectively rendering certain Chinese concepts in anime style.
+ - The style is mainly nijigen (二次元, i.e. 2D anime), with support for common artistic styles and Chinese elements.
+ - Competitive image quality, especially in generating detailed characters and landscapes.
+ - The prediction mode is x-prediction, so the model tends to produce subjects with cleaner backgrounds; more detailed
+ prompts can further refine your images.
+ - Impressive creative ability: the more detailed the descriptions, the more surprises it can produce.
+ - Embracing open-source co-construction; we welcome anime fans to join our ecosystem and share creative ideas
+ through our workflow.
+ - Better support for Chinese-style elements.
+ - Compatible with both tag lists and natural-language, description-style prompts.
+ - Centered on a resolution of 1024, e.g. 896 x 1152 for vertical output (see the sketch after this list).
+
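+ As an illustration of the last point, here is a minimal sketch (a hypothetical helper, not part of the model's API)
+ that picks (width, height) pairs whose pixel area stays close to 1024 x 1024:
+
+ ```python
+ # Hypothetical helper: choose a resolution "centered on 1024" for a given
+ # aspect ratio, rounded to multiples of 64 as is typical for SDXL-type UNets.
+ TARGET_AREA = 1024 * 1024
+
+ def pick_resolution(aspect_ratio: float, step: int = 64) -> tuple[int, int]:
+     width = (TARGET_AREA * aspect_ratio) ** 0.5
+     height = width / aspect_ratio
+     return round(width / step) * step, round(height / step) * step
+
+ print(pick_resolution(3 / 4))   # (896, 1152) -- the vertical format above
+ print(pick_resolution(1.0))     # (1024, 1024)
+ print(pick_resolution(16 / 9))  # (1344, 768)
+ ```
+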
+ # Model Info
+
+ <table>
+ <tr>
+ <th>Developed by</th>
+ <td>animEEEmpire</td>
+ </tr>
+ <tr>
+ <th>Model Name</th>
+ <td>AniMemory-alpha</td>
+ </tr>
+ <tr>
+ <th>Model type</th>
+ <td>Diffusion-based text-to-image generative model</td>
+ </tr>
+ <tr>
+ <th>Download link</th>
+ <td><a href="https://huggingface.co/animEEEmpire/AniMemory-alpha">Hugging Face</a></td>
+ </tr>
+ <tr>
+ <th rowspan="4">Parameters</th>
+ <td>TextEncoder_1: 5.6B</td>
+ </tr>
+ <tr>
+ <td>TextEncoder_2: 950M</td>
+ </tr>
+ <tr>
+ <td>UNet: 3.1B</td>
+ </tr>
+ <tr>
+ <td>VAE: 271M</td>
+ </tr>
+ <tr>
+ <th>Context Length</th>
+ <td>227 tokens</td>
+ </tr>
+ <tr>
+ <th>Resolution</th>
+ <td>Multi-resolution</td>
+ </tr>
+ </table>
+
+ # Key Problems and Notes
+
+ - The model primarily focuses on text-following ability and basic image quality; it is not a strongly artistic or
+ stylized version, which makes it suitable for open-source co-construction.
+ - Quantization and distillation are still in progress, leaving room for significant speed improvements and GPU-memory
+ savings. We are planning for this and looking forward to volunteers.
+ - A relatively thorough data filtering and cleaning process has been applied, so the model is not adept at pornographic
+ generation; attempts to force it may result in broken images.
+ - Simple descriptions tend to produce images with simple backgrounds and chibi-style illustrations; you can
+ enhance the detail by providing comprehensive descriptions.
+ - For close-up shots, use descriptions like "detailed face" or "close-up view" to enhance the impact of the output.
+ - Adding quality descriptors can sometimes improve the overall quality.
+ - The small-face issue still exists in the Alpha version, though it is slightly improved; feel free to try it out.
+ - It is better to describe a single object in detail than to pack too many objects into one prompt (see the
+ example after this list).
+
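+ Putting several of these tips together, an illustrative comparison of the two prompt styles the model accepts (the
+ specific tags and phrasing here are hypothetical, not an official recommendation):
+
+ ```python
+ # Illustrative only: the same subject described as a tag list and as natural language.
+ quality_tags = "masterpiece, best quality, highly detailed"
+
+ # Tag-list style: one subject, close-up terms, then quality descriptors.
+ tag_prompt = f"1girl, silver hair, kimono, close-up view, detailed face, cherry blossoms, {quality_tags}"
+
+ # Natural-language style: a full sentence works equally well.
+ nl_prompt = (f"A close-up portrait of a silver-haired girl in a kimono, with a detailed face, "
+              f"standing under falling cherry blossoms. {quality_tags}")
+ ```
+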
+ # Limitations
+
+ - Although the model's data has undergone extensive cleaning, potential gender, ethnic, or political biases may
+ remain.
+ - The model is open-sourced to enrich the ecosystem of the anime community and to benefit anime fans.
+ - Use of the model must not infringe upon the legal rights and interests of designers and creators.
+
+ # Quick Start
+
+ 1. Install the necessary requirements.
+
+ - Recommended: Python >= 3.10, PyTorch >= 2.3, CUDA >= 12.1.
+ - It is recommended to use Anaconda to create a new environment (`conda create -n animemory python=3.10 -y`)
+ before running the following example.
+ - Run `pip install git+https://github.com/huggingface/diffusers.git torch==2.3.1 transformers==4.43.0 accelerate==0.31.0 sentencepiece`,
+ then verify the setup with the sketch below.
+
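+ A quick sanity check (a minimal sketch, assuming the pins above) that the packages installed and the GPU is visible:
+
+ ```python
+ import torch
+ import transformers
+ import diffusers
+
+ # Confirm the environment before downloading the ~10B parameters of weights.
+ print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
+ print("transformers:", transformers.__version__)
+ print("diffusers:", diffusers.__version__)
+ ```
+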
+ 2. ComfyUI inference.
+
+ Go to [ComfyUI-Animemory-Loader](https://github.com/animEEEmpire/ComfyUI-Animemory-Loader) for the ComfyUI configuration.
+
+ 3. Diffusers inference.
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("animEEEmpire/AniMemory-alpha",
+                                          trust_remote_code=True,
+                                          torch_dtype=torch.bfloat16)
+ pipe.to("cuda")
+
+ # Prompt (Chinese): "A fierce wolf with scarlet eyes, howling at midnight under a bright moon."
+ prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"
+ negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"
+
+ # The pipeline output's first field is the list of generated images; index
+ # twice to get a single PIL image ([0] alone returns the list, which has no .save()).
+ image = pipe(prompt=prompt,
+              negative_prompt=negative_prompt,
+              num_inference_steps=40,
+              height=1024, width=1024,
+              guidance_scale=7,
+              )[0][0]
+ image.save("output.png")
+ ```
127
+
128
+ - Use `pipe.enable_sequential_cpu_offload()` to offload the model into CPU for less GPU memory cost (about 14.25 G,
129
+ compared to 25.67 G if CPU offload is not enabled), but the inference time will increase significantly(5.18s v.s.
130
+ 17.74s on A100 40G).
131
+
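+ A minimal sketch of the offload variant (same pipeline as in step 3; `enable_sequential_cpu_offload()` replaces
+ the `pipe.to("cuda")` call):
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained("animEEEmpire/AniMemory-alpha",
+                                          trust_remote_code=True,
+                                          torch_dtype=torch.bfloat16)
+ # Stream weights to the GPU submodule by submodule instead of keeping the
+ # whole model resident; do not also call pipe.to("cuda").
+ pipe.enable_sequential_cpu_offload()
+ ```
+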
+ 4. For faster inference, please refer to our future work.
+
+ # License
+
+ This repo is released under the Apache 2.0 License.