Maitreyapatel committed on
Commit
e7ef52f
1 Parent(s): a0a181f

Karlo files

README.md CHANGED
@@ -1,3 +1,88 @@
  ---
- license: creativeml-openrail-m
+ license: openrail++
+ language:
+ - en
+ library_name: diffusers
+ tags:
+ - text-to-image
+ - prior
+ - unclip
+ - kandinskyv2.2
  ---
+
+
+ # Introduction
+
+ This ECLIPSE checkpoint is a tiny (33M-parameter) non-diffusion text-to-image prior model **trained on CC12M data**.
+
+ Despite being so small and trained on a limited amount of data, ECLIPSE priors achieve results comparable to those of 1-billion-parameter T2I prior models trained on millions of image-text pairs.
20
+ - **Project Page:** [https://eclipse-t2i.vercel.app](https://eclipse-t2i.vercel.app)
21
+ - **GitHub:** [https://github.com/eclipse-t2i/eclipse-inference](https://github.com/eclipse-t2i/eclipse-inference)
22
+
23
+
24
+ ## Evaluations
25
+
26
+ ![Qualitative Examples](./assets/example.png)
27
+
28
+ ![Results](./assets/results.png)
29
+
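+ ## Installation
+ ```bash
+ git clone git@github.com:eclipse-t2i/eclipse-inference.git
+ cd eclipse-inference
+
+ # Create and activate a local conda environment before installing dependencies
+ conda create -p ./venv python=3.9
+ conda activate ./venv
+ pip install -r requirements.txt
+ ```
+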
+ ## Run Inference
+
+ This repository supports two pre-trained image decoders: [Karlo-v1-alpha](https://huggingface.co/kakaobrain/karlo-v1-alpha) and [Kandinsky-v2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder).
+ Note: The ECLIPSE prior is not a diffusion model, whereas the image decoders are.
+
+ ### Karlo Inference
+ ```python
+ from src.pipelines.pipeline_unclip import UnCLIPPipeline
+ from src.priors.prior_transformer import PriorTransformer
+
+ # Load the ECLIPSE prior and plug it into the Karlo unCLIP pipeline
+ prior = PriorTransformer.from_pretrained("ECLIPSE-Community/ECLIPSE_Karlo_Prior")
+ pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", prior=prior).to("cuda")
+
+ prompt = "black apples in the basket"
+ images = pipe(prompt, decoder_guidance_scale=7.5).images
+
+ images[0]  # first generated image
+ ```
+
+ ### Kandinsky Inference
+ ```python
+ from src.pipelines.pipeline_kandinsky_prior import KandinskyPriorPipeline
+ from src.priors.prior_transformer import PriorTransformer
+ from diffusers import DiffusionPipeline
+
+ # Load the ECLIPSE prior and plug it into the Kandinsky prior pipeline
+ prior = PriorTransformer.from_pretrained("ECLIPSE-Community/ECLIPSE_KandinskyV22_Prior")
+ pipe_prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", prior=prior).to("cuda")
+
+ # Load the Kandinsky v2.2 diffusion decoder
+ pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder").to("cuda")
+
+ prompt = "black apples in the basket"
+ # The prior maps the prompt to image embeddings; the decoder renders them
+ image_embeds, negative_image_embeds = pipe_prior(prompt).to_tuple()
+ images = pipe(
+     num_inference_steps=50,
+     image_embeds=image_embeds,
+     negative_image_embeds=negative_image_embeds,
+ ).images
+
+ images[0]  # first generated image
+ ```
+
+
+ ## Limitations
+
+ The model is intended for research purposes only, to demonstrate a way to reduce unnecessary resource usage in existing T2I research.
+
+ As this prior model is trained using a very small LAION subset and CLIP supervision, it inherits the limitations of the CLIP model, such as:
+ * Lack of spatial understanding.
+ * Inability to render legible text.
+ * Difficulty with complex compositionality, which remains a big challenge and could improve as CLIP improves.
+ * While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
assets/results.png ADDED
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_class_name": "PriorTransformer",
+   "_diffusers_version": "0.20.2",
+   "added_emb_type": "prd",
+   "additional_embeddings": 3,
+   "attention_head_dim": 32,
+   "clip_embed_dim": null,
+   "dropout": 0.0,
+   "embedding_dim": 768,
+   "embedding_proj_dim": null,
+   "embedding_proj_norm_type": null,
+   "encoder_hid_proj_type": "linear",
+   "norm_in_type": null,
+   "num_attention_heads": 16,
+   "num_embeddings": 77,
+   "num_layers": 10
+ }
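As a reading aid for the config above, here is a minimal sketch that parses a few of its fields and derives two quantities. It assumes that the transformer's hidden width is `num_attention_heads * attention_head_dim` and that the input sequence length is `num_embeddings + additional_embeddings`. The `CONFIG_JSON` string and the variable names are illustrative and not part of the repository.

```python
import json

# A few fields copied from the config.json added in this commit.
CONFIG_JSON = """
{
  "additional_embeddings": 3,
  "attention_head_dim": 32,
  "embedding_dim": 768,
  "num_attention_heads": 16,
  "num_embeddings": 77,
  "num_layers": 10
}
"""

config = json.loads(CONFIG_JSON)

# Assumption: hidden width = number of heads * per-head dimension.
inner_dim = config["num_attention_heads"] * config["attention_head_dim"]

# Assumption: sequence length = text tokens + additional embeddings.
seq_len = config["num_embeddings"] + config["additional_embeddings"]

print(f"inner_dim={inner_dim}, seq_len={seq_len}, layers={config['num_layers']}")
```

Under these assumptions the prior is a 10-layer transformer with a hidden width of 512 operating on 80 positions, which is in line with the small size stated in the README.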
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e3e9b1a6788613d1890313e0d1b54969a5b3deefc7d8d0f8a2886cadaec5dcd
+ size 132590432
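The size recorded in the LFS pointer can serve as a rough cross-check of the 33M-parameter figure in the README. The sketch below parses the pointer text and divides the byte count by four, assuming the weights are stored as 32-bit floats and ignoring the small safetensors header. The helper `parse_lfs_pointer` is hypothetical and written only for this illustration.

```python
# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9e3e9b1a6788613d1890313e0d1b54969a5b3deefc7d8d0f8a2886cadaec5dcd
size 132590432
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


fields = parse_lfs_pointer(POINTER)
size_bytes = int(fields["size"])

# Assumption: float32 weights, i.e. 4 bytes per parameter.
BYTES_PER_PARAM = 4
approx_params = size_bytes // BYTES_PER_PARAM

print(f"~{approx_params / 1e6:.1f}M parameters")
```

If the checkpoint were stored in half precision instead, the same file size would correspond to roughly twice as many parameters, so the float32 assumption matters for this estimate.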